WAN VACE In ComfyUI - AI Video Generation With Full Control

1. Introduction
In the ever-evolving landscape of AI and machine learning, ComfyUI has emerged as a powerful tool for creators and developers alike. One of its standout features is native support for WAN VACE, which caters to a wide range of hardware capabilities: 1.3B models for low-VRAM cards and 14B models for those with more memory to spare. This flexibility lets users harness advanced AI models without being constrained by their hardware, and the inclusion of GGUF models makes the platform even more versatile. In this blog post, we will explore the new WAN VACE templates available in ComfyUI, how to use them effectively, and the exciting possibilities they offer for video creation.
If you have not set up ComfyUI yet, check out our detailed tutorial here:
2. Exploring WAN VACE Templates
To access the new WAN VACE templates in ComfyUI, navigate to the workflow menu and click on browse templates. Here, you will find five new templates displayed at the top: text to video, reference (image) to video, control video, outpainting, and first frame last frame. If you don’t see these templates, it’s likely that you are using an outdated version of ComfyUI. To check your version, head over to the ComfyUI Manager. If your version is older, simply click the update ComfyUI button to ensure you have the latest features at your disposal. Once updated, you can dive into the exciting world of video creation with these templates. Let's take a closer look at a few of these templates, starting with the text to video option, which allows you to generate videos based on textual prompts.
3. Text to Video
The text to video template is straightforward and nothing new, but when loading it for the first time you are prompted to download the missing models.
For those with lower-VRAM cards, the 1.3B model is recommended, allowing for efficient processing without the larger 32GB download. Conversely, users with at least 24GB of VRAM can opt for the 14B model, which, despite its size, runs smoothly on capable hardware. You can also skip these models and download a GGUF model instead, which we will take a closer look at later. In addition, you need to download the VAE and one of the text encoders, and we highly recommend downloading the CausVid LoRA to speed up generation; make sure the CausVid LoRA you download matches your chosen model. The process begins by entering a text prompt that describes the video you wish to create. For instance, if you want an animation of a sports car, craft a prompt that reflects that theme. The WAN VACE to video node will then handle the technical aspects, such as width, height, and frame count, ensuring that your video meets your specifications. Below is a comparison of the models and their respective features:
| Model | VRAM Requirement | Recommended Use |
|---|---|---|
| GGUF | Low to mid VRAM cards | Fast, medium-quality video generation |
| 1.3B | Low VRAM cards | Fast, basic video generation |
| 14B | ~24GB or more | Slow, high-quality video generation |
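The frame-count input deserves a quick note. As a rough sketch (assuming the 16 fps playback and 4n+1 frame-count convention the Wan video models use; the helper name is ours, not a ComfyUI node), you can convert a desired clip duration into a valid length value:

```python
def wan_frame_count(seconds: float, fps: int = 16) -> int:
    """Return the nearest valid WAN frame count (4n + 1) for a clip duration.

    Assumes the 16 fps / 4n+1 convention used by the Wan video models.
    """
    target = round(seconds * fps)
    # Snap to the closest value of the form 4n + 1, never below 1 frame.
    n = max(0, round((target - 1) / 4))
    return 4 * n + 1

# e.g. a 5-second clip at 16 fps -> 81 frames
```

Plugging the result into the length input of the WAN VACE to video node keeps the frame count valid while matching the duration you had in mind.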
If you receive an error with your first launch, please check the FAQ below to resolve this.
4. Image to Video
The image to video template offers a unique twist by allowing users to provide a reference image alongside their video creation. This feature is particularly useful for those looking to maintain a specific aesthetic or style in their videos. However, it's important to note that WAN VACE was not trained on style references, so the input image should ideally have a solid background. By adding a remove background node, you can prepare your reference images for better results.
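The same preparation can also be done outside ComfyUI. As an illustrative sketch (this uses Pillow and is not part of any WAN VACE node; the function name is ours), once a background-removal step has made the backdrop transparent, you can composite the image onto a solid color before using it as a reference:

```python
from PIL import Image


def flatten_to_solid_background(img: Image.Image,
                                color: tuple = (255, 255, 255)) -> Image.Image:
    """Composite an RGBA image onto a solid background color.

    WAN VACE reference images work best with a solid backdrop, so after
    removing the background we fill the transparent area with one color.
    """
    rgba = img.convert("RGBA")
    backdrop = Image.new("RGBA", rgba.size, color + (255,))
    return Image.alpha_composite(backdrop, rgba).convert("RGB")
```

Saving the result and loading it as the reference image gives the model a clean, solid-background input.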
5. Control Video
The control video template, on the other hand, allows for more advanced manipulation by accepting both an input image and an input video. This dual functionality enables you to crop, resize, and preprocess videos effectively, resulting in a polished final product. The flexibility of these templates opens up a world of creative possibilities: for example, you can extract the poses from a video and use them to create a different style with the same animations.
And if you want the result to follow the original video even more closely, you can simply replace the default Canny preprocessor with a different one, such as Depth or OpenPose, as in the example above.
6. GGUF Models & Faster Generation Tips
WAN VACE comes with a bunch of GGUF models to choose from.
To decide which of these models to choose, look at their file size and compare it to your dedicated VRAM.
For example, if you have an RTX 4070 with 12GB of VRAM, you should choose the Q4_K_M model, which is 11.6GB in size. To load this model in your ComfyUI workflow, you need the Unet Loader (GGUF) node. Using the GGUF models will greatly increase your generation speed with minimal quality loss.
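This rule of thumb (pick the largest quant whose file fits in your dedicated VRAM) can be sketched in a few lines. Only the 11.6GB Q4_K_M figure comes from above; the other sizes in the table are placeholders you should check against the actual model downloads:

```python
# Illustrative GGUF quant sizes in GB. Only Q4_K_M (11.6 GB) is taken
# from the article -- verify the rest against the real download page.
QUANT_SIZES_GB = {
    "Q8_0": 18.1,
    "Q6_K": 14.5,
    "Q5_K_M": 12.8,
    "Q4_K_M": 11.6,
    "Q3_K_M": 9.0,
}


def pick_quant(vram_gb: float) -> str:
    """Pick the largest quant whose file fits in dedicated VRAM."""
    fitting = {q: size for q, size in QUANT_SIZES_GB.items() if size <= vram_gb}
    if not fitting:
        raise ValueError("No 14B quant fits; try the 1.3B model instead.")
    return max(fitting, key=fitting.get)

# e.g. a 12 GB RTX 4070 -> "Q4_K_M"
```

In practice you also want a little headroom for the VAE, text encoder, and latents, so treat the result as an upper bound rather than a guarantee.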
Another great tip to speed up your video generation is using the ddim sampler with the ddim_uniform scheduler.
With all these enabled your workflow might look something like this.
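If you would rather script generations than click through the UI, ComfyUI also exposes an HTTP API (on port 8188 by default). As a minimal sketch, the workflow dict is whatever you export via "Save (API Format)", and the function names here are ours:

```python
import json
import urllib.request


def build_payload(workflow: dict, client_id: str = "blog-example") -> bytes:
    """Serialize the request body ComfyUI's /prompt endpoint expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


def queue_workflow(workflow: dict, host: str = "127.0.0.1", port: int = 8188) -> dict:
    """POST a workflow exported with 'Save (API Format)' to a running ComfyUI."""
    req = urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage: queue_workflow(json.load(open("wan_vace_workflow_api.json")))
```

This queues the job exactly as the Queue Prompt button would, which is handy for batching several prompts overnight.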
7. WAN VACE Examples
Let's look at some examples that we created with WAN VACE.
WAN VACE Text To Video
Prompt used: An anime illustration of a blonde girl with long hair she is wearing a bikini, standing by a pool. She takes a sip of her drink with a straw.
Steps: 6
Sampler: uni_pc
Scheduler: simple
Dimensions: 768x512
Model: ...14B-Q4_K_M.gguf
WAN VACE Reference To Video
The reference image was 512 x 512, and the video rendered locally in about 2 minutes on a 12GB VRAM GPU.
Reference Image used:
WAN VACE Control Video
We used the same reference image for this workflow, and for the reference video we input a video of a man boxing. For the preprocessor we used OpenPose. As you can see, the video stays very true to the reference image and follows the OpenPose preprocessor precisely.
Note that all examples ran locally on an RTX 4070 with 12GB of VRAM, using the GGUF models to speed up generation. But if you want to get the most out of WAN VACE, we highly recommend trying ComfyUI on a 24GB VRAM card in your browser.
8. Conclusion: WAN VACE in ComfyUI
In conclusion, the introduction of WAN VACE templates in ComfyUI marks a significant advancement in AI-driven video creation. With options like text to video, image to video, and control video, users can explore a variety of creative avenues while accommodating their hardware capabilities. The ability to choose between the 1.3B and 14B models ensures that both low and high VRAM users can participate in this innovative space. As AI technology continues to evolve, tools like ComfyUI empower creators to push the boundaries of their imagination, transforming simple prompts into engaging visual narratives. Whether you're a seasoned developer or a curious newcomer, the possibilities are endless with ComfyUI's native support for WAN VACE. So, dive in, experiment, and let your creativity flourish!