WAN VACE In ComfyUI - AI Video Generation With Full Control

June 6, 2025
Discover how ComfyUI's WAN VACE supports GGUF, 1.3B and 14B models for video creation. Learn about templates, features, full control and tips for optimal performance.

1. Introduction

In the ever-evolving landscape of AI and machine learning, ComfyUI has emerged as a powerful tool for creators and developers alike. One of its standout features is native support for WAN VACE, which caters to a wide range of hardware capabilities: 1.3B models for low-VRAM cards and 14B models for those with more memory to spare. This flexibility lets users harness advanced AI models without being constrained by their hardware, and the inclusion of GGUF models makes the platform even more versatile. In this blog post, we will explore the new WAN VACE templates available in ComfyUI, how to use them effectively, and the exciting possibilities they offer for video creation.

If you have not set up ComfyUI yet, check out our detailed tutorial here:

How to Install ComfyUI Locally on Windows

2. Exploring WAN VACE Templates

To access the new WAN VACE templates in ComfyUI, open the workflow menu and click Browse Templates. You will find five new templates displayed at the top: text to video, reference (image) to video, control video, outpainting, and first frame to last frame. If you don't see these templates, you are likely running an outdated version of ComfyUI. To check your version, head over to the ComfyUI Manager; if it is out of date, simply click the Update ComfyUI button to get the latest features. Once updated, you can dive into video creation with these templates. Let's take a closer look at a few of them, starting with the text to video option, which generates videos from textual prompts.

3. Text to Video

The text to video template is straightforward and nothing new, but when you load it for the first time you are prompted to download the missing models.

For lower-VRAM cards, the 1.3B model is recommended, allowing for efficient processing without the larger 32GB download. Conversely, users with at least 24GB of VRAM can opt for the 14B model, which, despite its size, runs smoothly on capable hardware. You can also skip both and download a GGUF model instead, which we will take a closer look at later. In addition, you need to download the VAE and one of the text encoders, and I highly recommend downloading the CausVid LoRA to speed up the generation process; make sure to pick the CausVid LoRA that matches your chosen model. The process begins by entering a text prompt that describes the video you wish to create. For instance, if you want an animation of a sports car, craft a prompt that reflects that theme. The WAN VACE to Video node then handles the technical aspects, such as width, height, and frame count, ensuring that your video meets your specifications. Below is a comparison of the models and their respective features:

Model Size | VRAM Requirement     | Recommended Use
GGUF       | Low to mid VRAM cards | Fast, medium-quality video generation
1.3B       | Low VRAM cards        | Fast, basic video generation
14B        | ~24GB or more         | Slow, high-quality video generation
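The table's decision logic can be sketched as a small helper. This is purely an illustration, not part of ComfyUI, and the 8GB cutoff for GGUF is our own assumption; the 24GB threshold comes from the table above.

```python
def recommend_model(vram_gb: float) -> str:
    """Suggest a WAN VACE model family based on dedicated VRAM.

    Thresholds follow the comparison table above; the 8GB GGUF
    cutoff is an assumption - adjust it for your own card.
    """
    if vram_gb >= 24:
        return "14B"   # slow but highest quality
    if vram_gb >= 8:
        return "GGUF"  # quantized middle ground
    return "1.3B"      # low-VRAM fallback
```

For example, a 12GB card lands on the GGUF option, which matches the workflow we use in the examples below.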

If you receive an error on your first launch, please check the FAQ below to resolve it.

4. Image to Video

The image to video template offers a unique twist by allowing users to provide a reference image alongside their video creation. This feature is particularly useful for those looking to maintain a specific aesthetic or style in their videos. However, it's important to note that WAN VACE currently does not utilize style references, so the input image should ideally have a solid background. By integrating a remove background node, you can prepare reference images for better results.

5. Control Video

The control video template, on the other hand, allows for more advanced manipulation by accepting both an input image and an input video. This dual functionality enables users to crop, resize, and preprocess videos effectively, resulting in a polished final product. The flexibility of these templates opens up a world of creative possibilities, letting users experiment with different styles and formats. For example, you can extract poses from a video and use those poses to create a different style with the same animations.

And if you want the output to follow the original video even more closely, you can simply replace the default Canny preprocessor with a different one, such as Depth or OpenPose, as in the example above.

6. GGUF Models & Faster Generation Tips

WAN VACE comes with a bunch of GGUF models to choose from.

To find out which of these models you should choose, look at their file sizes and compare them to your dedicated VRAM.

WAN VACE GGUF Models

For example, if you have an RTX 4070 with 12GB of VRAM, you should choose the Q4_K_M model, which is 11.6GB in size. To load this model in your ComfyUI workflow, use the Unet Loader (GGUF) node. Using GGUF models greatly increases your generation speed with minimal quality loss.

Another great tip to speed up your video generation is to use the ddim sampler with the ddim_uniform scheduler.
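If you export your workflow via "Save (API Format)", each node in the resulting JSON has a "class_type" and an "inputs" dict, so the sampler swap can also be scripted. This is a hypothetical sketch assuming a standard KSampler node; the node IDs and the minimal stand-in workflow below are made up for illustration.

```python
def use_ddim(workflow: dict) -> dict:
    """Switch every KSampler node in an API-format workflow to ddim/ddim_uniform."""
    for node in workflow.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["sampler_name"] = "ddim"
            node["inputs"]["scheduler"] = "ddim_uniform"
    return workflow

# Minimal stand-in for a workflow exported in API format (node ID "3" is arbitrary):
wf = {"3": {"class_type": "KSampler",
            "inputs": {"sampler_name": "uni_pc", "scheduler": "simple", "steps": 6}}}
patched = use_ddim(wf)
print(patched["3"]["inputs"]["sampler_name"])  # ddim
```

The same edit is of course quicker to make in the UI by changing the two dropdowns on the KSampler node; scripting it is mainly useful when batch-running workflows through the API.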

With all of these enabled, your workflow might look something like this:

7. WAN VACE Examples

Let's look at some examples that we created with WAN VACE.

WAN VACE Text To Video

Prompt used: An anime illustration of a blonde girl with long hair. She is wearing a bikini, standing by a pool. She takes a sip of her drink with a straw.

Steps: 6

Sampler: uni_pc

Scheduler: simple

Dimensions: 768x521

Model: ...14B-Q4_K_M.gguf

WAN VACE Reference To Video

The original image was 512 x 512 and rendered locally in about 2 minutes on a 12GB VRAM GPU.

Reference Image used:

WAN VACE Control Video

We used the same reference image for this workflow, and for the reference video we input a video of a man boxing. For the preprocessor we used OpenPose. As you can see, the video stays very true to the reference image and follows the OpenPose preprocessor precisely.

Note that all examples ran locally on an RTX 4070 with 12GB of VRAM, using GGUF models to speed up generation. But if you want to get the most out of WAN VACE, we highly recommend trying ComfyUI on a 24GB VRAM card in your browser.

8. Conclusion: WAN VACE in ComfyUI

In conclusion, the introduction of WAN VACE templates in ComfyUI marks a significant advancement in AI-driven video creation. With options like text to video, image to video, and control video, users can explore a variety of creative avenues while accommodating their hardware capabilities. The choice between the 1.3B and 14B models ensures that both low- and high-VRAM users can participate in this innovative space. As AI technology continues to evolve, tools like ComfyUI empower creators to push the boundaries of their imagination, transforming simple prompts into engaging visual narratives. Whether you're a seasoned developer or a curious newcomer, the possibilities are endless with ComfyUI's native support for WAN VACE. So dive in, experiment, and let your creativity flourish!

Frequently Asked Questions
