NEW Wan 2.2 Video Generation Model in ComfyUI (I2V 14B FP8 Version)

Table of Contents
1. Introduction
2. Requirements for Wan 2.2 Video Generation Model (I2V FP8 Model)
3. Downloading and Loading the Wan2.2 Workflow for ComfyUI (I2V FP8)
4. Wan2.2 Video Generation Settings (I2V)
5. Wan2.2 Video Generation Example
6. Conclusion
1. Introduction
Welcome to the tutorial for the new Wan2.2 Video Generation Model in ComfyUI (FP8 Version). This powerful model uses the latest AI advancements to convert images into high-quality, seamless videos with impressive detail and efficiency.
In this guide, we’ll walk you through everything you need to get started—from setting up the Wan2.2 model in ComfyUI and loading the workflow, to adjusting important settings for optimal results. We’ll also explore practical examples to help you understand how to make the most of this tool.
By the end of this tutorial, you’ll have a clear, hands-on understanding of how to generate stunning videos from images using Wan2.2. Let’s dive in and begin this exciting journey into video generation!
2. Requirements for Wan 2.2 Video Generation Model (I2V FP8 Model)
Before we can start using the Wan2.2 Video Generation Model, it’s important to ensure all the necessary requirements are met. Since the FP8 version of Wan2.2 is resource-intensive and requires a high-end GPU, we recommend running ComfyUI on RunPod, which provides the necessary hardware and makes managing large models easier.
The guide below will walk you through setting up ComfyUI on RunPod, including how to configure a network volume for storing your files.
If you don’t have access to a local machine with a high-end GPU, be sure to check out that guide for a smooth experience.
Download Wan 2.2 Model Files
Next, we need to gather the models required for the Wan2.2 setup. You can find the necessary files on the Hugging Face page dedicated to the Wan2.2 model. Here’s a quick overview of the files you’ll need:
| File Name | Hugging Face Download Page | File Directory |
|---|---|---|
| wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors | Download | ..\ComfyUI\models\diffusion_models |
| wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors | Download | ..\ComfyUI\models\diffusion_models |
| umt5_xxl_fp8_e4m3fn_scaled.safetensors | Download | ..\ComfyUI\models\clip |
| wan_2.1_vae.safetensors | Download | ..\ComfyUI\models\vae |
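If you'd rather script these downloads than click through the Hugging Face pages, here is a minimal sketch using the huggingface_hub library. The repo ID and in-repo file paths are assumptions based on the Comfy-Org repackaged releases, so verify them against the download links in the table above, and point the base path at your own ComfyUI install:

```python
# Minimal download sketch using huggingface_hub (pip install huggingface_hub).
# NOTE: the repo ID and in-repo paths are assumptions -- verify them against
# the Hugging Face download links in the table above before running.
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

COMFYUI = Path("ComfyUI")  # adjust to your ComfyUI install location
REPO = "Comfy-Org/Wan_2.2_ComfyUI_Repackaged"  # assumed repo ID

FILES = [  # (path inside the repo, target models/ subfolder)
    ("split_files/diffusion_models/wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors",
     "diffusion_models"),
    ("split_files/diffusion_models/wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors",
     "diffusion_models"),
    ("split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors", "clip"),
    ("split_files/vae/wan_2.1_vae.safetensors", "vae"),
]

for filename, subdir in FILES:
    target = COMFYUI / "models" / subdir
    target.mkdir(parents=True, exist_ok=True)
    cached = hf_hub_download(repo_id=REPO, filename=filename)  # goes to HF cache
    shutil.copy2(cached, target / Path(filename).name)         # copy into ComfyUI
    print("placed", target / Path(filename).name)
```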
Verify Folder Structure
Before we proceed, let’s verify that all Wan2.2 model files are correctly placed in their respective ComfyUI folders. Proper organization ensures the workflow runs smoothly.
```
📁 ComfyUI/
└── 📁 models/
    ├── 📁 clip/
    │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
    ├── 📁 diffusion_models/
    │   ├── wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors
    │   └── wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors
    └── 📁 vae/
        └── wan_2.1_vae.safetensors
```
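You can also sanity-check the layout with a short script. This sketch simply tests that each file exists where ComfyUI expects it (adjust the base path to your install):

```python
# Quick sanity check that every Wan2.2 file sits in the expected folder.
from pathlib import Path

MODELS = Path("ComfyUI/models")  # adjust to your ComfyUI install

EXPECTED = {
    "clip": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "diffusion_models": [
        "wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors",
        "wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors",
    ],
    "vae": ["wan_2.1_vae.safetensors"],
}

for subdir, names in EXPECTED.items():
    for name in names:
        path = MODELS / subdir / name
        status = "OK     " if path.is_file() else "MISSING"
        print(f"{status} {path}")
```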
Now that all files are downloaded and properly organized, let’s download and load the Wan2.2 workflow to start generating videos from images!
3. Downloading and Loading the Wan2.2 Workflow for ComfyUI (I2V FP8)
Now that all the required files are in place, it’s time to load the Wan2.2 workflow into ComfyUI. This is a key step that prepares everything for generating videos from your images.
Start by downloading the workflow file—it contains all the configurations the Wan2.2 model needs to run properly. You can grab it from the link below:
👉 Download Wan2.2 I2V FP8 Workflow JSON
Before we move on, let’s take a look at the Wan2.2 workflow in ComfyUI. Here’s an overview of the interface and how the different components connect to generate videos from images. This visual guide will help you understand the process and navigate the workflow with ease.
In the next section, we’ll dive into the key settings to optimize your video generation.
4. Wan2.2 Video Generation Settings (I2V)
With the Wan2.2 workflow successfully loaded into ComfyUI, the next step is to configure the settings to optimize video generation.
Proper configuration is essential to ensure that the model performs at its best and produces high-quality outputs.
Step 1: Load Models
- Load Base Models
  Use the Load Diffusion Model node twice:
  - High-noise model: wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors
  - Low-noise model: wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors
  - weight_dtype: fp8_e4m3fn_fast
- Load CLIP Text Encoder
  Use the Load CLIP node:
  - Model: umt5_xxl_fp8_e4m3fn_scaled.safetensors
- Load VAE
  Use the Load VAE node:
  - VAE: wan_2.1_vae.safetensors
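If you drive ComfyUI through its HTTP API rather than the graphical workflow, the three loaders correspond roughly to the API-format fragment below. This is a hand-written sketch, not an export of the official workflow: the node IDs are arbitrary, and the CLIP loader's type value for Wan models is an assumption worth checking against your own export.

```python
# Sketch of the three loader nodes in ComfyUI API (prompt) format.
# Node IDs are arbitrary; field names follow the standard UNETLoader /
# CLIPLoader / VAELoader nodes. Verify against your own workflow export.
loaders = {
    "1": {"class_type": "UNETLoader", "inputs": {
        "unet_name": "wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors",
        "weight_dtype": "fp8_e4m3fn_fast"}},
    "2": {"class_type": "UNETLoader", "inputs": {
        "unet_name": "wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors",
        "weight_dtype": "fp8_e4m3fn_fast"}},
    "3": {"class_type": "CLIPLoader", "inputs": {
        "clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
        "type": "wan"}},  # "wan" type is an assumption -- check your version
    "4": {"class_type": "VAELoader", "inputs": {
        "vae_name": "wan_2.1_vae.safetensors"}},
}
```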
Step 2: Upload Image
Import the base image that will serve as the starting frame for animation and motion generation. This image provides the key pose, lighting, character design, and composition for the video.
Step 3: Provide Prompt
Define the animation behavior and camera motion using a detailed text prompt. This guides the Wan2.2 model to transform the base image into a video with realistic or stylized movement.
Example Animation Prompt:
"She slowly rolls her shoulders in a sensual motion, leaning forward with a soft, surprised expression. Her mouth opens slightly, eyes wide with intrigue. The camera begins in a tight front view, then slowly zooms in while gently panning right. The scene feels intimate, with smooth motion and warm lighting enhancing her expressive movement."
Step 4: Adjust Video Length and Ratio
To match the 16:9 aspect ratio of our uploaded image, we'll set the width and height accordingly. The Length value sets the number of frames, which together with the FPS chosen in Step 7 determines the clip's duration.
- Width: 1280
- Height: 720
- Length: 121 (frames)
- Batch Size: 1
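These four values typically live on the Wan image-to-video node, which combines them with the prompts, VAE, and start image to produce the initial video latent. Continuing the sketch (the WanImageToVideo class name and the LoadImage node are assumptions to verify against your export):

```python
# Resolution and length feed the Wan image-to-video node, which returns
# updated positive/negative conditioning plus the video latent.
# "7" is a hypothetical LoadImage node holding the start frame.
video_latent = {
    "7": {"class_type": "LoadImage", "inputs": {"image": "start_frame.png"}},
    "8": {"class_type": "WanImageToVideo", "inputs": {
        "width": 1280, "height": 720, "length": 121, "batch_size": 1,
        "positive": ["5", 0], "negative": ["6", 0],
        "vae": ["4", 0], "start_image": ["7", 0]}},
}
```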
Step 5: High Noise Model Settings
To maximize creative variation and motion energy in the animation, the high-noise pass uses low guidance (CFG 3.5) and denoises a fully noised latent. It handles only the first 10 of the 20 total steps, establishing expressive, large-scale motion before handing the latent off to the low-noise model to refine the remaining details.
- Add Noise: Enabled
- Control After Generate: Randomize
- Steps: 20
- CFG (Classifier-Free Guidance): 3.5
- Sampler: Euler
- Scheduler: Beta
- Start at Step: 0
- End at Step: 10
- Return with Leftover Noise: Enabled
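In API form these settings land on a KSamplerAdvanced node. A sketch, reusing the hypothetical node IDs from above (the Wan I2V node's three outputs are the updated positive conditioning, negative conditioning, and latent):

```python
# High-noise pass: denoises steps 0-10 of 20 and returns the latent with
# its leftover noise so the low-noise pass can pick up where it stopped.
high_noise_pass = {
    "9": {"class_type": "KSamplerAdvanced", "inputs": {
        "model": ["1", 0],             # high-noise UNET from Step 1
        "positive": ["8", 0], "negative": ["8", 1],
        "latent_image": ["8", 2],      # latent from the Wan I2V node
        "add_noise": "enable", "noise_seed": 42,  # any seed works
        "steps": 20, "cfg": 3.5,
        "sampler_name": "euler", "scheduler": "beta",
        "start_at_step": 0, "end_at_step": 10,
        "return_with_leftover_noise": "enable"}},
}
```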
Step 6: Low Noise Model Settings
To balance creative variation with stability, the low-noise model picks up the latent at step 10 and continues denoising through the end of the process. The high-noise model first establishes a stable base, and the low-noise model then introduces subtle, creative refinements in the later stages.
Low Noise Settings – Late-Stage Denoising:
- Add Noise: Disabled (the pass continues from the leftover noise handed over by the high-noise stage)
- Control After Generate: Randomize
- Total Steps: 20
- CFG (Classifier-Free Guidance): 3.5
- Sampler: Euler
- Scheduler: Beta
- Start at Step: 10
- End at Step: 10000 (the max value, ensuring denoising runs through to completion)
- Return with Leftover Noise: Disabled
This setup helps maintain initial structure and detail, while still allowing for dynamic motion and variation during the final denoising stages.
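The matching sketch for the low-noise pass mirrors the high-noise one, with add_noise and return_with_leftover_noise flipped and the step window moved to the back half:

```python
# Low-noise pass: continues denoising the high-noise pass's output from
# step 10 to the end. add_noise stays disabled because the latent already
# carries the leftover noise handed over by the previous pass.
low_noise_pass = {
    "10": {"class_type": "KSamplerAdvanced", "inputs": {
        "model": ["2", 0],             # low-noise UNET from Step 1
        "positive": ["8", 0], "negative": ["8", 1],
        "latent_image": ["9", 0],      # output of the high-noise pass
        "add_noise": "disable", "noise_seed": 0,
        "steps": 20, "cfg": 3.5,
        "sampler_name": "euler", "scheduler": "beta",
        "start_at_step": 10, "end_at_step": 10000,
        "return_with_leftover_noise": "disable"}},
}
```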
Step 7: Adjust Frames Per Second (FPS)
We will adjust the frames per second (FPS) to 30. With the video length set to 121 frames, this results in a smooth 4-second clip (121 frames ÷ 30 FPS ≈ 4 seconds).
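The same arithmetic as a tiny helper, handy when experimenting with other frame counts or frame rates:

```python
# Clip duration in seconds for a given frame count and frame rate.
def clip_seconds(frames: int, fps: int) -> float:
    return frames / fps

print(clip_seconds(121, 30))  # ~4.03 s, the clip produced in this guide
print(clip_seconds(121, 24))  # ~5.04 s, same frames at a slower frame rate
```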
Step 8: Adjust Final Video Settings
Finally, we'll save the generated video using a filename prefix like videoComfyUI. The output format can be set to auto or mp4 based on preference, and we'll use the h264 codec to ensure good compatibility and compression quality.
With all these settings correctly configured, we can now click on "RUN" to watch our image transform into a dynamic video creation. We’ll showcase the results in the next section.
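If you'd rather trigger generation from a script than click RUN (for example on a headless RunPod instance), you can queue the workflow through ComfyUI's HTTP API. This sketch assumes the workflow was saved with Export (API) in ComfyUI under a hypothetical filename, and that the server listens on the default port 8188:

```python
# Queue an API-format workflow on a running ComfyUI server.
import json
import urllib.request

with open("wan2.2_i2v_fp8_api.json") as f:    # hypothetical filename
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",           # default ComfyUI address
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())               # returns a prompt_id on success
```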
5. Wan2.2 Video Generation Example
After clicking RUN, the model begins generating our video example. We ran this process on an RTX 4090, which is a powerful GPU, but with the length set to 121 frames, the generation still takes some time. Keep this in mind as you experiment with longer or higher-resolution videos. Next, we’ll showcase the results and explore what this impressive FP8 Wan2.2 I2V model can create.
Example 1:
Example 2:
This beautiful woman truly knows how to express herself — and now, you have a clear vision of how the Wan2.2 image-to-video model works. It’s an impressive tool that brings still images to life with cinematic motion and emotional depth. Hopefully, this inspired you to dive in and explore the model yourself. There's a lot of creative potential waiting to be unlocked.
6. Conclusion
In conclusion, the NEW Wan2.2 Video Generation Model in ComfyUI (FP8 Version) opens up exciting possibilities for video creation and manipulation. Throughout this tutorial, we’ve covered everything from setting up the necessary requirements to loading workflows and configuring settings. By following these steps, you should now be well-equipped to use the power of the Wan2.2 model in your projects. The world of AI-driven video generation is rapidly evolving, and staying updated with the latest advancements will only enhance your creative projects. Thank you for joining us on this journey of video generation with the Wan2.2 model. We hope you found this tutorial helpful and inspiring. Happy creating!