Create Talking Avatars with SkyReels V3 in ComfyUI
1. Introduction
In this tutorial, you’ll learn how to generate talking avatars from static portrait images using SkyReels V3 in ComfyUI. This audio-to-video (A2V) workflow drives facial motion directly from speech, producing synchronized lip movement, head motion, and expressive micro-animations.
Unlike traditional animation pipelines that rely on manual keyframes or face-tracking rigs, SkyReels V3 uses a diffusion-based motion model conditioned on a reference image and audio input. The model predicts temporally consistent facial motion while preserving identity from the source image.
By the end of this guide, you’ll be able to turn a single portrait image and an audio clip into a coherent talking avatar entirely inside ComfyUI, using Kijai’s optimized FP8 workflow for efficient generation.
2. System Requirements for SkyReels V3 Workflow
Before generating talking avatars with SkyReels V3, ensure your system meets the necessary hardware and software requirements. This workflow benefits significantly from a powerful GPU with sufficient VRAM — we recommend at least an RTX 4090 (24GB VRAM) for optimal performance, or using a cloud GPU provider like RunPod.
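If you're unsure how much memory your card has, you can check it with PyTorch, which ComfyUI already depends on. This is just a quick diagnostic sketch, nothing SkyReels-specific:

```python
# Quick VRAM diagnostic using PyTorch (already a ComfyUI dependency).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 24:
        print("Below the recommended 24 GB; consider a cloud GPU instead.")
else:
    print("No CUDA GPU detected.")
```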
Requirement 1: ComfyUI Installed & Updated
You'll need ComfyUI installed and running, either locally or through a cloud service. For local Windows installation, follow this comprehensive guide:
👉 How to Install ComfyUI Locally on Windows
Once installed, navigate to the Manager tab in ComfyUI and click "Update ComfyUI" to ensure you're running the latest version. Keeping ComfyUI updated is essential for compatibility with the latest models, custom nodes, and workflow features that SkyReels V3 requires.
While SkyReels V3 can run locally with adequate hardware, we strongly recommend using the Next Diffusion - ComfyUI SageAttention template on RunPod. Here's why:
- Pre-optimized Environment — Sage Attention and Triton acceleration come pre-installed and configured, dramatically improving generation speed and VRAM efficiency.
- Zero Setup Friction — No need to manually install CUDA libraries or PyTorch dependencies.
- Persistent Storage — Network Volume support ensures your models, workflows, and generated content are saved between sessions, eliminating repeated downloads.
Spin up a production-ready ComfyUI instance in minutes using our RunPod template:
👉 How to Run ComfyUI on RunPod with Network Volume
Requirement 2: Download SkyReels V3 Model Files
The SkyReels V3 Talking Avatar Workflow relies on a specialized set of models designed for audio-driven animation. These include the core SkyReels V3 A2V diffusion model, MelBandRoformer audio processor, VAE encoder, and text encoder for prompt guidance.
Download each of the following models and place them in their respective ComfyUI model directories exactly as specified below:
| File Name | Hugging Face Download Page | File Directory |
|---|---|---|
| Wan21-SkyReelsV3-A2V_fp8_scaled_mixed.safetensors | 🤗 Download | ..\ComfyUI\models\diffusion_models |
| MelBandRoformer_fp16.safetensors | 🤗 Download | ..\ComfyUI\models\diffusion_models |
| Wan2_1_VAE_bf16.safetensors | 🤗 Download | ..\ComfyUI\models\vae |
| umt5-xxl-enc-bf16.safetensors | 🤗 Download | ..\ComfyUI\models\text_encoders |
Once all models are downloaded and placed correctly, ComfyUI will automatically detect them on startup. This ensures the SkyReels V3 audio-to-video nodes load properly and can process your reference images and audio seamlessly.
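If you prefer scripting the downloads (handy on a fresh RunPod instance), huggingface_hub can fetch the files directly. A minimal sketch; note the repo IDs below are placeholders, so substitute the actual repositories from the download links in the table above:

```python
# Sketch: scripted model download via huggingface_hub.
# The repo_id values are placeholders; copy the real repository names
# from the Hugging Face download pages linked in the table above.
from huggingface_hub import hf_hub_download

MODELS = [
    ("<skyreels-v3-repo>", "Wan21-SkyReelsV3-A2V_fp8_scaled_mixed.safetensors", "models/diffusion_models"),
    ("<melband-repo>", "MelBandRoformer_fp16.safetensors", "models/diffusion_models"),
    ("<wan-vae-repo>", "Wan2_1_VAE_bf16.safetensors", "models/vae"),
    ("<umt5-repo>", "umt5-xxl-enc-bf16.safetensors", "models/text_encoders"),
]

for repo_id, filename, subfolder in MODELS:
    hf_hub_download(repo_id=repo_id, filename=filename,
                    local_dir=f"ComfyUI/{subfolder}")
```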
Requirement 3: Verify Folder Structure
Before running the SkyReels V3 workflow, confirm that all downloaded models are organized in the correct ComfyUI subdirectories. Your folder structure should look exactly like this:
```
📁 ComfyUI/
└── 📁 models/
    ├── 📁 diffusion_models/
    │   ├── Wan21-SkyReelsV3-A2V_fp8_scaled_mixed.safetensors
    │   └── MelBandRoformer_fp16.safetensors
    ├── 📁 vae/
    │   └── Wan2_1_VAE_bf16.safetensors
    └── 📁 text_encoders/
        └── umt5-xxl-enc-bf16.safetensors
```
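As a quick sanity check, a few lines of Python can confirm every file is where ComfyUI expects it. Adjust COMFYUI_ROOT if your installation lives elsewhere:

```python
# Verify the SkyReels V3 model files sit in the expected subdirectories.
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")  # adjust to your installation path

EXPECTED = [
    "models/diffusion_models/Wan21-SkyReelsV3-A2V_fp8_scaled_mixed.safetensors",
    "models/diffusion_models/MelBandRoformer_fp16.safetensors",
    "models/vae/Wan2_1_VAE_bf16.safetensors",
    "models/text_encoders/umt5-xxl-enc-bf16.safetensors",
]

for rel in EXPECTED:
    status = "OK" if (COMFYUI_ROOT / rel).is_file() else "MISSING"
    print(f"[{status:7}] {rel}")
```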
With everything properly installed and organized, you're ready to load the SkyReels V3 Talking Avatar Workflow and start generating realistic, audio-synchronized animations from your portrait images.
3. Download & Load the SkyReels V3 Talking Avatar Workflow
Now that your environment and models are set up, it's time to load and configure the SkyReels V3 Talking Avatar Workflow in ComfyUI. This workflow integrates all the necessary components — diffusion models, audio processors, VAE, and text encoders — into a streamlined pipeline for generating expressive, lip-synced talking avatars from a single reference image and audio clip.
Load the SkyReels V3 Workflow JSON File
👉 Download the SkyReels V3 Talking Avatar Workflow JSON file and drag it directly into your ComfyUI canvas.
This workflow comes fully pre-configured with all essential nodes, model references, and audio processing components required for realistic lip-sync generation driven by your input audio.
Install Missing Nodes
If any nodes appear highlighted in red, it means certain custom nodes are missing from your ComfyUI installation.
To resolve this:
- Open the Manager tab in ComfyUI.
- Click Install Missing Custom Nodes.
- After installation completes, restart ComfyUI to activate the changes.
This ensures all SkyReels V3-specific nodes and the MelBandRoformer audio processing components are properly installed and ready to handle your talking avatar generation.
Once all nodes load successfully without errors, you're ready to upload your reference image and audio file, configure your prompt, and generate your first talking avatar with SkyReels V3.
4. Running the Talking Avatar Generation Workflow
With the workflow loaded and all components in place, you're ready to generate your first talking avatar using SkyReels V3 in ComfyUI. This section walks you through uploading your reference image, loading audio, setting parameters, and configuring prompts to produce smooth, expressive results with perfect lip-sync.
Upload Your Reference Image
Start by loading your reference image into the Image Loader node. This portrait will serve as the visual foundation for your talking avatar, with SkyReels V3 automatically animating facial features based on your audio input. We'll use the following reference input image:
The quality of your reference image directly impacts the realism of the generated animation. Avoid heavily filtered or low-resolution images for optimal lip-sync accuracy.
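If you want to screen portraits programmatically before queuing a long generation, a tiny Pillow check helps. The 512-pixel threshold here is an illustrative rule of thumb, not an official SkyReels V3 requirement:

```python
# Screen a reference portrait before generation (Pillow).
# The 512 px threshold is a rule-of-thumb assumption, not a hard model limit.
from PIL import Image

img = Image.open("portrait.png")  # placeholder path
width, height = img.size
print(f"Reference image: {width}x{height}")
if min(width, height) < 512:
    print("Warning: low-resolution portraits tend to reduce lip-sync accuracy.")
```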
Set Video Dimensions and Aspect Ratio
Define your output video dimensions in the Resize Image or Set Resolution node. SkyReels V3 works well with various aspect ratios, but for this tutorial, we'll use a vertical 9:16 portrait format — ideal for social media platforms like Instagram, TikTok, and YouTube Shorts.
Recommended settings for this tutorial (vertical portrait):
| Setting | Value | Notes |
|---|---|---|
| Width | 480 | Standard vertical portrait width |
| Height | 832 | Maintains 9:16 aspect ratio |
| Aspect Ratio | 9:16 | Perfect for social media content |
This 480×832 resolution balances generation speed with visual quality, making it ideal for learning the workflow and creating shareable content.
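If you experiment with other sizes, keep both dimensions divisible by 16; latent video models in the Wan family (which SkyReels V3 builds on) generally expect this, though treat the exact multiple as an assumption. A minimal check:

```python
# Minimal resolution check: dimensions divisible by 16 (a common
# constraint for Wan-family latent video models; treat the exact
# multiple as an assumption) and a ratio near the 9:16 target.
width, height = 480, 832

assert width % 16 == 0 and height % 16 == 0, "use multiples of 16"
print(f"{width}x{height}: ratio {width / height:.3f} (9:16 = {9 / 16:.3f})")
```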
Load Your Audio File
Import your audio clip into the Audio Loader node. SkyReels V3 will analyze the audio waveform, extracting phoneme patterns, timing, and amplitude information to drive accurate mouth movements and facial expressions frame-by-frame. We'll use the following audio fragment:
Set Frame Count and Duration
In the Max Frame Settings node, set the max_frames parameter. This acts as an upper limit for generation — the workflow automatically generates enough frames to match your audio length, but stops at the max_frames value if your audio is longer.
Recommended setting: Leave max_frames at 500 (the default).
At 24 fps (the recommended frame rate for SkyReels V3), this allows for:
| Max Frames | Maximum Video Length |
|---|---|
| 500 | ~20 seconds at 24 fps |
How it works:
- If your audio is shorter than max_frames allows, the workflow generates exactly enough frames to match your audio length.
- If your audio is longer than max_frames allows, generation stops at 500 frames and the audio is truncated (see the sketch below).
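Here's a small sketch of that logic, using torchaudio to read the clip's duration; the file path is a placeholder:

```python
# Mirror of the frame-count rule above: frames follow the audio length,
# capped at max_frames. Uses torchaudio to read the clip's duration.
import math
import torchaudio

FPS = 24          # recommended frame rate for SkyReels V3
MAX_FRAMES = 500  # workflow default

info = torchaudio.info("speech.wav")  # placeholder path
duration_s = info.num_frames / info.sample_rate

num_frames = min(MAX_FRAMES, math.ceil(duration_s * FPS))
print(f"Audio: {duration_s:.1f}s -> {num_frames} frames "
      f"(cap {MAX_FRAMES} frames, about {MAX_FRAMES / FPS:.1f}s)")
```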
Configure Your Text Prompt
One of the unique features of SkyReels V3 is the ability to guide the animation using text prompts. While the primary driving force is the audio (for lip-sync), the prompt helps shape:
- Overall mood and emotional tone
- Head movement and posture
- Subtle facial expressions
- Animation style and energy
Example prompt:
- "The camera quickly zooms in on the woman's face as she speaks"
The prompt provides context that complements the audio-driven lip-sync, creating a more cohesive and natural-looking talking avatar.
👉 Tip: Keep prompts focused on visible facial behavior and expression. Avoid overly detailed descriptions that might conflict with the audio-driven motion.
Run the Generation
Once your reference image, audio file, max_frames value, and prompt are configured, click Queue Prompt to start the generation process.
SkyReels V3 will:
- Process your audio to extract speech features
- Analyze your reference image
- Generate frame-by-frame animations with synchronized lip movements
- Apply prompt-guided expressions and subtle motion
- Output a complete video file
For faster, more cost-effective generation, we highly recommend renting a GPU on RunPod.
Once complete, you'll have a realistic talking avatar at 480×832 resolution (9:16 aspect ratio) with natural lip-sync, expressive facial movements, and audio-synchronized animation ready for your project.
5. Lip-Syncing to Music with SkyReels V3
While SkyReels V3 excels at creating talking avatars with speech audio, it can also generate impressive lip-sync animations to music and singing. Simply upload a music track or vocal recording instead of spoken dialogue, and the workflow will analyze the audio patterns to create synchronized mouth movements.
This opens up creative possibilities for:
- Music video avatars
- Singing character animations
- Lyric visualization content
- Artistic projects and creative storytelling
The process is identical — just load your music file into the Audio Loader node instead of speech audio, and SkyReels V3 will handle the rest.
Below is an example of lip-syncing to music:
The same workflow, different creative output. Whether you're working with dialogue, narration, or music, SkyReels V3 adapts to your audio input.
6. Conclusion
Congratulations! You've now mastered the complete workflow for creating realistic talking avatars with SkyReels V3 in ComfyUI. You've learned how to:
- Set up your environment and install the necessary models
- Load and configure the SkyReels V3 workflow
- Upload reference images and audio files
- Configure prompts and parameters for optimal results
- Generate professional-quality talking avatars with perfect lip-sync
SkyReels V3 represents a powerful advancement in audio-driven video generation, making it possible to create expressive, natural-looking talking avatars without manual animation or expensive software. Whether you're producing virtual presenters, educational content, social media videos, marketing materials, or creative storytelling projects, this workflow provides the tools you need to bring static portraits to life with realistic speech and emotion.
The combination of reference image flexibility, audio-driven automation, and prompt-based control gives you unprecedented creative freedom. You can now transform any portrait into a speaking character that feels authentic and engaging.
Now it's your turn to experiment. Try different portrait styles, test various audio clips, and craft prompts that guide your avatars toward specific moods and expressions. With SkyReels V3 and ComfyUI, the only limit is your creativity.
