Audio-Driven Image-to-Video with LTX-2 in ComfyUI

January 14, 2026
Learn to animate images into videos with your own audio using LTX-2 in ComfyUI. Full tutorial, folder setup, models, and workflow included.

1. Introduction to LTX-2 Image-to-Video in ComfyUI

Creating a video from a static image with your own audio is now simple with LTX-2 in ComfyUI. This tutorial will show you how to generate high-quality, animated videos using a single reference image and a custom audio track. LTX-2 is a cutting-edge audio-video diffusion model that produces smooth, coherent video sequences directly from your image and prompt, making it perfect for storytelling, character animation, or music-driven projects. We’ll cover everything from system requirements and model setup to workflow tips, giving you all the tools you need to start turning still images into dynamic videos with synchronized audio.

2. System Requirements & Model Setup for LTX‑2 Video Generation

Before generating videos, ensure your system meets the requirements to run the LTX‑2 Image-to-Video workflow with custom audio smoothly. A high-end GPU is recommended — ideally an RTX 5090 (32GB VRAM) or a cloud GPU provider like RunPod, as LTX‑2 uses significant memory for video and audio diffusion.

Requirement 1: ComfyUI Installed

You’ll need ComfyUI installed either locally or on a cloud GPU service.

Requirement 2: Update ComfyUI

Keeping ComfyUI updated ensures full compatibility with the latest workflows, nodes, and features.

For Windows Portable Users:

  1. Open the folder: ...\ComfyUI_windows_portable\update

  2. Double-click update_comfyui.bat

For RunPod Users:

```bash
cd /workspace/ComfyUI && git pull origin master && pip install -r requirements.txt && cd /workspace
```

💡 Keeping ComfyUI updated guarantees you have the latest features, bug fixes, and node compatibility.

Requirement 3: Download LTX‑2 Model Files

LTX‑2 requires several model files for video diffusion, audio conditioning, and LoRA enhancements. Download and place them in the correct directories as shown below:

| File Name | Download Page | File Directory |
| --- | --- | --- |
| ltx-2-19b-dev-fp8.safetensors | 🤗 HuggingFace | ..\ComfyUI\models\checkpoints |
| MelBandRoformer_fp16.safetensors | 🤗 HuggingFace | ..\ComfyUI\models\diffusion_models |
| ltx-2-19b-distilled-lora-384.safetensors | 🤗 HuggingFace | ..\ComfyUI\models\loras |
| ltx-2-19b-ic-lora-detailer.safetensors | 🤗 HuggingFace | ..\ComfyUI\models\loras |
| ltx-2-19b-lora-camera-control-dolly-in.safetensors | 🤗 HuggingFace | ..\ComfyUI\models\loras |
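If you'd rather script the downloads, huggingface_hub can place each file directly into the right folder. Below is a minimal sketch; the repo IDs are placeholders, so substitute the actual repositories linked in the table above:

```python
# Sketch: fetch the LTX-2 model files with huggingface_hub.
# NOTE: the repo IDs below are placeholders -- replace them with
# the repositories linked in the download table above.
from huggingface_hub import hf_hub_download

files = [
    # (repo_id, filename, subfolder under ComfyUI/models)
    ("Lightricks/LTX-2", "ltx-2-19b-dev-fp8.safetensors", "checkpoints"),
    ("Lightricks/LTX-2", "MelBandRoformer_fp16.safetensors", "diffusion_models"),
    ("Lightricks/LTX-2", "ltx-2-19b-distilled-lora-384.safetensors", "loras"),
    ("Lightricks/LTX-2", "ltx-2-19b-ic-lora-detailer.safetensors", "loras"),
    ("Lightricks/LTX-2", "ltx-2-19b-lora-camera-control-dolly-in.safetensors", "loras"),
]

for repo_id, filename, subfolder in files:
    hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        local_dir=f"ComfyUI/models/{subfolder}",  # lands next to your other models
    )
```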

Requirement 4: Verify Folder Structure

After downloading all models, your ComfyUI folder structure should look like this:

```text
📁 ComfyUI/
└── 📁 models/
    ├── 📁 checkpoints/
    │   └── ltx-2-19b-dev-fp8.safetensors
    ├── 📁 diffusion_models/
    │   └── MelBandRoformer_fp16.safetensors
    └── 📁 loras/
        ├── ltx-2-19b-distilled-lora-384.safetensors
        ├── ltx-2-19b-ic-lora-detailer.safetensors
        └── ltx-2-19b-lora-camera-control-dolly-in.safetensors
```
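To verify the layout programmatically, you can run a small check script. This is a minimal sketch that assumes ComfyUI sits in the current working directory; adjust ROOT for your install:

```python
# Sanity check that every LTX-2 model file is where the workflow
# expects it. Adjust ROOT to your ComfyUI install path.
from pathlib import Path

ROOT = Path("ComfyUI/models")
expected = {
    "checkpoints": ["ltx-2-19b-dev-fp8.safetensors"],
    "diffusion_models": ["MelBandRoformer_fp16.safetensors"],
    "loras": [
        "ltx-2-19b-distilled-lora-384.safetensors",
        "ltx-2-19b-ic-lora-detailer.safetensors",
        "ltx-2-19b-lora-camera-control-dolly-in.safetensors",
    ],
}

for folder, names in expected.items():
    for name in names:
        path = ROOT / folder / name
        status = "OK     " if path.is_file() else "MISSING"
        print(f"{status} {path}")
```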

Once all models are downloaded and correctly placed, you’re ready to load the LTX‑2 workflow in ComfyUI and start generating high-quality videos with synchronized audio. Thanks to the distilled LoRA, you can use 8-step sampling, reducing runtime while keeping quality high. Because LTX‑2 is memory-intensive, an RTX 5090 or cloud GPU like RunPod ensures smooth generation, especially for longer videos or higher resolutions.

Next, we’ll download the actual workflow file and prepare it for your first video generation session.

3. Download & Load the LTX‑2 Image-to-Video Workflow

Now that your environment is set up and all required model files are in the correct folders, it’s time to load and configure the LTX‑2 Image-to-Video workflow in ComfyUI. This workflow brings together the LTX‑2 diffusion model, MelBandRoFormer audio model, and the optional LoRAs — ensuring everything works together for smooth, audio-driven video generation. Once loaded, you’ll be ready to start creating your first animated video.

Load the LTX‑2 workflow JSON file:

👉 Download the LTX‑2 Image-to-Video Audio workflow JSON file and drag it directly onto your ComfyUI canvas

This workflow includes all essential nodes, file references, and audio-driven components pre-arranged for reliable, synchronized video generation. In the next section, we’ll look at running the workflow and generating your first video.

4. Running the LTX‑2 Image-to-Video Audio Workflow

Now that your workflow is loaded, it’s time to generate your first video using LTX‑2. The workflow is straightforward — everything is handled through a few key nodes, making it easy to turn a single image and audio track into a full video sequence.

Step 1: Upload Your Reference Image

In the Image group:

  1. Upload your reference image. We'll start with the following:

  2. Set the Width and Height — this will determine the video resolution. Important: dimensions must be divisible by 32.

  3. Set the Length — the total number of frames for your video. At 30fps, a 10-second video is nominally 300 frames, but LTX‑2 works with frame counts of the form 8n + 1, so set Length = 297 (the helper sketch after the table below shows the arithmetic).

| Setting | Description | Example |
| --- | --- | --- |
| Width | Video width in pixels | 1280 |
| Height | Video height in pixels | 704 |
| Length | Total frames for the video | 297 (10s at 30fps) |
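If you want to compute Length for other durations, the arithmetic is simple. Here is a minimal sketch, assuming the 8n + 1 frame-count constraint mentioned above:

```python
# Pick a Length (frame count) for a target duration.
# Assumption: LTX-style models expect frame counts of the form
# 8*n + 1, which is why 10 s at 30 fps becomes 297 rather than 300.
def video_length(fps: int, seconds: float) -> int:
    raw = round(fps * seconds)   # e.g. 30 * 10 = 300
    return (raw // 8) * 8 + 1    # snap to 8n + 1 -> 297

print(video_length(30, 10))  # 297
print(video_length(24, 5))   # 121
```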

Step 2: Upload Your Audio

Next, upload your custom audio track to the Audio group. In this example, we use the opening section of a song.


You can also trim the audio if you only want a specific section:

| Setting | Description | Example |
| --- | --- | --- |
| start_index | Starting point of the audio clip (seconds) | 10 |
| duration | Length of the audio clip to use (seconds) | 10 |

💡 The settings above use only the 10–20 second portion of the song.
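If you'd rather cut the clip before importing it, the same window can be trimmed with a few lines of Python. This is a sketch using torchaudio, with placeholder filenames:

```python
# Trim an audio file to the start_index / duration window used above.
# "song.wav" and "song_trimmed.wav" are placeholder filenames.
import torchaudio

start_index = 10  # seconds
duration = 10     # seconds

waveform, sample_rate = torchaudio.load("song.wav")
start = int(start_index * sample_rate)
end = int((start_index + duration) * sample_rate)
clip = waveform[:, start:end]  # keep only the 10-20 s window

torchaudio.save("song_trimmed.wav", clip, sample_rate)
```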

Step 3: Enter Your Prompt

In the Prompt field, describe your video in detail. LTX‑2 recommends specifying:

  • Camera movement: Where the camera starts and ends

  • Object/subject motion: Movements, rotations, gestures

  • Environment & style: Lighting, background, mood, aesthetics

The more detailed your prompt, the more coherent and expressive the final video will be.
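For reference, here is the shape of a prompt that covers all three elements (camera, subject motion, environment). The wording is only an illustration, not taken from the workflow:

```text
The camera starts on a wide shot of the singer and slowly dollies in,
ending on a tight close-up of her face. She tilts her head and mouths
the lyrics in time with the music, hair swaying gently. Warm stage
lighting, soft haze in the background, cinematic shallow depth of field.
```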

Step 4: Set Frame Rate (FPS)

Next to the prompt, set the frame rate (FPS). We recommend:

| Setting | Value | Description |
| --- | --- | --- |
| FPS | 30 | Smooth video playback for most projects |

💡 Tip: Higher FPS = smoother motion but longer generation time.

Step 5: Final Check – Models, LoRAs & Sampler Settings

Before running the workflow, take a moment to verify that all models, LoRAs, and sampler settings are configured correctly. This step helps avoid unnecessary re-runs and ensures optimal quality.

Sampler Settings

Set the sampler configuration as follows:

| Setting | Value |
| --- | --- |
| Sampler | Euler |
| Scheduler | Simple |
| Steps | 8 |

The 8-step setup works especially well when using the distilled LTX-2 LoRA, offering fast generation with stable results.

Model Selection

Make sure the following models are selected in their respective nodes:

| Component | Model |
| --- | --- |
| Diffusion Model | ltx-2-19b-dev-fp8.safetensors |
| Audio Model | MelBandRoformer_fp16.safetensors |

LoRA Configuration

Enable the required LoRAs and set their strengths carefully:

| LoRA | Recommended Strength |
| --- | --- |
| Distilled LoRA (384) | 0.6 |
| Camera Control LoRA | 0.1–1 (optional) |
| Detailer LoRA | 0.1–1 (optional) |

💡 Important: While the distilled LoRA can be set to 1.0, a strength of 0.6 consistently produces better quality and stability, especially at low step counts.

Step 6: Run the Workflow

Once you’ve set the image, audio, prompt, and FPS, your workflow is ready to run. Here's the result:

Feel free to experiment with different images, music segments, and camera motions to better understand how LTX-2 responds to various creative setups.

5. Another LTX-2 Audio-Driven Image-to-Video Example

For this clip, we used a short audio sample paired with a static reference image and subtle camera motion. The result is a short, atmospheric video driven entirely by the audio cue and prompt.

Do you recognize where this audio is from? 👀

6. Conclusion: From Image to Video with Sound

LTX-2 makes it possible to turn a single image and an audio track into a coherent, motion-aware video directly inside ComfyUI. By combining image reference, sound input, and detailed motion prompts, you can generate videos that feel intentional and synchronized rather than random or purely visual. With the distilled LoRA enabled, low step counts, and the correct sampler settings, LTX-2 delivers strong results while keeping generation times reasonable—even for longer clips.

Running this workflow on high-VRAM GPUs like the RTX 5090 ensures stable performance, especially when working with higher resolutions and longer frame counts. Platforms like RunPod make this setup accessible without the hassle of local configuration, allowing you to focus entirely on experimentation and creative output.

Whether you’re creating music-matched visuals, animated characters, or cinematic image sequences, LTX-2 provides a flexible and production-ready image-to-video pipeline. With optional upscaling using FlashVSR, your final videos can be pushed even further in quality, making LTX-2 a powerful addition to any modern AI video workflow in ComfyUI.
