FLOAT in ComfyUI: Audio-Driven Motion for Talking Portraits

June 15, 2025
Learn how to use FLOAT in ComfyUI for audio-driven motion generation in talking portraits with this step-by-step tutorial on latent flow matching techniques.

1. Introduction to ComfyUI FLOAT

ComfyUI FLOAT is a powerful wrapper for the FLOAT model, designed to bring audio-driven talking portraits to life using Generative Motion Latent Flow Matching. By syncing voice input with expressive animations, it enables the creation of dynamic and engaging audiovisual content. Whether you're a developer, content creator, or simply passionate about AI and multimedia, ComfyUI FLOAT offers a user-friendly interface that makes integrating audio-driven animation into your workflow simple and accessible.

In this tutorial, we’ll walk you through the complete setup and usage of ComfyUI FLOAT. If you haven’t installed ComfyUI yet, be sure to check out our step-by-step guide on installing it locally on Windows.

How to Install ComfyUI Locally on Windows?

Once you’re up and running, you’ll explore everything from loading the FLOAT workflow to real-world examples—equipping you with the tools and knowledge to fully leverage this innovative animation solution.

2. Loading the FLOAT Workflow in ComfyUI

To get started with ComfyUI FLOAT, the first step is to load the FLOAT workflow into your ComfyUI interface. Once you have downloaded the FLOAT workflow JSON file, open your ComfyUI application and drag the file onto the ComfyUI canvas. Upon doing this, you may encounter a popup message indicating "Missing Node Types", listing the node types that were not found:

  • LoadFloatModels

  • FloatProcess

It's normal to encounter this issue when loading the workflow for the first time. As you explore, you may see certain nodes highlighted in red—this means they're not yet installed.

Don’t worry! We'll walk you through the installation process in the next section. Successfully loading the FLOAT workflow brings you one step closer to creating compelling, audio-driven animations.
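If you're curious what the "Missing Node Types" popup is actually checking, the idea can be sketched in a few lines. This is an illustration only, not ComfyUI's internal code; it assumes the UI-export workflow format, where each entry in the `"nodes"` list carries a `"type"` field, and the `installed_node_types` set is a made-up example.

```python
import json

# Hypothetical sketch: scan an exported workflow for node types that are
# not yet installed. The JSON below mimics ComfyUI's UI export format.
workflow_json = """
{
  "nodes": [
    {"id": 1, "type": "LoadImage"},
    {"id": 2, "type": "LoadFloatModels"},
    {"id": 3, "type": "FloatProcess"}
  ]
}
"""

# Example set of node types already available in this install (assumption).
installed_node_types = {"LoadImage", "LoadAudio", "VHS_VideoCombine"}

workflow = json.loads(workflow_json)
missing = sorted(
    node["type"] for node in workflow["nodes"]
    if node["type"] not in installed_node_types
)
print(missing)  # the node types ComfyUI would flag as missing
```

With the FLOAT workflow, the two flagged types are exactly `LoadFloatModels` and `FloatProcess`, which is why installing the FLOAT custom node pack resolves the popup.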

3. Installing Missing Nodes for FLOAT

If you see red outlines around nodes in your ComfyUI FLOAT workflow, it means some custom nodes are missing. Follow these steps to resolve the issue:

  1. Open ComfyUI
    Launch your ComfyUI interface in your browser.

  2. Access the Node Manager
    Navigate to the top right corner and click on the "Manager" button.

  3. Install Missing Custom Nodes
    In the Node Manager window, find and click on "Install missing custom nodes". This will scan your setup for any missing components.

  4. Locate and Install "ComfyUI_Float"
    Look for the node named "ComfyUI_Float" in the list and click the Install button next to it.

  5. Restart the Server
    After installation, click the Restart button in the bottom left corner and confirm the action.

  6. Wait for Reconnection
    You’ll see a “Reconnecting” popup in the top right. Wait for the server to fully reboot.

  7. Refresh Your Browser
    Once the server is back online, refresh the browser tab.

The red outlines should now be gone, and you’re ready to explore the full capabilities of ComfyUI FLOAT.

4. Exploring Audio-driven Talking Portraits with FLOAT

Now that you've successfully installed the necessary nodes, you're ready to explore how to create audio-driven talking portraits using ComfyUI FLOAT. This process involves a few essential steps to generate animations that sync seamlessly with audio input.

Step 1: Set the Frame Rate (FPS)

Before uploading any assets, start by setting your desired FPS (Frames Per Second) in the FLOAT workflow.

  • A higher FPS gives better lip-sync accuracy and smoother motion.

  • We recommend starting at 40 FPS, which balances quality and speed.

  • Feel free to experiment with different values to find what works best for your use case.
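To build intuition for the FPS trade-off, it helps to see how frame count scales with the setting. This is simple arithmetic, not part of the FLOAT workflow itself: FLOAT must generate one video frame per tick of the clock, so higher FPS means proportionally more frames to render for the same audio clip.

```python
# Rough relationship between FPS, clip length, and frames to generate.
# More frames = smoother motion, but a longer render.
def frames_for_clip(fps: int, audio_seconds: float) -> int:
    """Number of video frames FLOAT must generate for a clip."""
    return round(fps * audio_seconds)

print(frames_for_clip(40, 14))  # 560 frames at the recommended 40 FPS
print(frames_for_clip(60, 14))  # 840 frames at 60 FPS, ~50% more work
```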

Step 2: Upload a Front-Facing Portrait Image

Next, drag and drop a portrait image into the workflow.

  • Make sure it’s a frontal view with a clearly visible face and mouth.

  • Non-frontal or low-quality images may lead to suboptimal animation results.

Step 3: Upload Your Audio File

Now upload the audio clip you want the model to sync with the image.

  • This can be a voice recording, narration, or sound clip.

  • Ensure the audio is clear and free from heavy background noise for best results.

Step 4: Understand and Adjust Key FLOAT Settings

In the FLOAT pipeline, three key parameters influence how your talking portrait is generated:

  • a_cfg_scale (Audio Condition Guidance Scale):
    Controls how much the audio drives the animation.

    • Recommended value: 2

    • Higher values improve lip-sync but may reduce naturalness.

  • r_cfg_scale (Randomness Parameter):
    Determines variability in the output.

    • Recommended value: 1

    • Higher values make results less consistent.

  • e_cfg_scale (Emotion Guidance Scale):
    Enhances emotional expression based on the audio tone.

    • Recommended value: 1 (neutral)

    • Higher values give more expressive results (try values between 5 and 10).
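Guidance scales like these typically follow the classifier-free guidance (CFG) recipe: the model's conditional prediction is pushed away from its unconditional one by the scale factor. The sketch below is a one-dimensional illustration of that mechanism, not FLOAT's actual implementation.

```python
# Minimal sketch of classifier-free guidance (CFG), the mechanism behind
# scales like a_cfg_scale. Scalars stand in for the model's latent outputs.
def apply_cfg(uncond: float, cond: float, scale: float) -> float:
    # Push the prediction away from the unconditional output,
    # in the direction of the conditional one, by `scale`.
    return uncond + scale * (cond - uncond)

# scale = 1 keeps the conditional prediction unchanged;
# scale = 2 (the recommended a_cfg_scale) doubles the condition's influence.
print(apply_cfg(0.0, 0.5, 1.0))  # 0.5
print(apply_cfg(0.0, 0.5, 2.0))  # 1.0
```

This is why raising a_cfg_scale tightens lip-sync at the cost of naturalness: the audio condition is amplified beyond what the model saw during training.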

Step 5: Generate the Animation

After all assets are loaded and the settings are configured:

  • Click "Run" in the ComfyUI interface.

  • On the first run, FLOAT will automatically download the Float.pth model file—make sure you have sufficient disk space available. This initial render may take longer due to the download, but subsequent renders will be significantly faster.

Once the model is loaded, FLOAT analyzes your inputs and generates a fully animated talking portrait that mirrors both the speech and the emotional tone of the audio.
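If you'd rather trigger the run headlessly than click "Run", ComfyUI exposes an HTTP API: you POST the API-format workflow export to the `/prompt` endpoint on the default port 8188. The sketch below builds such a request; `float_workflow` is just a placeholder dict standing in for your real exported FLOAT workflow.

```python
import json
import urllib.request

# Placeholder standing in for the API-format export of the FLOAT workflow.
float_workflow = {"1": {"class_type": "LoadFloatModels", "inputs": {}}}

# ComfyUI expects {"prompt": <workflow>} posted as JSON to /prompt.
payload = json.dumps({"prompt": float_workflow}).encode("utf-8")
request = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Uncomment when a ComfyUI server is running locally:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))  # returns a prompt_id you can poll

print(request.full_url)
```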

5. FLOAT Audio Driven Motion Portrait Example

This FLOAT example was rendered on an RTX 3060 GPU with 12 GB of VRAM and ran flawlessly: processing a 14-second audio clip with an uploaded face portrait took approximately 20 seconds. This should give you a rough idea of the processing time; note that the first render may take a bit longer due to the initial model loading.

FLOAT SETTINGS:

  • FPS: 60

  • a_cfg_scale: 2

  • r_cfg_scale: 1

  • e_cfg_scale: 1

  • emotion: none

  • crop: True (If set to false, the output becomes blurry and doesn't work properly)

  • control after generate: random

You can play around with the settings if you wish, including experimenting with different FPS values and emotion presets like happy, sad, surprise, fear, disgust, or angry. Just keep in mind that setting crop to False doesn't work as expected, and the output will remain at a cropped 512x512 resolution either way.
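The example's numbers also give a back-of-the-envelope throughput figure. Using only the values quoted above (14 seconds of audio at 60 FPS, rendered in roughly 20 seconds on the RTX 3060):

```python
# Back-of-the-envelope throughput from the example above (RTX 3060, 12 GB).
audio_seconds = 14
fps = 60
render_seconds = 20  # approximate, from the example run

frames = audio_seconds * fps          # 840 frames generated in total
throughput = frames / render_seconds  # roughly 42 frames rendered per second

print(frames, round(throughput))
```

In other words, on this hardware FLOAT renders faster than real time even at 60 FPS, which is why dropping to the recommended 40 FPS mostly buys you headroom rather than being strictly necessary.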

6. Conclusion & Limitations: ComfyUI FLOAT

ComfyUI FLOAT opens up a powerful and creative way to generate audio-driven talking portraits. With a simple setup process and a user-friendly interface, it enables developers, creators, and enthusiasts to bring still images to life through synchronized speech and expressive facial animation. This tutorial guided you through setting up the workflow, installing missing nodes, and running your first animated portrait.

It's important to know that the final output from FLOAT is a cropped 512x512 video that focuses specifically on the face region of your original image. Despite this constraint, FLOAT is surprisingly fast and effective. It delivers high-quality results with minimal resources and setup time. You're encouraged to play around with the available parameters, like emotion scaling and audio configuration, to fine-tune results for your specific use case. Whether for prototyping, content creation, or just experimenting, ComfyUI FLOAT is a flexible and fun tool to explore. Give it a try and see how far you can push your animated portraits.
