ChatterBox for ComfyUI: Text-to-Speech, Voice Cloning & Conversion

June 6, 2025


Discover how to use ChatterBox in ComfyUI for text-to-speech (TTS), voice cloning, and voice conversion.

1. Introduction to ChatterBox in ComfyUI
2. Loading the ChatterBox Workflow in ComfyUI
3. Installing Missing Nodes for ChatterBox
4. Exploring TTS with Voice Cloning Capabilities
5. Exploring Chatterbox Voice Conversion (VC) with Example
6. Limitations of ChatterBox
7. Conclusion: Harnessing the Power of ChatterBox in ComfyUI

1. Introduction to ChatterBox in ComfyUI

ChatterBox is a custom node extension for ComfyUI that adds Text-to-Speech (TTS) and Voice Conversion (VC) nodes. It is built on the open-source Chatterbox library from Resemble AI, which lets you generate realistic speech from text and convert one voice into another with remarkable accuracy.

If you haven’t installed ComfyUI yet, check out our detailed guide below on how to set it up locally:

👉 How to Install ComfyUI Locally on Windows?

One of ChatterBox’s most impressive features is voice cloning: from a short reference clip, it can speak any text in that custom voice, which makes it a powerful tool for developers, content creators, and anyone working with voice synthesis. Keep two limitations in mind, though: each generation is currently capped at about 40 seconds of speech, and the model works best with English.

In this blog post, we’ll walk you through how to set up ChatterBox in ComfyUI, explore its key features, and help you understand both its capabilities and current limitations.


2. Loading the ChatterBox Workflow in ComfyUI

To get started with ChatterBox in ComfyUI, begin by loading the workflow into your interface. Download the required .json file from the following link: ChatterBox ComfyUI Workflow. Once downloaded, open ComfyUI and drag the file into the canvas. You may see a popup message like "Missing Node Types" — this is expected and will be addressed shortly.

The workflow is organized into two main sections: one for Text-to-Speech (TTS) and the other for Voice Conversion (VC). This layout helps you quickly navigate and utilize ChatterBox’s advanced voice capabilities.

As you explore the workflow, you may notice some red outlines around the nodes labeled "FL_ChatterboxVC" and "FL_ChatterboxTTS". This indicates that these nodes are not yet properly installed, which we will address in the next section.
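If you prefer to check this outside the browser, the workflow file itself lists the node types it needs. The sketch below is a minimal example, assuming the downloaded .json uses ComfyUI’s drag-and-drop (UI) format, where nodes sit in a top-level "nodes" list and each has a "type" field. The helper names and the "installed" set in the demo are hypothetical; in practice you would fill that set from your own ComfyUI install.

```python
import json
from typing import Iterable


def workflow_node_types(workflow: dict) -> set[str]:
    """Return the node class names used in a UI-format ComfyUI workflow.

    Assumption: nodes live in a top-level "nodes" list, each with a "type" key.
    """
    return {node["type"] for node in workflow.get("nodes", []) if "type" in node}


def missing_node_types(workflow: dict, installed: Iterable[str]) -> list[str]:
    """Node types the workflow needs but the local install does not provide.

    These correspond to the nodes ComfyUI draws with a red outline.
    """
    return sorted(workflow_node_types(workflow) - set(installed))


if __name__ == "__main__":
    # Hypothetical minimal workflow containing the two ChatterBox nodes.
    workflow = json.loads(
        '{"nodes": ['
        '{"id": 1, "type": "LoadAudio"},'
        '{"id": 2, "type": "FL_ChatterboxTTS"},'
        '{"id": 3, "type": "FL_ChatterboxVC"},'
        '{"id": 4, "type": "SaveAudio"}]}'
    )
    # Hypothetical install where the ChatterBox extension is not present yet.
    installed = {"LoadAudio", "SaveAudio"}
    print(missing_node_types(workflow, installed))  # ['FL_ChatterboxTTS', 'FL_ChatterboxVC']
```

For a real file, load it with json.load(open("chatterbox_workflow.json")) (hypothetical filename) and pass the result to missing_node_types.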

3. Installing Missing Nodes for ChatterBox

If you see red outlines around the ChatterBox nodes in ComfyUI, it means some custom nodes are missing. Follow these steps to install them and fix the issue:

1. Open your ComfyUI interface and go to the top right corner.
2. Click the “Manager” option to open the Node Manager.
3. In the Node Manager, find and select “Install missing custom nodes.”
4. Locate the missing node labeled “ComfyUI_Fill-ChatterBox.”
5. Click the install button next to this node to start the installation.
6. After installation, you’ll be prompted to restart your server.
7. Click the restart button in the bottom left corner and confirm.
8. Wait for the server to reboot; a “Reconnecting” popup will appear in the top right corner.
9. When prompted, click confirm to refresh your browser.
10. After refreshing, the red outlines around the ChatterBox nodes should disappear, indicating a successful installation.

With the missing nodes installed, you’re ready to explore ChatterBox’s full range of TTS, voice cloning, and voice conversion features in ComfyUI. Let’s dive in!
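To double-check the installation from a script, you can ask the running server which nodes it has registered. This is a minimal sketch, assuming ComfyUI listens on its default local address (http://127.0.0.1:8188) and serves its node registry at /object_info as a JSON object keyed by node class name. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: ComfyUI runs locally on its default port and serves its node
# registry at /object_info as a JSON object keyed by node class name.
COMFYUI_URL = "http://127.0.0.1:8188"

# Node class names used by the ChatterBox workflow.
CHATTERBOX_NODES = ("FL_ChatterboxTTS", "FL_ChatterboxVC")


def fetch_object_info(base_url: str = COMFYUI_URL, timeout: float = 10.0) -> dict:
    """Download the node registry from a running ComfyUI server."""
    with urllib.request.urlopen(f"{base_url}/object_info", timeout=timeout) as resp:
        return json.load(resp)


def chatterbox_status(object_info: dict) -> dict[str, bool]:
    """Map each ChatterBox node name to whether the server has it registered."""
    return {name: name in object_info for name in CHATTERBOX_NODES}
```

While ComfyUI is running, chatterbox_status(fetch_object_info()) should report True for both nodes after the restart.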


4. Exploring TTS with Voice Cloning Capabilities

Now that the necessary nodes are installed, let’s explore how to use ChatterBox’s voice cloning and TTS features step-by-step. We’ll also explain the key settings to help you get the best results.

How to Use ChatterBox TTS with Voice Cloning

1. Upload a voice sample (the reference voice you want to clone)
2. Enter your audio prompt (the text you want the cloned voice to speak)
3. Click “Run”
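Before uploading, it can help to confirm the voice sample’s length and format. This sketch assumes an uncompressed PCM .wav file (Python’s built-in wave module does not read MP3 or FLAC); wav_info and make_silent_wav are hypothetical helper names, and the 24 kHz rate in the test helper is an arbitrary demo value.

```python
import io
import wave


def wav_info(source) -> dict:
    """Return duration, sample rate and channel count of a PCM .wav file.

    `source` may be a filesystem path or a readable binary file object.
    """
    with wave.open(source, "rb") as wf:
        rate = wf.getframerate()
        return {
            "seconds": wf.getnframes() / rate,
            "sample_rate": rate,
            "channels": wf.getnchannels(),
        }


def make_silent_wav(seconds: float, rate: int = 24000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV, handy for testing wav_info."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf
```

For a real recording, wav_info("voice_sample.wav") (hypothetical path) returns its duration in seconds, sample rate, and channel count.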

Chatterbox TTS Settings and Explanation

Audio Prompt: "Well hello there… big boy. You're hearing the voice of Salma Hayek. Soft… slow… and dripping with everything your ears desire. Fully AI… but oh, I sound real enough to make you lean in closer, don’t I? Do you like it? Mmm… I know you do. Let’s take our time… darling."
Exaggeration: 0.5 (range 0.25–2)
Controls how expressive the voice sounds. Higher values produce more dramatic, intense speech; lower values keep it natural and calm.
CFG Weight: 0.5 (range 0.2–1.0)
Determines how closely the speech follows your text prompt. Higher values mean the voice sticks more closely to the words you wrote; lower values allow for more variation and creativity.
Temperature: 0.8 (range 0.05–5)
Affects speech speed and randomness. Higher values make the speech faster and more unpredictable, which can reduce clarity. Lower values slow down the speech and make it clearer.
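To make the three settings concrete, the sketch below stores them with the ranges listed above and shows how a guidance weight and a temperature typically act on a model’s next-token scores. The guidance step is one common form of classifier-free guidance (with this form, a weight of 0 returns the plain conditioned scores) and the temperature step is standard softmax scaling. Both are generic illustrations, not code from the Chatterbox library, whose internals may differ; exaggeration is only range-checked because it has no generic equivalent.

```python
import math
from dataclasses import dataclass

# Slider ranges as listed in this guide for the ChatterBox TTS node.
RANGES = {
    "exaggeration": (0.25, 2.0),
    "cfg_weight": (0.2, 1.0),
    "temperature": (0.05, 5.0),
}


@dataclass
class TTSSettings:
    exaggeration: float = 0.5
    cfg_weight: float = 0.5
    temperature: float = 0.8

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        for name, (lo, hi) in RANGES.items():
            value = getattr(self, name)
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} outside [{lo}, {hi}]")


def guided_probs(cond: list[float], uncond: list[float], s: TTSSettings) -> list[float]:
    """Generic classifier-free guidance followed by temperature softmax.

    Illustrative only; not taken from the Chatterbox source.
    """
    # Guidance: push scores further in the direction the text conditioning points.
    logits = [c + s.cfg_weight * (c - u) for c, u in zip(cond, uncond)]
    # Temperature: larger values flatten the distribution (more randomness).
    scaled = [x / s.temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

TTSSettings() gives the defaults used in this guide, while TTSSettings(cfg_weight=1.5) raises a ValueError because it is outside the 0.2 to 1.0 range.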

Chatterbox TTS Example (Salma Hayek)

Using just a 2-second clip of Salma Hayek’s voice and the audio prompt above, ChatterBox produces impressively realistic speech that captures the sensual tone of the reference. It’s a clear demonstration of how much ChatterBox’s TTS can do with minimal input.

[Audio: generated speech in the cloned voice]

In the next chapter, we’ll dive into the ChatterBox Voice Conversion (VC) nodes, where you’ll learn how to transform one voice into another using the intuitive workflow.

5. Exploring Chatterbox Voice Conversion (VC) with Example

In addition to TTS, ChatterBox offers voice conversion: it transforms a recording so that it sounds as if it were spoken by a different voice, which opens up new possibilities for creating diverse audio content. To use the voice conversion feature, follow these steps:

1. Input an original audio sample: upload the recording you want to convert (referred to as input_audio).
2. Select a target voice: choose the voice you want the original audio converted into. This voice is cloned and applied to the input (referred to as target_audio).
3. Run the conversion: ChatterBox generates an output clip that keeps the original content but takes on the vocal characteristics of the target voice.

Original Audio (input_audio):

For example, start with a recording of a male speaker as the original input:

[Audio: original male speaker (input_audio)]

Target Voice (Salma Hayek):

Then, you select a short audio clip of Salma Hayek’s voice as the target voice you want to convert to.

[Audio: Salma Hayek reference clip (target_audio)]
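If your reference recording is long, you can cut a short, clean section to use as target_audio. This is a minimal sketch, assuming PCM .wav input and Python’s standard-library wave module; trim_wav is a hypothetical helper, and which section to keep is up to you.

```python
import wave


def trim_wav(src, dst, seconds: float, start: float = 0.0) -> float:
    """Copy `seconds` of audio beginning at `start` from src into dst.

    src/dst may be paths or binary file objects (PCM .wav only).
    Returns the length of the written clip in seconds, which is shorter
    than requested if the source ends first.
    """
    with wave.open(src, "rb") as r:
        nchannels, sampwidth, rate = r.getnchannels(), r.getsampwidth(), r.getframerate()
        # Jump to the start frame, clamped to the valid range of the file.
        r.setpos(min(max(0, int(start * rate)), r.getnframes()))
        frames = r.readframes(int(seconds * rate))
    with wave.open(dst, "wb") as w:
        w.setnchannels(nchannels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(frames)
    return len(frames) // (nchannels * sampwidth) / rate
```

For example, trim_wav("reference_full.wav", "target_clip.wav", seconds=5.0, start=12.0) (hypothetical paths) keeps five seconds starting 12 seconds in.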

Final Output Audio:

The output will be the original male speech transformed to sound like it’s spoken by Salma Hayek, combining the content of the original with the unique vocal qualities of the target.

[Audio: converted output in the target voice]

This feature is especially valuable for voice actors, game developers, and creators seeking to produce dynamic, high-impact voice transformations that stand out in their projects.

In the next section, we’ll go over a few important limitations to keep in mind when working with ChatterBox.


6. Limitations of ChatterBox

While ChatterBox delivers impressive results in both TTS and voice conversion, it’s important to be mindful of its current limitations. Most notably, each generation is limited to about 40 seconds of speech. Pushing text beyond that length may produce cut-off, distorted, or inconsistent audio, a common challenge in longer voice synthesis.
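A common workaround for the length cap is to split a long script into several shorter generations. The sketch below groups whole sentences into chunks whose estimated spoken length stays under 40 seconds. The 2.5 words-per-second speaking rate is an assumption for planning only, not a measured ChatterBox figure, and a single sentence longer than the limit still ends up in its own oversized chunk.

```python
import re

# Assumption: roughly 2.5 spoken words per second (about 150 words per minute).
# This is a planning estimate, not a measured ChatterBox value.
WORDS_PER_SECOND = 2.5
MAX_SECONDS = 40.0


def estimate_seconds(text: str, wps: float = WORDS_PER_SECOND) -> float:
    """Rough spoken duration of `text` based on its word count."""
    return len(text.split()) / wps


def chunk_script(text: str, max_seconds: float = MAX_SECONDS,
                 wps: float = WORDS_PER_SECOND) -> list[str]:
    """Group whole sentences into chunks estimated to fit under max_seconds."""
    # Split after ., ! or ? followed by whitespace, keeping the punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and estimate_seconds(candidate, wps) > max_seconds:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be run through the TTS node as a separate generation.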

Another important point: ChatterBox performs best with English-language voices and text. While it may technically process other languages, the most natural, coherent, and expressive results come from English inputs and voice samples.

By staying within these boundaries—shorter durations and English content—you’ll get the most reliable and high-quality performance from the ChatterBox nodes within ComfyUI.
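If you generate a long script in several sub-40-second pieces, the resulting WAV files can be joined afterwards. This sketch assumes every piece is 16-bit (or wider) PCM .wav with the same sample rate and channel count; concat_wavs is a hypothetical helper, and the optional gap inserts silence between pieces.

```python
import wave


def concat_wavs(sources, dst, gap_seconds: float = 0.0) -> None:
    """Join PCM WAV files end to end, optionally with silence between them.

    sources: paths or binary file objects; dst: path or writable binary file.
    Assumption: all inputs share channel count, sample width and sample rate.
    """
    sources = list(sources)
    if not sources:
        raise ValueError("no WAV files to join")
    fmt = None
    with wave.open(dst, "wb") as out:
        for i, src in enumerate(sources):
            with wave.open(src, "rb") as r:
                cur = (r.getnchannels(), r.getsampwidth(), r.getframerate())
                if fmt is None:
                    # The first file defines the output format.
                    fmt = cur
                    out.setnchannels(cur[0])
                    out.setsampwidth(cur[1])
                    out.setframerate(cur[2])
                elif cur != fmt:
                    raise ValueError(f"file {i} has format {cur}, expected {fmt}")
                elif gap_seconds > 0:
                    # Zero-valued samples are silence for 16-bit and wider PCM;
                    # 8-bit WAV is unsigned, so this gap would not be silent there.
                    nframes = int(gap_seconds * fmt[2])
                    out.writeframes(b"\x00" * nframes * fmt[0] * fmt[1])
                out.writeframes(r.readframes(r.getnframes()))
```

For example, concat_wavs(["part1.wav", "part2.wav"], "full.wav", gap_seconds=0.3) (hypothetical paths) writes one continuous file with a short pause between the parts.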

7. Conclusion: Harnessing the Power of ChatterBox in ComfyUI

In conclusion, ChatterBox is a powerful and intuitive extension for ComfyUI that adds text-to-speech (TTS) and voice conversion (VC) capabilities. With support for custom voice cloning and audio transformation, it lets creators build rich, voice-driven projects directly inside their ComfyUI workflows.

Whether you're a developer, content creator, or voice tech enthusiast, this guide gives you what you need to get started quickly. ChatterBox combines ease of use with powerful features, making it an excellent sandbox for exploring synthetic speech. For the best results, keep each generation under 40 seconds and stick to English for the most natural output.


🚀 Experience it yourself — Generate high-quality speech with advanced text-to-speech and voice cloning
👉 Try Voice Cloning and Text to Speech (TTS) now on NextDiffusion.ai
