Free & Open Source Voice Cloning with Coqui XTTS v2
Ever wanted to narrate videos with your own voice or have an AI speak like your favorite celebrity? Meet the CV Voice Clone Tool – powered by coqui.ai’s XTTS v2 – which makes this possible with just 5-20 seconds of audio. This open-source solution delivers studio-quality voice synthesis and conversion across 16 languages, all through an intuitive web interface.
💡 Core Features
- Multi-Scenario Voice Cloning
  - Text-to-Speech: Generate natural speech in any target voice (16 languages, including EN/ZH/JA/KO/FR/DE/IT).
  - Voice Conversion: Transform existing audio into a new voice while preserving the original intonation.
  - Real-Time Recording: Record the reference clip through your microphone and clone it right away.
- Optimized Language Support
| Language | Quality | Tips |
|---|---|---|
| English (en) | ⭐⭐⭐⭐⭐ | Works out of the box |
| Chinese (zh-cn) | ⭐⭐⭐⭐ | Use short phrases |
| Japanese (ja) / Korean (ko) | ⭐⭐⭐ | Limit samples to 5–15s |
| European languages | ⭐⭐⭐ | Avoid complex liaisons |
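For scripting outside the web UI, Coqui's own TTS Python package exposes XTTS v2 directly. The sketch below is not the clone-voice code: it assumes the TTS package is installed (pip install TTS), and build_clone_request and synthesize are hypothetical helper names. It normalizes the language label to the codes XTTS v2 expects (Chinese is zh-cn, not zh) before calling TTS.tts_to_file.

```python
# Language codes from the original XTTS v2 release (16); later checkpoints also list "hi".
XTTS_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko",
}

# Assumed convenience aliases for labels people often type instead of the XTTS codes.
ALIASES = {"zh": "zh-cn", "jp": "ja", "kr": "ko"}


def build_clone_request(text: str, speaker_wav: str, language: str,
                        out_path: str = "output.wav") -> dict:
    """Validate inputs and return keyword arguments for TTS.tts_to_file."""
    code = language.strip().lower()
    code = ALIASES.get(code, code)
    if code not in XTTS_LANGUAGES:
        raise ValueError(f"unsupported XTTS v2 language code: {language!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "speaker_wav": speaker_wav,   # 5-20 s reference clip of the target voice
        "language": code,
        "file_path": out_path,
        "split_sentences": True,      # library default: synthesize sentence by sentence
    }


def synthesize(request: dict) -> str:
    """Run XTTS v2 via Coqui's TTS package (the model is downloaded on first use)."""
    from TTS.api import TTS  # imported lazily so the helper above works without it

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(**request)
    return request["file_path"]
```

Typical use would be synthesize(build_clone_request("Hello!", "my_voice.wav", "en")). Validating first keeps a typo in the language code from failing only after the large model has loaded.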
⚙️ Installation Guides
Option 1: Prebuilt Version (Beginner-Friendly)
- OS: Windows 10/11
- Steps:
  - Download the main program (1.7GB) and the voice model (3GB).
  - Extract them to a path without Chinese characters (e.g., C:/clone-voice).
  - Run app.exe; the web UI launches automatically.
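The path rule is easy to check before extracting. The helper below is hypothetical, and flagging other non-ASCII characters as well is an assumption made for caution, since the guide only names Chinese characters.

```python
def path_warnings(path: str) -> list[str]:
    """Flag characters that may break the prebuilt app's install path (heuristic)."""
    def is_cjk(ch: str) -> bool:
        cp = ord(ch)
        # CJK Unified Ideographs plus Extension A cover common Chinese characters
        return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF

    if any(is_cjk(ch) for ch in path):
        return ["contains Chinese (CJK) characters"]
    if any(ord(ch) > 127 for ch in path):
        return ["contains other non-ASCII characters"]
    return []


print(path_warnings("C:/clone-voice"))          # []
print(path_warnings("D:/语音工具/clone-voice"))   # ['contains Chinese (CJK) characters']
```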
Option 2: Source Deployment (Developers)
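git clone https://github.com/jianchang512/clone-voice.git
cd clone-voice
python -m venv venv
venv\Scripts\activate          # Windows; on Linux/macOS: source venv/bin/activate
pip install -r requirements.txt --no-deps
# GPU users (CUDA 12.1): replace the torch from requirements with the CUDA build
pip uninstall -y torch
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
python app.py                  # start the web UI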
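After the GPU reinstall, it is worth confirming that PyTorch actually sees the card. This small sketch only reads standard torch attributes; torch.version.cuda is None for CPU-only wheels.

```python
def torch_gpu_status() -> dict:
    """Report whether the installed torch build can use CUDA."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,             # e.g. "12.1"; None on CPU-only wheels
        "cuda_available": torch.cuda.is_available(),  # False if no usable GPU/driver
    }


print(torch_gpu_status())
```

If cuda_build is None, pip kept a CPU-only wheel; rerun the two GPU commands inside the activated venv.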
Proxy Fix: If model downloads fail, add HTTP_PROXY=http://127.0.0.1:7890 to the .env file (replace the address with your own proxy).
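The proxy setting works because Python HTTP clients such as urllib and requests read proxy variables from the environment, and HTTPS URLs (model downloads included) use HTTPS_PROXY. The guide doesn't say whether clone-voice copies HTTP_PROXY over for HTTPS, so this hypothetical loader does it explicitly. It is a sketch that handles only simple KEY=VALUE lines.

```python
import os


def parse_env_lines(lines, mirror_https: bool = True) -> dict:
    """Parse simple KEY=VALUE lines (no escapes, no multi-line values)."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed entries
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    # HTTPS URLs use HTTPS_PROXY, so reuse the HTTP proxy when none is given
    if mirror_https and values.get("HTTP_PROXY") and not values.get("HTTPS_PROXY"):
        values["HTTPS_PROXY"] = values["HTTP_PROXY"]
    return values


def load_env_file(path: str = ".env") -> dict:
    """Export the parsed values so urllib/requests pick up the proxy settings."""
    with open(path, encoding="utf-8") as fh:
        values = parse_env_lines(fh)
    os.environ.update(values)
    return values
```

After load_env_file(), urllib.request.getproxies() should list both http and https entries.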
🎯 Pro Tips for Best Results
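- For Natural Chinese Output:
  language="zh-cn"        # XTTS v2's language code for Chinese
  split_sentences=True    # Default in Coqui's API; synthesizes long text sentence by sentence
  speed=1.2               # Speed factor, roughly 0.5-2.0 (1.0 = normal)
  Note: XTTS v2 has no emotion parameter. Tone and emotion are carried over from the reference clip, so record it in the mood you want.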
- Sample Requirements:
  - Clean audio (no background noise)
  - 5-20 second clips
  - Clear pronunciation (no mumbling)
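Noise and pronunciation need a human ear, but length and recording level can be checked automatically. A minimal sketch for 16-bit PCM WAV files; the helper names and the clipping and silence thresholds are assumptions, not values from the project. For Japanese or Korean, pass max_s=15 to match the language table.

```python
import struct
import wave


def analyze_pcm16(frames: bytes, sample_rate: int, channels: int,
                  min_s: float = 5.0, max_s: float = 20.0) -> list[str]:
    """Return problems found in 16-bit PCM audio (empty list = looks usable)."""
    count = len(frames) // 2
    samples = struct.unpack(f"<{count}h", frames[: count * 2])
    duration = count / channels / sample_rate
    problems = []
    if duration < min_s:
        problems.append(f"too short: {duration:.1f}s (need at least {min_s:g}s)")
    elif duration > max_s:
        problems.append(f"too long: {duration:.1f}s (keep it under {max_s:g}s)")
    clipped = sum(1 for s in samples if s in (-32768, 32767))
    if samples and clipped / len(samples) > 0.001:  # >0.1% full-scale samples (assumed threshold)
        problems.append("clipping detected: lower the input gain and re-record")
    peak = max((abs(s) for s in samples), default=0)
    if peak < 328:  # under ~1% of full scale, i.e. essentially silence (assumed threshold)
        problems.append("almost silent: check the microphone input")
    return problems


def check_reference_clip(path: str, min_s: float = 5.0, max_s: float = 20.0) -> list[str]:
    """Open a WAV file and run the checks above on its samples."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            return ["not 16-bit PCM: re-export as 16-bit WAV for this check"]
        frames = w.readframes(w.getnframes())
        return analyze_pcm16(frames, w.getframerate(), w.getnchannels(), min_s, max_s)
```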
⚠️ Ethics & Legal Notice
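Under the Coqui Public Model License 1.0.0 (which covers the XTTS v2 model) and basic consent rules:
- ❌ Commercial use prohibited (the license allows non-commercial use only)
- ❌ Cloning real people's voices without their permission
Read the full license before use.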
💡 Advanced Use Cases
- Content Creation: Clone your voice for multi-role videos.
- Language Learning: Generate pronunciation guides.
- Audiobooks: Convert texts to celebrity-style narration.
- Game Dev: Create NPC dialogues affordably.
Pro Audio Tip: Record samples at 48kHz with a tool such as OBS Studio, then convert them to 16kHz.
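Dedicated tools (ffmpeg, SoX, an audio editor) handle this conversion well. To show what it involves, here is a dependency-free sketch: because 48 kHz is exactly 3 × 16 kHz, the signal is low-pass filtered below the new 8 kHz Nyquist limit and then every third sample is kept. The Hamming-windowed sinc filter and the 63-tap length are choices made for this sketch; it assumes 16-bit mono WAV input and is slow on long clips.

```python
import math
import struct
import wave


def lowpass_taps(num_taps: int, cutoff: float) -> list[float]:
    """Hamming-windowed sinc low-pass FIR; cutoff in cycles/sample (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def decimate(samples: list[float], factor: int, num_taps: int = 63) -> list[float]:
    """Low-pass below the new Nyquist, then keep every factor-th sample."""
    taps = lowpass_taps(num_taps, cutoff=0.45 / factor)  # 48k -> 16k: cutoff ~7.2 kHz
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k  # centered window, so the output is not time-shifted
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out


def wav_48k_to_16k(src: str, dst: str) -> None:
    """Convert a 16-bit mono 48 kHz WAV file to 16 kHz."""
    with wave.open(src, "rb") as r:
        if (r.getframerate(), r.getsampwidth(), r.getnchannels()) != (48000, 2, 1):
            raise ValueError("this sketch expects 16-bit mono 48 kHz PCM")
        raw = r.readframes(r.getnframes())
    count = len(raw) // 2
    samples = [s / 32768 for s in struct.unpack(f"<{count}h", raw[: count * 2])]
    out = decimate(samples, 3)
    pcm = [max(-32768, min(32767, round(s * 32768))) for s in out]
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

Skipping the filter and simply dropping samples would fold content between 8 and 24 kHz back into the audible band as aliasing, which is why the low-pass step comes first.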
✨ Ready to experiment? Get started now with the clone-voice project on GitHub: https://github.com/jianchang512/clone-voice