
VoxCPM2 · Zero-Shot cloning
Upload a short audio sample and VoxCPM2 captures the speaker's voice instantly: no training, no fine-tuning, no waiting. You can then generate speech in 30 languages with that same voice. If you're just getting started, explore VoxCPM2 first.
Compare reference clips with cloned output — without leaving this page.
Video & podcasts · Channel host voice · English → cloned English
“Keep the host’s voice for intros, ads, and pickups: now generated, not re-recorded.”
Product & app localization · Marketing voice · English → localized output
“Same brand voice, localized script, with no new recording session.”
Audiobooks & narration · Narrator voice · Original → cloned
“Match a narrator’s timbre for sequels and translated editions.”

Step 1
Paste up to 4000 characters of text — any language, any topic. VoxCPM2 handles punctuation, abbreviations, and numerals automatically.

Step 2
Choose a reference voice, or upload a short audio file (3–25 seconds) to create a cloned speaker. VoxCPM2 supports fast zero-shot voice cloning.

Step 3
Click Generate Speech. Your audio is ready in seconds. Download as .wav or copy a share link to send to anyone.
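The limits mentioned on this page (4000 characters per request, and the 3–25 s reference range listed in the comparison table) can be checked locally before uploading anything. The following is a minimal sketch using only the Python standard library. The helper names `split_text`, `reference_seconds`, and `is_valid_reference` are hypothetical and are not part of any VoxCPM2 API, and the sentence-boundary splitting rule is an assumption, not something this page specifies.

```python
import re
import wave

MAX_CHARS = 4000        # per-request text limit stated on this page
MIN_REF_SECONDS = 3.0   # reference-clip range from the comparison table
MAX_REF_SECONDS = 25.0


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Assumption: break after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?。！？])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def reference_seconds(path: str) -> float:
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path: str) -> bool:
    """True if the clip length falls inside the 3-25 s reference range."""
    return MIN_REF_SECONDS <= reference_seconds(path) <= MAX_REF_SECONDS
```

Each chunk returned by `split_text` can be pasted as a separate request. `is_valid_reference` only reads uncompressed `.wav` files, so other formats would need converting first.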
Open weights, measurable similarity, and multilingual reach in one stack.
VoxCPM2 is designed for natural zero-shot cloning, preserving speaker identity while producing expressive multilingual speech.
SIM-o (speaker similarity). Source: arXiv 2604.00688, Table 3.
Clone once from English (or any language) and generate Mandarin, Arabic, Spanish, and more — same voice, no per-language re-recording.
Broadest open multilingual TTS coverage in one model.
No fine-tuning queue, no GPU hours, no dataset labeling. The same base model handles TTS, cloning, and Voice Design.
True zero-shot: reference audio only.
Use VoxCPM2 online or self-host the open-source model — released under Apache 2.0.
Commercial use allowed under the license.
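Speaker-similarity scores such as the SIM-o figure cited above are commonly computed as the cosine similarity between a speaker embedding of the generated audio and a speaker embedding of the original reference audio. The sketch below shows only that final comparison step. It assumes the two embeddings have already been extracted by some speaker-verification model; the extractor and the exact evaluation protocol of the cited paper are not described on this page and are not reproduced here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

A score near 1 means the two embeddings point in almost the same direction, which is read as a close speaker match; scores from different embedding models are not comparable with each other.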
A practical snapshot for builders who care about openness, languages, and measured speaker match.
| Feature | VoxCPM2 | ElevenLabs |
|---|---|---|
| Languages supported | 30 | 32 |
| Online access | Free starter tier | Paid plans |
| Open source & self-host | Apache 2.0 | Proprietary |
| Zero-shot cloning | Yes (3–25 s reference clip) | Yes (paid tiers) |
| Voice design from text | Yes | No |
Product features and pricing may change — verify on each vendor's site before buying.
Where a single reference voice unlocks multilingual output.
Keep the host’s voice for intros, ads, and pickups without booking a new session — ideal for fast-turnaround channels.
Ship the same brand voice across locales: one reference clip, localized scripts in every market language.
Match a narrator’s timbre for pick-ups, sequels, or translated editions while preserving listener familiarity.
Let users hear UI or content in a voice that feels personal — including cross-lingual output from one sample.
Start with transparent credit-based pricing for Text to Speech, Voice Cloning, and Voice Design, then choose the plan that fits your usage.
| Plan | Credits | Approx. output | Per credit | Billing |
|---|---|---|---|---|
| Free starter | 2 | ≈ 200 characters / 16 seconds | n/a | n/a |
| Basic | 800 | ≈ 80,000 characters / 1.8 hours | $0.012 | one-time |
| Pro | 3,000 | ≈ 300,000 characters / 4.5 hours | $0.009 (save 20% vs Basic) | one-time · most popular |
| Business | 6,000 | ≈ 600,000 characters / 12 hours | $0.008 (save 50% vs Basic) | one-time · best value |
Credits never expire on any paid plan. Outputs from Basic, Pro, and Business are licensed for commercial use under our Terms.
Choose one-time credits • No subscription or auto-renewal
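The character figures above work out to roughly 100 characters per credit (2 credits ≈ 200 characters, 800 credits ≈ 80,000 characters). A rough estimator under that assumption might look like the sketch below. The function names are hypothetical, rounding partial credits up is an assumption, and the mapping of per-credit rates to Basic, Pro, and Business is inferred from the order in which the rates are listed; actual billing rules may differ.

```python
from decimal import Decimal

# Assumption: ~100 characters per credit, inferred from "2 credits ≈ 200 characters".
CHARS_PER_CREDIT = 100

# Assumption: rates assigned to plans in the order they are listed on the page.
PER_CREDIT_USD = {
    "Basic": Decimal("0.012"),
    "Pro": Decimal("0.009"),
    "Business": Decimal("0.008"),
}


def credits_needed(num_chars: int) -> int:
    """Estimate credits for a text, rounding partial credits up (assumed)."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return (num_chars + CHARS_PER_CREDIT - 1) // CHARS_PER_CREDIT


def estimated_cost(num_chars: int, plan: str) -> Decimal:
    """Approximate USD cost of synthesizing `num_chars` at a plan's per-credit rate."""
    return credits_needed(num_chars) * PER_CREDIT_USD[plan]
```

Under these assumptions, a full 4000-character request uses about 40 credits. `Decimal` is used instead of floats so that small per-credit rates multiply without rounding drift.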
Everything about zero-shot Voice Cloning with VoxCPM2.
VoxCPM2 AI Voice Cloning is a zero-shot feature that replicates a speaker's voice from a short audio sample, with no training required. Upload a clean reference clip, and VoxCPM2 extracts the speaker's voice profile to generate new speech in that voice in any of the 30 supported languages.
Jump to the free generator on the homepage — no account required.