
VoxCPM2 · Zero-Shot cloning

VoxCPM2 Voice Cloning: Zero-Shot Cloning with Style Control

Upload a short audio sample and VoxCPM2 captures the speaker's voice instantly, with no training, fine-tuning, or waiting. You can then generate speech in 30 languages with that same voice. New here? Explore VoxCPM2.

Hear Voice Cloning in Action

Compare reference clips with cloned output — without leaving this page.

Reference

Video & podcasts · Original voice

Cloned voice

Keep the host’s voice for intros, ads, and pickups — now generated, not re-recorded.

Channel host voice · English → cloned English

Reference

Product & app localization · Original voice

Cloned voice (localized)

Same brand voice, localized script — no new recording session.

Marketing voice · English → localized output

Reference

Audiobooks & narration · Original voice

Cloned narrator

Match a narrator’s timbre for sequels and translated editions.

Narrator voice · Original → cloned

How VoxCPM2 AI Voice Cloning Works

Step 1

Enter Your Text

Paste up to 4,000 characters of text in any of the 30 supported languages, on any topic. VoxCPM2 handles punctuation, abbreviations, and numerals automatically.
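Scripts longer than the 4,000-character limit need to be split before pasting. The sketch below is one way to do that, assuming it is acceptable to break at sentence-ending punctuation; `split_script` is a hypothetical helper for illustration, not part of VoxCPM2.

```python
import re

MAX_CHARS = 4000  # per-request character limit quoted in Step 1


def split_script(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence boundaries."""
    # Break after sentence-ending punctuation (Latin or CJK) followed by whitespace.
    sentences = re.split(r"(?<=[.!?。！？])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current chunk is full: close it and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be generated as its own request and the resulting audio files joined afterwards.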

Step 2

Choose Your Voice

Choose a preset reference voice, or upload a short audio clip (3 to 25 seconds) to create a cloned speaker. Cloning is zero-shot, so the clip is used as-is with no training step.
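Before uploading, it can help to confirm that a reference clip falls inside the 3 to 25 second range quoted in the comparison table on this page. The following is a minimal sketch using Python's standard `wave` module; it assumes an uncompressed PCM `.wav` file, and both function names are hypothetical.

```python
import wave

# Assumed bounds, taken from the "3-25s ref" entry in the comparison table.
MIN_SECONDS, MAX_SECONDS = 3.0, 25.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the assumed 3-25 s window."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

Other formats such as MP3 would need a different decoder, which this sketch does not cover.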

Step 3

Generate, Play, and Download

Click Generate Speech. Your audio is ready in seconds. Download as .wav or copy a share link to send to anyone.

Why VoxCPM2 Has Powerful AI Voice Cloning

Open weights, measurable similarity, and multilingual reach in one stack.

Closer to the real speaker

VoxCPM2 is designed for natural zero-shot cloning, preserving speaker identity while producing expressive multilingual speech.

SIM-o (speaker similarity). Source: arXiv 2604.00688, Table 3.
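For context, speaker-similarity scores such as SIM-o are commonly computed as the cosine similarity between a speaker embedding of the generated audio and one of the original reference clip. The sketch below shows only that final comparison step; the embedding model is not specified on this page, so the vectors are assumed to come from some external speaker-verification model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

A score near 1 means the two embeddings point in nearly the same direction, which is read as a closer speaker match.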

30 languages, one profile

Clone once from English (or any language) and generate Mandarin, Arabic, Spanish, and more — same voice, no per-language re-recording.

Broadest open multilingual TTS coverage in one model.

Zero-Shot, zero waiting

No fine-tuning queue, no GPU hours, no dataset labeling. The same base model handles TTS, cloning, and Voice Design.

True zero-shot: reference audio only.

Free online · Apache 2.0

Use VoxCPM2 online or self-host the open-source model — released under Apache 2.0.

Commercial use allowed under the license.

VoxCPM2 vs. ElevenLabs — Voice Cloning Compared

A practical snapshot for builders who care about openness, languages, and measured speaker match.

Feature                 | VoxCPM2                     | ElevenLabs
Languages supported     | 30                          | 32
Online access           | Free starter tier           | Paid plans
Open source & self-host | Apache 2.0                  | Proprietary
Zero-Shot cloning       | Yes (3–25 s reference clip) | Yes (paid tiers)
Voice design from text  | Yes                         | No

Product features and pricing may change — verify on each vendor's site before buying.

Who Uses VoxCPM2 AI Voice Cloning

Where a single reference voice unlocks multilingual output.

Video & Podcasts

Keep the host’s voice for intros, ads, and pickups without booking a new session — ideal for fast-turnaround channels.

Product & App Localization

Ship the same brand voice across locales: one reference clip, localized scripts in every market language.

Audiobooks & Narration

Match a narrator’s timbre for pick-ups, sequels, or translated editions while preserving listener familiarity.

Accessibility & Assistive

Let users hear UI or content in a voice that feels personal — including cross-lingual output from one sample.

VoxCPM2 Pricing Plans for TTS, Voice Cloning, and Voice Design

Start with transparent credit-based pricing for Text to Speech, Voice Cloning, and Voice Design, then choose the plan that fits your usage.

One-time Credits

Free

$0/one-time

2 credits (≈ 200 characters / 16 seconds)

Included in plan
  • 30-Language Multilingual
  • Voice Design
  • Controllable Cloning + Ultimate Cloning
  • 48kHz Studio-Quality Output
  • MP3 / WAV export
  • Commercial license (not included)
  • Priority queue (not included)

Basic

$9.9/one-time

800 credits (≈ 80,000 characters / 1.8 hours)

$0.012 per credit

Included in plan
  • Everything in Free
  • Commercial license
  • Email support
  • Credits never expire

Pro

✨ Most Popular
$29.9/one-time

3,000 credits (≈ 300,000 characters / 6.7 hours)

About $0.010 per credit (save roughly 20% vs Basic)

Included in plan
  • Everything in Basic
  • Early access to new voice models
  • Credits never expire

Business

✨ Best Value
$49.9/one-time

6,000 credits (≈ 600,000 characters / 13.3 hours)

About $0.008 per credit (save roughly 33% vs Basic)

Included in plan
  • Everything in Pro
  • Priority generation queue
  • Early access to new voice models
  • Credits never expire

Credits never expire on any paid plan. Outputs from Basic, Pro, and Business are licensed for commercial use under our Terms.
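The credit arithmetic can be checked with a short script. This is a sketch under stated assumptions: the 100-characters-per-credit rate is inferred from the Free tier figure (2 credits ≈ 200 characters), rounding up to whole credits is a guess, and the function names are hypothetical.

```python
import math

# Assumption: inferred from "2 credits ≈ 200 characters" on the Free tier.
CHARS_PER_CREDIT = 100

# (price in USD, credits) as listed on this page.
PACKS = {
    "Basic": (9.90, 800),
    "Pro": (29.90, 3000),
    "Business": (49.90, 6000),
}


def credits_needed(char_count: int) -> int:
    """Estimate credits for a script, rounding up to whole credits (assumed)."""
    return math.ceil(char_count / CHARS_PER_CREDIT)


def price_per_credit(pack: str) -> float:
    """Listed pack price divided by the number of credits in the pack."""
    price, credits = PACKS[pack]
    return price / credits


def savings_vs_basic(pack: str) -> float:
    """Fractional per-credit saving relative to the Basic pack."""
    return 1 - price_per_credit(pack) / price_per_credit("Basic")
```

For example, `credits_needed(4000)` estimates the credits for one full-length request, and `savings_vs_basic("Pro")` gives the per-credit discount derived from the listed prices.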

7-Day Refund: Money-back guarantee
Secure Payment: Powered by Stripe
24/7 Support: Always here to help

Choose one-time credits • No subscription or auto-renewal

✓ One-time credit packs  ✓ Credits never expire  ✓ Secure payments  ✓ Email support

Frequently Asked Questions About AI Voice Cloning

Everything about zero-shot Voice Cloning with VoxCPM2.

What is VoxCPM2 AI Voice Cloning?

VoxCPM2 AI Voice Cloning is a zero-shot feature that replicates a speaker's voice from a short audio sample, with no training required. Upload a clean reference clip, and VoxCPM2 extracts the speaker's voice profile to generate new speech in that voice across the 30 supported languages.

Jump to the free generator on the homepage — no account required.