60-second takeaway
We ran a consistent single-speaker benchmark on four open-source TTS models using IMDA NSC FEMALE_01 on an RTX 3090 Ti (24GB). VoxCPM 1.5 and Qwen3-TTS 1.7B both produced deployable outputs. IndexTTS2 gave a stable full-SFT baseline. CosyVoice3 finetuning did not reach production quality in this run (rerun pending). If you need something deployable today on a 24GB GPU, start with VoxCPM or Qwen3-TTS LoRA.
What this benchmark covers
This is a practitioner-oriented comparison, not an academic leaderboard. We evaluated four models under the same conditions:
Dataset: IMDA NSC FEMALE_01 - a single-speaker set with a natural Singaporean English accent
Hardware: one NVIDIA RTX 3090 Ti (24 GB VRAM)
Goal: produce voice-cloned audio suitable for AI-generated video narration (A-roll use case)
Evaluation: qualitative listening on naturalness, long-text stability, accent retention, and operational friction
We are not measuring WER or MOS scores from automated tools. We are measuring whether the output sounds production-ready to a human listener on a video platform.
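One low-effort way to keep that human listening pass honest is to blind it: shuffle all model outputs into anonymised filenames before anyone listens. A minimal sketch, assuming a hypothetical outputs/<model_name>/<sample_id>.wav layout (the folder structure and filenames are assumptions, not part of the benchmark setup):

```python
import random
import shutil
from pathlib import Path

# Hypothetical layout: outputs/<model_name>/<sample_id>.wav
# Copy samples into a shuffled, anonymised folder so the listener
# cannot tell which model produced which clip.
SRC = Path("outputs")
DST = Path("blind_eval")
DST.mkdir(exist_ok=True)

clips = list(SRC.glob("*/*.wav"))
random.shuffle(clips)

key = {}
for i, clip in enumerate(clips):
    anon = f"clip_{i:03d}.wav"
    shutil.copy(clip, DST / anon)
    key[anon] = f"{clip.parent.name}/{clip.name}"

# Keep the answer key out of the listening folder.
Path("blind_eval_key.txt").write_text(
    "\n".join(f"{k}\t{v}" for k, v in sorted(key.items()))
)
```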
The four models
VoxCPM 1.5
VoxCPM 1.5 uses a LoRA finetuning path that fits within 24GB VRAM without modification. Training is straightforward with standard train/val splits.
| Dimension | Result |
| --- | --- |
| Finetuning approach | LoRA |
| Best checkpoint (this run) | step_0004000 |
| Long-text stability | Good |
| Prompt sensitivity | Moderate - use clean prompt clips |
| Production-ready? | Yes |
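The train/val split itself needs nothing model-specific. A minimal sketch, assuming a hypothetical one-line-per-utterance manifest (adjust paths and format to VoxCPM's actual data layout):

```python
import random
from pathlib import Path

# Hypothetical manifest: one "<wav_path>|<transcript>" line per utterance.
lines = Path("female_01_manifest.txt").read_text().splitlines()
random.seed(42)          # fixed seed so the split is reproducible
random.shuffle(lines)

n_val = max(1, int(0.05 * len(lines)))   # ~5% held out for validation
Path("val.txt").write_text("\n".join(lines[:n_val]))
Path("train.txt").write_text("\n".join(lines[n_val:]))
```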
Key insight: No-prompt generation at step 4000 gave the best naturalness. Prompted inference copied prompt room noise into the output, which was audible on studio playback. Use prompt only when strong speaker lock is required.
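If you do need prompted inference, cleaning the prompt clip first reduces how much room noise carries over. A minimal sketch using librosa's silence trimming; the top_db threshold is an assumption to tune per recording environment:

```python
import librosa
import soundfile as sf

# Load the prompt clip, trim leading/trailing silence, and peak-normalise.
y, sr = librosa.load("prompt_raw.wav", sr=None)
y_trimmed, _ = librosa.effects.trim(y, top_db=30)  # top_db is an assumption
y_norm = y_trimmed / (abs(y_trimmed).max() + 1e-9) * 0.95

sf.write("prompt_clean.wav", y_norm, sr)
```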
Qwen3-TTS 1.7B
Qwen3-TTS 1.7B with LoRA was the model where adapter scale mattered most. Scale 1.0 over-steered and produced noisy outputs; scale 0.3 to 0.35 sounded stable.
| Dimension | Result |
| --- | --- |
| Finetuning approach | LoRA |
| Best checkpoint (this run) | Epoch 10 |
| Best LoRA scale | 0.3 to 0.35 |
| Long-text stability | Good with SDPA backend |
| Prompt sensitivity | Low - robust to formatting variation |
| Production-ready? | Yes |
Key insight: The scale sweep matters more than checkpoint selection alone. Run a quick 5-sample listening test at scales 0.2, 0.3, 0.35, and 0.5 before committing to a checkpoint. Scale 1.0 is almost always wrong for this benchmark.
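A sweep harness for that listening test is a few lines. The synthesize() import below is a hypothetical placeholder for your Qwen3-TTS inference entry point; the sweep structure is the point, not the call itself:

```python
from pathlib import Path

# Hypothetical wrapper: synthesize(text, lora_scale) -> path to a wav file.
# Replace with your actual Qwen3-TTS inference entry point.
from my_tts_stack import synthesize  # assumption, not a real package

TEXTS = [  # 5 short probes covering pace, numbers, and names
    "Welcome back to the channel.",
    "Order 4182 ships on Tuesday, the 3rd of March.",
    "Let's look at the quarterly numbers together.",
    "Dr Tan will join us from the Singapore office.",
    "That wraps up today's walkthrough, thanks for watching.",
]

for scale in (0.2, 0.3, 0.35, 0.5):
    out_dir = Path(f"sweep/scale_{scale}")
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, text in enumerate(TEXTS):
        wav = synthesize(text, lora_scale=scale)
        Path(wav).rename(out_dir / f"sample_{i}.wav")
```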
IndexTTS2
IndexTTS2 uses full SFT (not LoRA). It requires more careful checkpoint management because the training loop had crash recovery issues in our run.
| Dimension | Result |
| --- | --- |
| Finetuning approach | Full SFT |
| Best checkpoint (this run) | model_step14000.pth |
| Long-text stability | Good |
| Crash recovery | Required explicit resume management |
| Production-ready? | Yes - with operational caution |
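Explicit resume management mostly means reliably finding the newest checkpoint after a crash. A minimal sketch, assuming checkpoints are named model_step<N>.pth as in this run; the state-dict key layout is an assumption, so the load call is left commented:

```python
import re
from pathlib import Path

import torch

CKPT_DIR = Path("checkpoints")

def latest_checkpoint(ckpt_dir: Path) -> Path | None:
    """Return the checkpoint with the highest step number, if any."""
    candidates = []
    for p in ckpt_dir.glob("model_step*.pth"):
        m = re.search(r"model_step(\d+)\.pth", p.name)
        if m:
            candidates.append((int(m.group(1)), p))
    return max(candidates)[1] if candidates else None

ckpt = latest_checkpoint(CKPT_DIR)
if ckpt is not None:
    # map_location avoids GPU placement surprises when resuming.
    state = torch.load(ckpt, map_location="cpu")
    # model.load_state_dict(state["model"])  # key layout is an assumption
    print(f"Resuming from {ckpt}")
else:
    print("No checkpoint found, starting from scratch.")
```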
Key insight: Keep ALL checkpoints until you've done a listening eval sweep. The retention policy deleted older checkpoints before we could test them. Pin the best checkpoint explicitly once identified - don't rely on automatic deletion logic.
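Pinning can be as simple as copying the chosen file somewhere the retention logic never scans. A minimal sketch using the checkpoint name from this run; the pinned/ directory is an assumption:

```python
import shutil
from pathlib import Path

# Copy the chosen checkpoint into a directory the retention policy
# never touches, so automatic cleanup cannot delete it.
BEST = Path("checkpoints/model_step14000.pth")   # from the listening sweep
PINNED = Path("checkpoints/pinned")
PINNED.mkdir(exist_ok=True)

shutil.copy2(BEST, PINNED / BEST.name)
print(f"Pinned {BEST.name} -> {PINNED}")
```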
CosyVoice3
CosyVoice3 LoRA finetuning was the outlier. Our first run did not reach production quality.
If you need something deployable today
Start with VoxCPM 1.5 step 4000. It had the lowest setup friction and the cleanest no-prompt output in our run. LoRA training is straightforward and the checkpoint selection rule is simple.
If you need LoRA-style adapter control
Use Qwen3-TTS 1.7B LoRA. The scale parameter gives you a post-training knob to tune output strength, letting you adapt the voice to different content types without full retraining cycles.
If you need the most reproducible full-SFT baseline
Use IndexTTS2. Full SFT converges more predictably than LoRA for some voice profiles. The crash recovery requirement is manageable once you have an explicit checkpoint retention policy.
If you want to evaluate CosyVoice
Use CosyVoice2 as a zero-shot baseline while waiting for the CosyVoice3 rerun. Do not deploy the current CosyVoice3 run.
What is IMDA NSC FEMALE_01?
IMDA NSC is the National Speech Corpus published by Singapore's Infocomm Media Development Authority. FEMALE_01 is a single-speaker subset with natural Singaporean English. We use it as a benchmark voice because it has a distinctive accent profile that stress-tests speaker similarity in voice cloning - a model that sounds natural on this speaker generalises well to other non-American-English speakers.
Audio evidence
All audio samples from this benchmark are published in the individual model deep dives. Listen to them side by side before making a deployment decision.
Can I run all four models on a single RTX 3090 Ti (24GB)?
Yes. All four models fit within 24GB VRAM for both training and inference. The full feasibility notes - including peak VRAM, runtime, and recipe availability - are covered in Voice Cloning on a 24GB GPU: What Actually Works in 2026.
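If you want to verify the VRAM numbers on your own hardware, PyTorch's peak-memory counters make it a three-line check. The run_one_step() placeholder below is hypothetical; substitute your actual training or inference step:

```python
import torch

# Reset the peak-memory counter, run one representative step,
# then read back the high-water mark.
torch.cuda.reset_peak_memory_stats()

# run_one_step(model, batch)  # hypothetical: your training/inference step

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM: {peak_gb:.2f} GB (budget: 24 GB on a 3090 Ti)")
```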
Which model has the best Singaporean English accent retention?
In this benchmark, VoxCPM and IndexTTS2 both retained the FEMALE_01 accent profile well. Qwen3-TTS at the right scale also retained it. CosyVoice3 (current run) had inconsistent retention.
Are any of these models commercially licensed for production use?
License status varies. IndexTTS2 uses a research license with commercial use restrictions. VoxCPM, Qwen3-TTS, and CosyVoice have varying commercial terms - verify the latest license on each model's repository before deploying.
What's the difference between LoRA and full SFT for TTS finetuning?
LoRA (Low-Rank Adaptation) trains a small adapter on top of a frozen base model. It uses less VRAM and trains faster, but the adapter's strength needs tuning (the lora_scale parameter). Full SFT (Supervised Fine-Tuning) updates all model weights. It requires more VRAM and longer training but tends to converge more reliably for voice profiles with strong accent characteristics.
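In code, the difference comes down to which parameters carry gradients. A minimal sketch using Hugging Face PEFT, for the case where the TTS backbone is a transformers model; the model name and target_modules are assumptions that vary per architecture:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("your/tts-backbone")  # assumption

# --- LoRA: freeze the base, train a small adapter ---
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,            # effective adapter scale = lora_alpha / r
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption, architecture-specific
)
lora_model = get_peft_model(base, lora_cfg)
lora_model.print_trainable_parameters()  # typically well under 1% of weights

# --- Full SFT: every weight is trainable ---
sft_model = AutoModelForCausalLM.from_pretrained("your/tts-backbone")
for p in sft_model.parameters():
    p.requires_grad = True  # the default, shown here for contrast
```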