CosyVoice 2 vs 3 - Voice Cloning Quality Compared (2026)

07 Feb 2026, 00:00 Z

Experiment Status: LoRA rerun completed - best checkpoint at epoch 12. Listening evaluation pending.

60-second takeaway
The first CosyVoice3 run (full SFT) failed after epoch 1 with catastrophic overfitting. A corrected LoRA rerun (2.16M params via PEFT) reached its best checkpoint at epoch 12 with stable training.
CosyVoice2 baseline audio remains the control. The LoRA rerun tools and 9 pitfalls are published at instavar/cosyvoice3-lora-finetuning.
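
For a sense of scale on the 2.16M figure: LoRA attaches two low-rank matrices to each adapted linear layer, so the trainable parameter count is roughly rank × (d_in + d_out) per layer. The sketch below does that arithmetic for a made-up set of layer shapes. The hidden size, rank, and number of adapted layers are illustrative assumptions, not the configuration used in this rerun, and `LinearShape` and `lora_param_count` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearShape:
    """Input/output width of one linear layer that receives a LoRA adapter."""
    d_in: int
    d_out: int


def lora_param_count(layers, rank):
    """Count trainable LoRA parameters.

    Each adapted layer gains matrix A (rank x d_in) and matrix B (d_out x rank),
    which adds rank * (d_in + d_out) parameters. Bias terms are ignored.
    """
    return sum(rank * (layer.d_in + layer.d_out) for layer in layers)


# Hypothetical setup: 24 blocks, 4 adapted projections each, all 1024 x 1024, rank 8.
# These numbers are placeholders and do not describe the actual CosyVoice3 rerun.
layers = [LinearShape(1024, 1024)] * (24 * 4)
total = lora_param_count(layers, rank=8)
print(f"{total:,} trainable LoRA parameters")  # 1,572,864
```

Counts in the low millions are typical for this kind of setup, which is why a LoRA rerun is far less prone to memorising a small dataset than full SFT.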

If you searched for CosyVoice 2 quality, CosyVoice 2 voice cloning quality, CosyVoice 2 quality review 2026, or CosyVoice 2 vs CosyVoice 3, this page is the quality-comparison view. Read it alongside the full CosyVoice fine-tuning guide, which covers the data, VRAM, LoRA, and rerun details.

Where this fits

For founders: do not deploy this CosyVoice3 run as-is.
For engineers: use this page as a diagnostic handoff for the next rerun.

Quick read:

CosyVoice2 remains the cleaner control sample from this evidence set.
CosyVoice3 is still attractive for zero-shot quality, but this specific fine-tuned run did not clear production listening review.
The corrected CosyVoice3 LoRA rerun is operationally healthier than the full-SFT run, but quality promotion still depends on listening evaluation.
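
The Quick read above implies a two-stage gate: first pick the best checkpoint from training, then promote it only once listening evaluation has passed. Here is a minimal sketch of that logic. It assumes the best checkpoint is chosen by lowest validation loss, which this page does not state, and every loss value is a made-up placeholder. `Checkpoint`, `best_checkpoint`, and `promotion_decision` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    epoch: int
    val_loss: float  # assumed selection metric; the real run may use another


def best_checkpoint(checkpoints):
    """Return the checkpoint with the lowest validation loss."""
    return min(checkpoints, key=lambda c: c.val_loss)


def promotion_decision(listening_passed: Optional[bool]) -> str:
    """Gate promotion on the listening evaluation.

    None means the evaluation has not been run yet, so the candidate is held.
    """
    if listening_passed is None:
        return "hold"
    return "promote" if listening_passed else "reject"


# Placeholder loss values for illustration only; not measured results.
history = [
    Checkpoint(epoch=10, val_loss=0.52),
    Checkpoint(epoch=11, val_loss=0.49),
    Checkpoint(epoch=12, val_loss=0.47),
    Checkpoint(epoch=13, val_loss=0.50),
]
candidate = best_checkpoint(history)
decision = promotion_decision(listening_passed=None)  # evaluation still pending
print(candidate.epoch, decision)  # 12 hold
```

Keeping the two stages separate means a healthy training curve alone never promotes a checkpoint; a human listening pass is still required.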

Series overview:

https://instavar.com/blog/IMDA_NSC_Voice_Cloning_Finetuning_Benchmark_2026

For the full cross-model comparison, see the TTS Model Decision Tree. It recommends CosyVoice 3 for pre-produced content when zero-shot consistency matters most.

Result summary

CosyVoice2 is included as a baseline/control and produced acceptable qualitative output on our selected sample. CosyVoice3 fine-tuning in this run did not reach production-ready quality: listening checks showed unstable long-form behavior and weaker linguistic consistency.

Where this fits

Result summary

Audio evidence

CosyVoice2 baseline/control

CosyVoice3 representative sample from this run

Need consented AI voiceovers?

What this does and does not mean

Likely contributors in this run

LoRA rerun update

Related deep dives

Related Posts
