IMDA NSC Voice Cloning Finetuning Benchmark 2026


07 Feb 2026, 00:00 Z

60-second takeaway
We ran one consistent single-speaker benchmark on IMDA NSC FEMALE_01 with a single-GPU setup.
VoxCPM, IndexTTS2, and Qwen3-TTS all produced usable outputs under specific settings; CosyVoice3 did not reach production-ready quality in this run.
Treat this as an execution benchmark under one configuration, not a universal model ranking.

Who this is for

  • Founder / strategy reader: use the matrix and decision guide to pick what to deploy next.
  • Engineer reader: use each linked deep dive for exact recipes, checkpoints, and failure diagnostics.

Shared experiment setup

  • Dataset: IMDA NSC single-speaker set (FEMALE_01), with model-specific preprocessing.
  • Hardware: single NVIDIA RTX 3090 Ti (24 GB VRAM).
  • Evaluation: qualitative listening on naturalness, accent retention, noise profile, long-text stability, and operational friction (VRAM, disk, rerun complexity).
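The evaluation is qualitative, so it helps to record every listening pass against the same criteria. Below is a minimal sketch of one way to do that, not the tooling used in this benchmark. The `ListeningNote` class, its fields, and the criterion keys are hypothetical names derived from the evaluation bullet above.

```python
from dataclasses import dataclass, field

# Criterion keys derived from the evaluation bullet; the exact spellings are an assumption.
CRITERIA = (
    "naturalness",
    "accent_retention",
    "noise_profile",
    "long_text_stability",
    "operational_friction",  # VRAM, disk, rerun complexity
)


@dataclass
class ListeningNote:
    """Free-text listening notes for one model checkpoint (hypothetical structure)."""

    model: str
    checkpoint: str
    notes: dict = field(default_factory=dict)

    def add(self, criterion: str, text: str) -> None:
        """Attach a note to one criterion, rejecting criteria outside the shared list."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        self.notes[criterion] = text

    def missing(self) -> list:
        """Return the criteria that have no note yet, in rubric order."""
        return [c for c in CRITERIA if c not in self.notes]
```

Calling `missing()` before comparing models flags any checkpoint that was not judged on every criterion, which keeps the comparison consistent across runs.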

Comparison matrix

Model | Dataset handling | Train recipe | Best checkpoint in this run | Main failure mode | Recommended inference setting
CosyVoice2 (baseline/control) | Baseline sample used as control | No finetune in this benchmark | Baseline control sample only | Not evaluated as a finetune target in this series | Use as control reference only
CosyVoice3 | IMDA NSC
