IndexTTS2 Finetuning on IMDA NSC FEMALE_01
07 Feb 2026, 00:00 Z
60-second takeaway
IndexTTS2 gave us a usable full-SFT baseline with strong operational predictability once we stabilized restart behavior.
In this run, model_step14000.pth was the practical checkpoint to keep.
The major challenge was process reliability and checkpoint retention policy, not core output quality.
Where this fits
- For founders: IndexTTS2 is a steady full-finetune option in this benchmark.
- For engineers: this page focuses on run recovery and checkpoint management as much as quality.
Series overview:
Experiment setup
- Model: IndexTTS2
- Dataset: IMDA NSC FEMALE_01_44k processed manifests
- Hardware: RTX 3090 Ti 24 GB
- Training mode: full SFT with resume
Best checkpoint logic
- The best validation region was around step 13800.
- Saved checkpoints available around that region were model_step14000.pth and, later, model_step15949.pth.
- We treated model_step14000.pth as the practical best anchor for this run.
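The selection rule above (pick the saved checkpoint closest to the best validation region) can be sketched in a few lines. This is a minimal illustration, not the tooling used in this run: the helper name `nearest_checkpoint` is hypothetical, and it assumes checkpoints follow the `model_step<N>.pth` naming seen here.

```python
import re
from typing import Iterable, Optional

# Assumption: checkpoint files are named model_step<N>.pth, as in this run.
STEP_PATTERN = re.compile(r"model_step(\d+)\.pth$")


def nearest_checkpoint(filenames: Iterable[str], target_step: int) -> Optional[str]:
    """Return the checkpoint filename whose step is closest to target_step.

    Hypothetical helper for illustration. Ties go to the earlier step.
    Returns None when no filename matches the expected pattern.
    """
    candidates = []
    for name in filenames:
        match = STEP_PATTERN.search(name)
        if match:
            step = int(match.group(1))
            # Sort key: distance first, then step, so ties prefer the earlier step.
            candidates.append((abs(step - target_step), step, name))
    if not candidates:
        return None
    return min(candidates)[2]


saved = ["model_step14000.pth", "model_step15949.pth"]
print(nearest_checkpoint(saved, 13800))  # model_step14000.pth
```

With the two checkpoints that survived retention, step 14000 is 200 steps from the validation optimum and step 15949 is 2149 steps away, which matches the choice made above. Listening checks should still have the final say.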
Audio evidence
Representative sample
<audio controls preload="none"> <source src="/assets/audio/tts/indextts2/indextts2_female01_step14000_long.wav" type="audio/wav" /> </audio>
Settings: long-text prompt comparison path, step 14000 checkpoint.
Failure modes we saw
- Training runs were interrupted multiple times and required explicit resume management.
- Some crashes were low-level (pt_autograd_0 segfault signs), which made clean logs critical.
- Retention policy kept only recent checkpoint windows, so older steps disappeared automatically.
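One way to keep a rolling retention window from deleting a checkpoint you care about is to pin specific steps. The sketch below is an assumption about how such a policy could look, not the retention logic of the IndexTTS2 training scripts: the function name `steps_to_delete` is hypothetical, and the step list in the example is illustrative apart from 14000 and 15949.

```python
from typing import Iterable, List, Set


def steps_to_delete(saved_steps: Iterable[int], keep_last: int, pinned: Set[int]) -> List[int]:
    """Return the steps a rolling-window policy may delete.

    Hypothetical policy: keep the `keep_last` most recent steps plus any
    explicitly pinned step (for example, the best-validation checkpoint).
    """
    ordered = sorted(set(saved_steps))
    # Guard keep_last == 0, because ordered[-0:] would return the whole list.
    recent = set(ordered[-keep_last:]) if keep_last > 0 else set()
    return [step for step in ordered if step not in recent and step not in pinned]


# Illustrative step list; only 14000 and 15949 come from this run.
saved = [12000, 13000, 14000, 15000, 15949]
print(steps_to_delete(saved, keep_last=2, pinned={14000}))  # [12000, 13000]
```

Returning the deletion list instead of deleting files directly keeps the policy easy to test and lets the caller log exactly what is about to be removed.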
Recommended inference settings
- Tie checkpoint selection to both listening tests and the nearest best-validation region.
- Prefer explicit run logs and resume metadata over implicit state.
- For this run, compare step 14000 and step 15949 only when you need late-run style differences.
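Explicit resume metadata can be as simple as a small JSON file written next to the checkpoints after each save. The following is a minimal sketch under stated assumptions: the helper names and record fields are hypothetical and are not part of IndexTTS2. The write goes through a temporary file and `os.replace`, so a crash mid-write should not leave a truncated record behind.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_resume_metadata(path, step: int, checkpoint: str, note: str = "") -> dict:
    """Atomically record which checkpoint a restart should resume from.

    Hypothetical helper; the field names are illustrative only.
    """
    path = Path(path)
    record = {"step": step, "checkpoint": checkpoint, "note": note}
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)
    os.replace(tmp_name, path)
    return record


def read_resume_metadata(path) -> Optional[dict]:
    """Return the stored record, or None if no metadata file exists yet."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

A restart script could call `read_resume_metadata` first and refuse to start when the recorded checkpoint file is missing, rather than silently falling back to whatever file happens to be newest.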
Engineer appendix
Key paths from this run
- Checkpoints: /mnt/work/chee-wei-jie/voice-models/FEMALE_01_44k/trained_ckpts_female01