IndexTTS2 Finetuning on IMDA NSC FEMALE_01

07 Feb 2026, 00:00 Z

60-second takeaway
IndexTTS2 gave us a usable full-SFT baseline with strong operational predictability once we stabilized restart behavior.
In this run, model_step14000.pth was the practical checkpoint to keep.
The major challenge was process reliability and checkpoint retention policy, not core output quality.

Where this fits

  • For founders: IndexTTS2 is a steady full-finetune option in this benchmark.
  • For engineers: this page focuses on run recovery and checkpoint management as much as quality.

The fine-tuning pipeline used in this benchmark is open-source: instavar/indextts2-finetuning - the first public fine-tuning code for IndexTTS2 (the official repo is inference-only).

Experiment setup

  • Model: IndexTTS2
  • Dataset: IMDA NSC FEMALE_01_44k processed manifests
  • Hardware: RTX 3090 Ti 24 GB
  • Training mode: full SFT with resume

Best checkpoint logic

  • The best validation loss landed around step 13,800.
  • The saved checkpoints nearest that region were model_step14000.pth and, later, model_step15949.pth.
  • We treated model_step14000.pth as the practical best anchor for this run.
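The "nearest saved checkpoint to the best validation step" rule can be automated. Below is a minimal sketch, assuming the `model_stepN.pth` naming seen in this run; the helper name is hypothetical, not part of the finetuning repo:

```python
import re
from pathlib import Path

def nearest_checkpoint(ckpt_dir: str, best_val_step: int) -> Path:
    """Pick the saved checkpoint whose step number is closest to the
    step where validation loss bottomed out."""
    candidates = []
    for path in Path(ckpt_dir).glob("model_step*.pth"):
        match = re.fullmatch(r"model_step(\d+)\.pth", path.name)
        if match:
            step = int(match.group(1))
            candidates.append((abs(step - best_val_step), path))
    if not candidates:
        raise FileNotFoundError(f"no checkpoints found in {ckpt_dir}")
    # Smallest distance to the validation optimum wins.
    return min(candidates, key=lambda pair: pair[0])[1]
```

With a best-validation step of 13,800 and saves at 14,000 and 15,949, this picks model_step14000.pth, matching the manual choice above.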

Audio evidence

Representative sample

Settings: long-text prompt comparison path, step 14000 checkpoint.

Failure modes we saw

  • Training runs were interrupted multiple times and required explicit resume management.
  • Some crashes were low-level (segfaults referencing the pt_autograd_0 thread), which made clean logs critical for diagnosis.
  • Retention policy kept only recent checkpoint windows, so older steps disappeared automatically.

Recommended inference settings

  • Tie checkpoint selection to both listening tests and the nearest validation-loss region.
  • Prefer explicit run logs and resume metadata over implicit state.
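"Explicit resume metadata" can be as simple as a small JSON file written next to each checkpoint save. A minimal sketch; the file name and fields are illustrative assumptions, not part of the finetuning repo:

```python
import json
import time
from pathlib import Path

def write_resume_metadata(run_dir: str, step: int, ckpt_name: str) -> None:
    """Record the last saved step and checkpoint explicitly, so a restart
    reads a file instead of inferring state from directory contents."""
    meta = {
        "last_step": step,
        "last_checkpoint": ckpt_name,
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    Path(run_dir, "resume.json").write_text(json.dumps(meta, indent=2))
```

On restart, the trainer reads resume.json and loads the named checkpoint, which also leaves an audit trail when crashes interrupt a run mid-save.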
