LoRA Fine-Tuning Qwen3-TTS for Custom Voices
07 Feb 2026, 00:00 Z
60-second takeaway
Qwen3-TTS + LoRA worked well on this benchmark once we controlled inference scale.
The key lesson was not just checkpoint selection but adapter strength: scale 1.0 over-steered, while 0.3 to 0.35 sounded stable.
For this run, epoch 10 plus `lora_scale` around 0.3 was the best operating point.
Companion repo
All reusable LoRA tooling is published separately.
Where this fits
- For founders: this is a strong candidate if you want high quality from single-GPU LoRA runs.
- For engineers: this page captures exact run behavior, including where losses flattened and where inference destabilized.
Series overview:
Experiment setup
- Model: Qwen3-TTS 1.7B Base + LoRA
- Dataset: IMDA NSC `FEMALE_01_44k`, JSONL + codec prep pipeline
- Split: train/val/test = 90/5/5
- Hardware: RTX 3090 Ti 24 GB
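The 90/5/5 split above can be sketched as a small helper over the JSONL manifest. This is a minimal illustration, not the repo's actual prep pipeline; the function names and the fixed seed are assumptions.

```python
import json
import random

def split_rows(rows, seed=0, ratios=(0.90, 0.05)):
    """Shuffle rows deterministically, then split train/val/test = 90/5/5."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n = len(rows)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def split_manifest(path, seed=0):
    """Load a JSONL manifest (one utterance record per line) and split it."""
    with open(path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    return split_rows(rows, seed=seed)
```

Keeping the seed fixed matters here: a single-speaker corpus like `FEMALE_01_44k` is small enough that a re-shuffled split would make checkpoint comparisons across runs unreliable.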
Best checkpoint logic
- Validation loss improved early and flattened around epochs 8 to 12.
- Validation loss started rising after epoch 13 in our continued run.
- Best checkpoint by validation trend in this run: epoch 10.
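The selection logic above amounts to taking the validation-loss minimum and stopping once the loss has clearly turned upward. A minimal early-stopping-style sketch (the function name and `patience` default are assumptions, not the run's actual tooling):

```python
def best_checkpoint(val_loss_by_epoch, patience=3):
    """Return the epoch with the lowest validation loss, scanning in epoch
    order and stopping once the loss has failed to improve for `patience`
    consecutive epochs (i.e. the curve has turned upward)."""
    best_epoch, best_loss, since_best = None, float("inf"), 0
    for epoch in sorted(val_loss_by_epoch):
        loss = val_loss_by_epoch[epoch]
        if loss < best_loss:
            best_epoch, best_loss, since_best = epoch, loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # loss has flattened and risen; keep the earlier minimum
    return best_epoch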
Audio evidence
Recommended sample from this run
<audio controls preload="none"> <source src="/assets/audio/tts/qwen3_tts/epoch10_scale0.35.wav" type="audio/wav" /> </audio>

Settings: epoch 10 adapter, scale 0.35.
Failure modes we saw
- Scale 1.0 often sounded noisy/over-steered.
- Some background inference runs failed due to environment/runtime issues, not model quality.
- Attention backend choice affected inference stability in long sweeps.
Recommended inference settings
For this run and hardware profile:
- Use epoch 10 adapter as the first candidate.
- Set `lora_scale` to 0.3 to 0.35; scale 1.0 over-steered in our tests.
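Putting the recommendations together, a small sweep plan pairs the epoch-10 adapter with the stable scale band before auditioning anything more aggressive. This is a hypothetical sketch: the adapter path and any downstream `synthesize(text, adapter, lora_scale)` call are assumptions, not this repo's API.

```python
from itertools import product

# Hypothetical paths/values illustrating this run's recommendation.
ADAPTERS = ["checkpoints/epoch10"]   # first candidate per the checkpoint logic above
SCALES = [0.30, 0.35]                # stable band observed here; 1.0 over-steered

def sweep_settings(adapters=ADAPTERS, scales=SCALES):
    """Enumerate (adapter, lora_scale) pairs to audition, safest first."""
    return [(adapter, scale) for adapter, scale in product(adapters, scales)]
```

Auditioning from the conservative end of the scale range keeps the first samples close to the base model's delivery, so over-steering shows up as a gradual drift rather than an outright failure.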