Voice Cloning on a 24GB GPU - What Actually Works in 2026


25 Mar 2026, 00:00 Z

60-second takeaway
VoxCPM 1.5, Qwen3-TTS 1.7B, IndexTTS2, and CosyVoice3 all fit within 24GB VRAM on an RTX 3090 Ti for both training and inference.
The real constraint is not memory - it is having a working recipe. LoRA paths (VoxCPM, Qwen3-TTS) have the most mature recipes. Full SFT paths (IndexTTS2, CosyVoice3) work but need explicit checkpoint management.
If you have a 24GB GPU and want to start today, VoxCPM 1.5 LoRA is the path of least resistance.
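The LoRA-vs-full-SFT split above comes down to trainable parameter count. A minimal sketch of the arithmetic, using illustrative sizes loosely modeled on a 1.7B-parameter transformer (28 layers, hidden size 2048, LoRA rank 16 — these are assumptions, not measured figures for any model in the table):

```python
# Rough trainable-parameter comparison: full fine-tuning vs LoRA.
# All sizes below are illustrative assumptions.

def full_ft_params(layers: int, hidden: int) -> int:
    """Attention (4 d x d projections) + MLP (~8*d^2) per layer, all trainable."""
    per_layer = 4 * hidden * hidden + 8 * hidden * hidden
    return layers * per_layer

def lora_params(layers: int, hidden: int, rank: int, adapted: int = 4) -> int:
    """Each adapted d x d matrix gets two low-rank factors: d x r and r x d."""
    per_matrix = 2 * hidden * rank
    return layers * adapted * per_matrix

layers, hidden, rank = 28, 2048, 16
full = full_ft_params(layers, hidden)
lora = lora_params(layers, hidden, rank)
print(f"full fine-tune : {full / 1e9:.2f}B trainable params")
print(f"LoRA (r={rank})   : {lora / 1e6:.1f}M trainable params")
print(f"ratio          : {full / lora:.0f}x fewer with LoRA")
```

With these assumed sizes, LoRA trains roughly two orders of magnitude fewer parameters, which is why the LoRA rows in the table below are the comfortable fit.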

Who this is for

This guide is for engineers who have a single consumer or prosumer GPU (RTX 3090, RTX 3090 Ti, RTX 4090, or similar 24GB class) and want to fine-tune a TTS model for custom voice cloning. We ran all benchmarks on an RTX 3090 Ti (24 GB VRAM) on a Tailscale-connected remote desktop.

The question we're answering: which open-source TTS models can you actually run - training and inference - on a single 24GB GPU in 2026?

The short answer

| Model | VRAM fit (24 GB) | Training mode | Recipe maturity | Deployable result? |
|---|---|---|---|---|
| VoxCPM 1.5 | ✅ Fits | LoRA | Mature | ✅ Yes |
| Qwen3-TTS 1.7B | ✅ Fits | LoRA | Mature | ✅ Yes |
| IndexTTS2 | ✅ Fits | Full SFT | Needs checkpoint management | ✅ Yes |
| CosyVoice3 | ✅ Fits | Full SFT | Needs checkpoint management | ✅ Yes |
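Why full SFT is the tighter fit: with Adam, every trainable fp16 parameter drags along fp16 gradients plus two fp32 optimizer states. A back-of-envelope budget (weights, grads, and Adam states only; activations, KV cache, and framework overhead excluded — the parameter counts are assumptions for a ~1.7B model, not measurements):

```python
# Back-of-envelope VRAM budget for a ~1.7B-parameter model (assumed size).
# Excludes activations and framework overhead, so real usage is higher.

GB = 1024 ** 3

def full_sft_bytes(params: int) -> int:
    # fp16 weights (2 B) + fp16 grads (2 B) + fp32 Adam m and v (4 + 4 B)
    return params * (2 + 2 + 8)

def lora_bytes(params: int, trainable: int) -> int:
    # frozen fp16 weights, with grads + optimizer state only for the adapters
    return params * 2 + trainable * (2 + 2 + 8)

params, trainable = 1_700_000_000, 8_000_000
print(f"full SFT : {full_sft_bytes(params) / GB:.1f} GB before activations")
print(f"LoRA     : {lora_bytes(params, trainable) / GB:.1f} GB before activations")
```

Under these assumptions, full SFT starts around 19 GB before a single activation is stored, which is why the Full SFT rows work on 24 GB but leave little headroom and demand careful checkpoint management.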
