Voice Cloning on a 24GB GPU - What Actually Works in 2026


25 Mar 2026, 00:00 Z

60-second takeaway
VoxCPM 1.5, VoxCPM 2, Qwen3-TTS 1.7B, IndexTTS2, and CosyVoice3 all fit within 24GB VRAM on an RTX 3090 Ti for the validated training or inference paths we tested.
Memory is not the only constraint: you also need a working recipe. LoRA paths (VoxCPM 1.5, Qwen3-TTS, CosyVoice3) give the fastest iteration loops. Full supervised fine-tuning (SFT) paths (IndexTTS2, VoxCPM 2) work but need explicit checkpoint management, memory controls, and validation.
If you have a 24GB GPU and want to start today, VoxCPM 1.5 LoRA is the path of least resistance.
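
To make the LoRA versus full SFT gap concrete, here is a back-of-envelope sketch of training-state memory. It is not one of our measurements. It assumes bf16 weights and gradients (2 bytes each) plus AdamW state in fp32 (a master copy and two moments, 12 bytes per trainable parameter), ignores activations entirely, and assumes the LoRA adapter adds about 1% extra parameters. The 1.7B figure is simply the size in Qwen3-TTS 1.7B's name, used here as a hypothetical model size; the function names are illustrative.

```python
GIB = 2 ** 30  # bytes per GiB

# Assumed per-parameter costs for mixed-precision AdamW training (not measured):
WEIGHT_BYTES = 2       # bf16 weights
GRAD_BYTES = 2         # bf16 gradients
OPTIMIZER_BYTES = 12   # fp32 master weights + Adam first and second moments


def estimate_full_sft_gib(n_params: float) -> float:
    """Weights, gradients, and optimizer state for every parameter (no activations)."""
    per_param = WEIGHT_BYTES + GRAD_BYTES + OPTIMIZER_BYTES
    return n_params * per_param / GIB


def estimate_lora_gib(n_params: float, adapter_fraction: float = 0.01) -> float:
    """Frozen bf16 base weights plus full training state for the adapter only.

    adapter_fraction is an assumed ratio of adapter parameters to base parameters.
    """
    frozen = n_params * WEIGHT_BYTES
    adapter_params = n_params * adapter_fraction
    adapter = adapter_params * (WEIGHT_BYTES + GRAD_BYTES + OPTIMIZER_BYTES)
    return (frozen + adapter) / GIB


if __name__ == "__main__":
    n = 1.7e9  # hypothetical 1.7B-parameter model
    print(f"full SFT state: {estimate_full_sft_gib(n):.1f} GiB")
    print(f"LoRA state:     {estimate_lora_gib(n):.1f} GiB")
```

Under these assumptions the full SFT state alone lands slightly above 24 GiB before any activations, while the LoRA state is a few GiB. That is consistent with full SFT paths needing memory controls to fit on a 24GB card, although the actual controls and numbers depend on each model's recipe.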

Who this is for

This guide is for engineers who have a single consumer or prosumer GPU (RTX 3090, RTX 3090 Ti, RTX 4090, or similar 24GB class) and want to fine-tune a TTS model for custom voice cloning. We ran all benchmarks on an RTX 3090 Ti (24 GB VRAM) on a Tailscale-connected remote desktop.
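
If you are unsure whether your card is in the 24GB class, the sketch below reads total and used memory from `nvidia-smi` and parses the CSV output. The 24000 MiB threshold and the helper names are our assumptions for illustration, and the script returns an empty list when `nvidia-smi` is not installed.

```python
import shutil
import subprocess

# nvidia-smi query that prints one CSV line per GPU: name, total MiB, used MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line such as 'NVIDIA GeForce RTX 3090 Ti, 24564, 512'."""
    # Split from the right so a comma inside the GPU name cannot break parsing.
    name, total_mib, used_mib = [field.strip() for field in line.rsplit(",", 2)]
    total, used = int(total_mib), int(used_mib)
    return {"name": name, "total_mib": total, "free_mib": total - used}


def is_24gb_class(total_mib: int, min_mib: int = 24000) -> bool:
    """Assumed threshold: cards reporting at least ~24000 MiB count as 24GB class."""
    return total_mib >= min_mib


def query_gpus() -> list:
    """Return parsed GPU info, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.splitlines():
        try:
            gpus.append(parse_gpu_line(line))
        except ValueError:
            continue  # skip blank lines or fields reported as unavailable
    return gpus


if __name__ == "__main__":
    for gpu in query_gpus():
        verdict = "24GB class" if is_24gb_class(gpu["total_mib"]) else "below 24GB class"
        print(f"{gpu['name']}: {gpu['total_mib']} MiB total, "
              f"{gpu['free_mib']} MiB free ({verdict})")
```

Free memory matters as much as total memory here: a desktop session or another process on the same GPU reduces what is left for training.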

The question we're answering: which open-source TTS models can you actually run - training and inference - on a single 24GB GPU in 2026?

The short answer

Model | VRAM fit (24GB) | Training mode | Recipe maturity | Deployable result?
VoxCPM 1.5 | ✅ Fits | LoRA | Mature | ✅ Yes
VoxCPM 2 | ✅ Fits | LoRA or full SFT | Validated with memory controls | ✅ Validation-selected checkpoint
Qwen3-TTS 1.7B
