60-second takeaway
VoxCPM 1.5, Qwen3-TTS 1.7B, IndexTTS2, and CosyVoice3 all fit within 24GB VRAM on an RTX 3090 Ti for both training and inference. The real constraint is not memory - it is having a working recipe. LoRA paths (VoxCPM, Qwen3-TTS) have the most mature recipes. Full SFT paths (IndexTTS2, CosyVoice3) work but need explicit checkpoint management. If you have a 24GB GPU and want to start today, VoxCPM 1.5 LoRA is the path of least resistance.
Who this is for
This guide is for engineers who have a single consumer or prosumer GPU (RTX 3090, RTX 3090 Ti, RTX 4090, or similar 24GB class) and want to fine-tune a TTS model for custom voice cloning. We ran all benchmarks on an RTX 3090 Ti (24 GB VRAM) on a Tailscale-connected remote desktop.
The question we're answering: which open-source TTS models can you actually run - training and inference - on a single 24GB GPU in 2026?
The short answer
| Model | VRAM fit (24GB) | Training mode | Recipe maturity | Deployable result? |
| --- | --- | --- | --- | --- |
| VoxCPM 1.5 | ✅ Fits | LoRA | Mature | ✅ Yes |
| Qwen3-TTS 1.7B | ✅ Fits | LoRA | Mature | ✅ Yes |
| IndexTTS2 | ✅ Fits | Full SFT | Works; needs checkpoint discipline | ✅ Yes |
| CosyVoice3 | ✅ Fits | Full SFT (LoRA tooling) | Works; needs checkpoint gating | ⚠️ Not yet evaluated |
All four models fit. The differentiator is not memory - it is recipe stability and checkpoint management discipline.
VoxCPM 1.5 on 24GB
VRAM profile: Fits comfortably within 24GB for both LoRA training and inference.
Training mode: LoRA fine-tuning. The 44.1 kHz audio prep requirement is the main setup step - resample your dataset before training.
Recipe: Standard LoRA train/val split. No custom modifications required for 24GB. Training to step 9000 is feasible in a single session.
What to watch:
Audio resampling to 44.1 kHz is mandatory. Skip this and training diverges.
Validation loss is a reliable guide here - step 4000 was the best in our benchmark, but your dataset may differ.
No-prompt inference is more stable than prompted inference for clean output.
Expected runtime: Training to step 9000 on FEMALE_01 completed in a single GPU session without memory pressure.
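For the mandatory 44.1 kHz resampling step flagged above, a minimal sketch using torchaudio - the directory layout and filenames are illustrative placeholders, not taken from the VoxCPM repo:

```python
# Resample every WAV in a dataset directory to 44.1 kHz before VoxCPM training.
# Sketch only: SRC/DST paths are hypothetical stand-ins for your own layout.
from pathlib import Path

import torchaudio
import torchaudio.transforms as T

SRC = Path("data/FEMALE_01/raw")   # hypothetical input directory
DST = Path("data/FEMALE_01/44k")   # hypothetical output directory
TARGET_SR = 44_100

DST.mkdir(parents=True, exist_ok=True)

for wav_path in sorted(SRC.glob("*.wav")):
    waveform, sr = torchaudio.load(str(wav_path))
    if sr != TARGET_SR:
        waveform = T.Resample(orig_freq=sr, new_freq=TARGET_SR)(waveform)
    torchaudio.save(str(DST / wav_path.name), waveform, TARGET_SR)
```

Resample once and cache the result; resampling on the fly at every epoch wastes CPU and risks inconsistent inputs between runs.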
Qwen3-TTS 1.7B on 24GB
VRAM profile: Fits within 24GB for LoRA training and inference.
Training mode: LoRA. Requires JSONL dataset prep and codec preprocessing before training. The sft_12hz.py script handles the codec path.
Recipe: LoRA train/val/test split. The lora_scale parameter at inference time is the key tuning knob - not just checkpoint selection.
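For the JSONL prep step mentioned above, a sketch of what a manifest might look like - the field names (audio, text) are an assumed schema for illustration; check which keys the training script actually expects before use:

```python
# Write a JSONL manifest: one JSON object per line, one utterance per object.
# The "audio"/"text" keys are assumptions, not the confirmed Qwen3-TTS schema.
import json
from pathlib import Path

samples = [
    {"audio": "data/FEMALE_01/wav/utt_0001.wav", "text": "Hello from Singapore."},
    {"audio": "data/FEMALE_01/wav/utt_0002.wav", "text": "The queue moves quickly."},
]

with Path("data/FEMALE_01/train.jsonl").open("w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```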
What to watch:
Run the codec preprocessing step before training. Skipping it causes silent failures.
Use the SDPA (Scaled Dot-Product Attention) backend for inference - it reduces VRAM pressure and improves long-text stability. A quick backend check follows this list.
Scale sweep at inference: test 0.2, 0.3, 0.35, 0.5. Scale 1.0 almost always over-steers.
Deterministic decode (fixed seed) is required for reproducible listening comparisons.
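A standalone check that PyTorch's efficient SDPA backend actually runs on your GPU before wiring it into inference (the torch.nn.attention API requires PyTorch 2.3+; the TTS generate call itself is omitted here):

```python
# Verify the efficient SDPA backend works on this GPU; a failure surfaces as
# an error now rather than mid-inference on a long text.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 4, 128, 64, device="cuda", dtype=torch.float16)
with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
print("SDPA efficient backend OK:", tuple(out.shape))
```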
Expected runtime: Training to epoch 10 on FEMALE_01 fits within one GPU session. The codec prep adds ~15 minutes of CPU time upfront.
IndexTTS2 on 24GB
VRAM profile: Fits within 24GB for full SFT training and inference.
Training mode: Full SFT with resume support. This is more memory-intensive than LoRA paths but still fits 24GB without quantisation.
Recipe: Process manifests (FEMALE_01_44k format) → full SFT with explicit resume management. The training loop has a crash-prone resume path in some versions - use explicit checkpoint save paths and keep crash logs (a minimal save/resume sketch follows the notes below).
What to watch:
Do not rely on automatic checkpoint retention. Keep all checkpoints manually until you have done a listening sweep.
The best validation region in our run was around step 13800, but the nearest saved checkpoint was step 14000. This is typical - save more frequently than you think you need to.
Crashes during training are recoverable with the right resume path. Keep detailed logs.
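A minimal sketch of the save/resume discipline described above - this is a generic PyTorch pattern, not IndexTTS2's actual trainer code:

```python
# Explicit checkpoint paths plus a resume helper; generic pattern, not the
# IndexTTS2 training loop. Zero-padded step names keep glob-sorting correct.
from pathlib import Path

import torch

CKPT_DIR = Path("checkpoints/indextts2_FEMALE_01")  # explicit, not a default
CKPT_DIR.mkdir(parents=True, exist_ok=True)

def save_checkpoint(step: int, model, optimizer) -> None:
    torch.save(
        {"step": step,
         "model": model.state_dict(),
         "optimizer": optimizer.state_dict()},
        CKPT_DIR / f"step_{step:06d}.pt",
    )

def resume_latest(model, optimizer) -> int:
    """Load the newest checkpoint, if any; return the step to resume from."""
    ckpts = sorted(CKPT_DIR.glob("step_*.pt"))
    if not ckpts:
        return 0
    state = torch.load(ckpts[-1], map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]
```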
Expected runtime: Training to step 15000+ was achievable on 24GB, but required crash recovery in our run.
CosyVoice3 on 24GB
VRAM profile: Fits within 24GB for training via the official LoRA tooling and for inference.
Training mode: Full SFT via train_cosyvoice3_lora.py. The CosyVoice team provides explicit LoRA tooling for the 3.x line.
Recipe: Available (train_cosyvoice3_lora.py, infer_cosyvoice3_lora.py). The recipe works at the hardware level - the issues in our run were checkpoint management and prompt handling, not VRAM.
What to watch:
Tighter checkpoint gating (explicit saves every N epochs, sketched after this section) is required before the run can be evaluated properly.
Long-text generation (>20s) was unstable in the current run configuration.
Expected runtime: Training fits within 24GB, but a proper checkpoint gating discipline adds setup time before the first reliable evaluation.
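The epoch-gating discipline, sketched generically - train_cosyvoice3_lora.py may expose its own save-interval flag, so treat this as the pattern rather than its confirmed API:

```python
# Epoch-gated checkpoint saving: save every N epochs, unconditionally.
from pathlib import Path

import torch

CKPT_DIR = Path("checkpoints/cosyvoice3_FEMALE_01")
CKPT_DIR.mkdir(parents=True, exist_ok=True)
SAVE_EVERY_EPOCHS = 2  # tune to your disk budget

def maybe_save(epoch: int, state: dict) -> None:
    if epoch % SAVE_EVERY_EPOCHS == 0:
        torch.save(state, CKPT_DIR / f"epoch_{epoch:03d}.pt")
```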
Hardware notes: RTX 3090 Ti specifics
All runs were on an RTX 3090 Ti with 24 GB GDDR6X. A few GPU-specific observations:
Thermal throttling: Long training runs (4+ hours) on the 3090 Ti can trigger thermal throttling under poor airflow. Monitor GPU temperature and ensure adequate case ventilation; a polling sketch follows these notes.
Memory bandwidth: The 3090 Ti's 1008 GB/s of bandwidth is only a modest step up from the standard 3090's 936 GB/s, so full SFT throughput is similar on both cards; the real differences vs a 3090 are power consumption and a slight clock boost.
A100/H100 comparison: These 24GB consumer runs are roughly 2–4x slower than equivalent runs on an A100 80GB. For production-scale fine-tuning (larger datasets, more epochs), a cloud A100 is significantly faster. The 24GB path is viable for prototyping and single-speaker benchmarking.
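The polling sketch referenced in the thermal note, using NVIDIA's NVML bindings (pip install nvidia-ml-py); run it in a second terminal alongside training:

```python
# Poll GPU temperature and power draw every 30 s to catch thermal throttling.
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0
while True:
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # NVML reports mW
    print(f"temp={temp} C  power={watts:.0f} W")
    time.sleep(30)
```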
Practical setup checklist
Before starting any of these runs on a 24GB GPU:
□ Verify CUDA version matches model requirements (check README)
□ Pre-process and resample dataset to model-required sample rate
□ Set explicit checkpoint save paths (do not rely on defaults)
□ Confirm available disk space (full SFT checkpoints are 1–5 GB each)
□ Set up crash recovery / resume path before starting long runs
□ Run a 10-minute smoke test (1 epoch, small batch) before committing to full training
□ Keep a training log for each run (model, dataset, LR, steps, VRAM peak)
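For the VRAM-peak field in the last checklist item, one way to capture it when training runs inside your own Python process (external launchers need nvidia-smi instead):

```python
# Record peak VRAM for the training log. Reset before the run, read after.
import torch

torch.cuda.reset_peak_memory_stats(0)
# ... training run goes here ...
peak_gib = torch.cuda.max_memory_allocated(0) / 1024**3
print(f"VRAM peak: {peak_gib:.2f} GiB")
```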
FAQ
Can I use a 16GB GPU (RTX 3080, RTX 4080) instead?
Not without modification. VoxCPM 1.5 and Qwen3-TTS LoRA may be achievable with reduced batch sizes and gradient checkpointing, but we have not tested this. IndexTTS2 and CosyVoice3 full SFT are unlikely to fit in 16GB without quantisation.
How long does a full benchmark run take on a 3090 Ti?
Rough estimates for FEMALE_01-scale datasets (single speaker, ~2–5 hours of audio):
VoxCPM LoRA to step 9000: 3–6 hours
Qwen3-TTS LoRA to epoch 10: 4–8 hours
IndexTTS2 full SFT to step 15000: 8–16 hours (with crash recovery overhead)
CosyVoice3 full SFT: similar to IndexTTS2, plus checkpoint gating overhead
Is LoRA always better than full SFT for 24GB runs?
Not necessarily. LoRA is faster and uses less VRAM, but full SFT can produce more stable long-form output for some voice profiles. In our benchmark, the LoRA models (VoxCPM, Qwen3-TTS) produced deployable results with less friction. Full SFT (IndexTTS2) also worked but required more operational discipline. Choose based on your iteration speed requirements.
What is IMDA NSC FEMALE_01?
IMDA NSC is Singapore's National Speech Corpus. FEMALE_01 is a single-speaker set with natural Singaporean English. We use it as a benchmark because the accent profile stress-tests speaker similarity in voice cloning. See IMDA NSC Voice Cloning Finetuning Benchmark 2026 for the full methodology.