FP8 on RTX 3090 Ti - What Actually Works on Consumer GPUs


10 May 2026, 00:00 Z

60-second takeaway
FP8 on an RTX 3090 Ti is real, but it is mostly a VRAM-saving storage trick, not an FP8 acceleration path.
The 3090 Ti is an Ampere GPU with compute capability 8.6. It can hold weights in torch.float8_e4m3fn, but native FP8 tensor-core compute is not the path you should count on. The reliable consumer-GPU recipe is to store large diffusion transformer weights in FP8, then compute in BF16.
If a model almost fits in 24 GB, use diffusers layerwise casting first. If it still does not fit, combine FP8 storage on the diffusion transformer with NF4 on the text encoder.
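For the combined recipe, here is a hedged sketch. The model id, the `text_encoder` subfolder, and the pipeline class are placeholders, not a real checkpoint layout; the real pieces are `transformers.BitsAndBytesConfig` for NF4 (requires the bitsandbytes package) and diffusers' `enable_layerwise_casting` for FP8 storage:

```python
import torch
from transformers import BitsAndBytesConfig, T5EncoderModel
from diffusers import DiffusionPipeline

model_id = "some-org/some-dit-model"  # placeholder, not a real checkpoint

# NF4 quantization for the text encoder (needs bitsandbytes installed)
nf4 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
text_encoder = T5EncoderModel.from_pretrained(
    model_id, subfolder="text_encoder", quantization_config=nf4
)

pipe = DiffusionPipeline.from_pretrained(
    model_id, text_encoder=text_encoder, torch_dtype=torch.bfloat16
)

# FP8 storage on the diffusion transformer, BF16 at compute time
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)
pipe.to("cuda")
```

This is a configuration sketch, not a benchmark-verified script; which component carries the text encoder (and under what argument name) varies by pipeline.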

Who this is for

This guide is for builders running image or video generation models on a single 24 GB NVIDIA GPU:

  • RTX 3090
  • RTX 3090 Ti
  • A10
  • RTX 4090
  • L40 / L40S

The most common reader has a model that almost fits. A README, issue comment, or benchmark says "use FP8", but the same recipe was probably written on a 4090, L40, H100, or newer card. On a 3090 Ti, that detail matters.

The question is not "does PyTorch expose FP8 dtypes?" It does. The question is:

Which FP8 paths actually help on Ampere, and which ones quietly assume newer hardware?

The short answer

On RTX 3090 Ti:

import torch

# Store transformer weights in FP8; upcast each layer to BF16 when it runs
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)

This is the practical path. It stores transformer weights in FP8, then casts them back to BF16 when each layer runs.

Do not expect this to make generation faster. Expect it to cut weight memory enough that a larger model or higher resolution run fits.
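The back-of-the-envelope arithmetic shows why the memory cut matters. Assuming a hypothetical 12B-parameter diffusion transformer (the parameter count is illustrative, not any specific model):

```python
def weight_gib(n_params: float, bytes_per_param: float) -> float:
    """Weight memory in GiB for a given parameter count and storage dtype."""
    return n_params * bytes_per_param / 2**30

# Hypothetical 12B-parameter diffusion transformer
n = 12e9
bf16 = weight_gib(n, 2)  # BF16: 2 bytes per weight
fp8 = weight_gib(n, 1)   # FP8 storage: 1 byte per weight
print(f"BF16 weights: {bf16:.1f} GiB, FP8 weights: {fp8:.1f} GiB")
# → BF16 weights: 22.4 GiB, FP8 weights: 11.2 GiB
```

At BF16 the weights alone nearly fill a 24 GB card before activations and the text encoder are counted; halving storage is what makes the run fit.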

What SM 8.6 vs SM 8.9 means

SM is short for Streaming Multiprocessor; the version number attached to it is what CUDA docs usually call compute capability.

The useful boundary for this post:

| GPU family | Example GPUs | Compute capability | Practical FP8 meaning |
|---|---|---|---|
| Ampere consumer / prosumer | RTX 3090, RTX 3090 Ti, A10 | 8.6 | FP8 as a storage dtype only; compute stays in BF16 |
| Ada consumer / datacenter | RTX 4090, L40 / L40S | 8.9 | Native FP8 tensor-core compute available |
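To check which side of the boundary a card falls on at runtime, query the compute capability and branch on it. The helper below is a sketch; the `(8, 9)` threshold matches where NVIDIA introduced FP8 tensor cores (Ada 8.9, Hopper 9.0):

```python
def fp8_support(major: int, minor: int) -> str:
    """Classify practical FP8 support from CUDA compute capability."""
    if (major, minor) >= (8, 9):  # Ada (8.9), Hopper (9.0) and newer
        return "native FP8 tensor-core compute"
    if (major, minor) >= (8, 0):  # Ampere: FP8 as a storage dtype only
        return "FP8 storage only, compute in BF16"
    return "no FP8 path"

# On a live system: major, minor = torch.cuda.get_device_capability(0)
print(fp8_support(8, 6))  # RTX 3090 Ti → FP8 storage only, compute in BF16
```

`torch.cuda.get_device_capability(0)` returns the `(major, minor)` pair to feed in; on a 3090 Ti that is `(8, 6)`.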
