Diffusion Speech Denoising in 2025 -- StoRM, SGMSE+, UNIVERSE++, Schrödinger Bridges, and Streaming Variants


24 Apr 2025, 00:00 Z

TL;DR Speech denoising now spans a family of diffusion-flavoured designs. StoRM blends a predictive estimate with a diffusion sampler to tame hallucinations at low cost; SGMSE/SGMSE+ continue to scale score matching with variance-aware schedulers; UNIVERSE++ bakes in adversarial loss and low-rank adaptation for cross-condition robustness; few-step Schrödinger-bridge variants target sub-10-step inference; causal diffusion architectures chase streaming deployment; and MossFormer2 remains a strong baseline when you can tolerate separation-first latency.

Why teams care in 2025

  • Customer support, call centers, and meeting tooling now demand universal denoisers that handle noise, reverberation, codec artifacts, and far-field setups without per-domain tuning.
  • Real-time AI voice agents (telephony, kiosks, wearables) force inference budgets down to single-digit diffusion steps -- or hybrids that drop to predictive streams when necessary.
  • Evaluation has shifted from "cleaner spectrograms" to intelligibility (STOI), signal fidelity (SI-SDR), MOS (subjective or DNSMOS), and downstream ASR WER. Modern stacks must show gains across all four.
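
Of the metrics in the last bullet, SI-SDR is the simplest to reproduce by hand. The sketch below is a minimal pure-Python version of the usual definition (mean removal, projection of the estimate onto the reference, energy ratio in dB). The function name `si_sdr` and the `eps` guard are illustrative choices, not taken from any toolkit named in this article; production code would normally use an array library and a maintained metrics package.

```python
import math


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB for two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Remove the mean so a DC offset does not influence the score.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in ref]
    # Whatever the scaled reference does not explain counts as distortion.
    distortion_energy = sum((e - t) ** 2 for e, t in zip(est, target))
    target_energy = sum(t * t for t in target)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))


# Illustrative usage with synthetic signals (not real speech).
tone = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
interference = [0.1 * math.sin(2 * math.pi * 37 * n / 200) for n in range(200)]
degraded = [t + i for t, i in zip(tone, interference)]
print(f"SI-SDR: {si_sdr(tone, degraded):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the denoiser output leaves the score unchanged, which is why SI-SDR is often preferred over plain SNR when models differ in output gain.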

The models, at a glance

| Model | Key idea | Step count | Notable metrics | Where it shines |
| --- | --- | --- | --- | --- |
| StoRM (Lemercier et al., IEEE/ACM TASLP 2023) | Predictive network provides a guided starting point for the diffusion sampler, suppressing breathing/phonation artifacts. | 8-30 (configurable) | VoiceBank+DEMAND PESQ >= 2.9 with 8 steps; DNSMOS better than pure score-matching at same budget. | Low-latency deployments that still need diffusion-grade quality. |
| SGMSE / SGMSE+ (Richter et al., 2022; Lay & Gerkmann, 2024) | Score-based generative speech enhancement with SDEs; SGMSE+ adds a stronger UNet, variance-aware schedules, and dereverb handling. | | | |
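
The StoRM row describes a two-stage pattern: a predictive pass produces a rough estimate, and a short diffusion sampler starts from that estimate instead of from pure noise. The toy sketch below illustrates only that pattern and is not the authors' implementation. All names (`regenerate`, `moving_average`, `oracle_score`) are hypothetical, the update is a generic variance-exploding ancestral sampling step rather than StoRM's exact SDE and sampler, and the score function cheats by reading the clean signal where a trained network would go.

```python
import math
import random


def geometric_sigmas(sigma_max, sigma_min, num_steps):
    """Noise levels decaying geometrically from sigma_max to sigma_min."""
    ratio = (sigma_min / sigma_max) ** (1.0 / num_steps)
    return [sigma_max * ratio ** k for k in range(num_steps + 1)]


def regenerate(noisy, predictor, score_fn, num_steps=8,
               sigma_max=0.5, sigma_min=0.01, seed=0):
    """Predictive estimate first, then a few reverse diffusion steps."""
    rng = random.Random(seed)
    sigmas = geometric_sigmas(sigma_max, sigma_min, num_steps)
    # Stage 1: a single cheap predictive pass.
    guess = predictor(noisy)
    # Stage 2: start the sampler at the estimate plus noise, not pure noise.
    x = [g + sigmas[0] * rng.gauss(0.0, 1.0) for g in guess]
    for k in range(num_steps):
        s_now, s_next = sigmas[k], sigmas[k + 1]
        step = s_now ** 2 - s_next ** 2
        # Ancestral-sampling noise scale for a variance-exploding process.
        noise_std = s_next * math.sqrt(step) / s_now
        score = score_fn(x, s_now, noisy, guess)
        x = [xi + step * si + noise_std * rng.gauss(0.0, 1.0)
             for xi, si in zip(x, score)]
    return guess, x


def moving_average(y, width=5):
    """Hypothetical stand-in for a predictive enhancement network."""
    half = width // 2
    out = []
    for n in range(len(y)):
        window = y[max(0, n - half): n + half + 1]
        out.append(sum(window) / len(window))
    return out


def mean_abs_error(a, b):
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


# Synthetic demo data (assumption: a sine wave stands in for speech).
demo_rng = random.Random(1)
clean = [math.sin(2 * math.pi * n / 50) for n in range(200)]
noisy = [c + 0.3 * demo_rng.gauss(0.0, 1.0) for c in clean]


def oracle_score(x, sigma, y, guess):
    """Toy score using the clean signal; a trained network replaces this."""
    return [-(xi - ci) / sigma ** 2 for xi, ci in zip(x, clean)]


guess, refined = regenerate(noisy, moving_average, oracle_score, num_steps=8)
print("predictive-only error:", round(mean_abs_error(guess, clean), 4))
print("after 8 diffusion steps:", round(mean_abs_error(refined, clean), 4))
```

The point of the design is visible in `regenerate`: because the sampler starts near a plausible answer, `sigma_max` can be small and the step count short. How much a real system gains depends entirely on the trained score model, which this sketch does not include.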
