
DOSE+: A Timestep-Aware Dropout Strategy for Diffusion Models in Speech Enhancement

Siqi Yang1, Jin Wu1, Yue Lei1, Wenxin Tai1, Fan Zhou1
1University of Electronic Science and Technology of China, Chengdu, Sichuan, China

Abstract

Diffusion-based speech enhancement (SE) models have recently demonstrated superior performance compared to traditional single-step models. In this work, we revisit the advantages of diffusion models from a multi-source learning perspective, highlighting that their ability to jointly leverage data likelihood and conditional mapping makes them theoretically superior to deterministic models when controllability is ensured. From this standpoint, we identify a key limitation in DOSE, a recent diffusion-based SE model that enhances controllability by applying a fixed dropout ratio to non-conditional inputs: the same ratio is used at every timestep, which causes unnecessary information loss. To address this, we propose a timestep-aware dropout mechanism that dynamically adjusts the dropout intensity at each denoising step. Extensive experiments on matched and cross-dataset benchmarks show that our method consistently outperforms DOSE and other state-of-the-art diffusion-based SE methods while remaining efficient. The code and audio samples are publicly available here.
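
To make the contrast between a fixed and a timestep-aware dropout ratio concrete, the following is a minimal sketch, not the DOSE+ implementation. The abstract does not state the actual schedule, so the linear ramp, the endpoint values, and the helper names `dropout_ratio` and `apply_dropout` are all hypothetical, as is the use of standard inverted dropout (zeroing elements and rescaling the survivors).

```python
import random


def dropout_ratio(t, num_steps, p_start, p_end):
    """Hypothetical schedule: linearly interpolate the dropout ratio
    from p_start (at t = 0) to p_end (at t = num_steps - 1).
    The real DOSE+ schedule may differ in both shape and direction."""
    if num_steps < 2:
        return p_start
    frac = t / (num_steps - 1)
    return p_start + frac * (p_end - p_start)


def apply_dropout(x, p, rng):
    """Standard inverted dropout on a list of floats: zero each element
    with probability p and rescale the survivors by 1 / (1 - p)."""
    if p >= 1.0:
        return [0.0 for _ in x]
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in x]


rng = random.Random(0)
x_t = [0.5, -1.0, 0.25, 2.0]  # stand-in for the non-conditional input

# Fixed ratio: the same p is used regardless of the timestep.
fixed = [apply_dropout(x_t, 0.5, rng) for t in range(5)]

# Timestep-aware: p depends on t (assumed endpoints 0.1 and 0.9).
aware = [apply_dropout(x_t, dropout_ratio(t, 5, 0.1, 0.9), rng)
         for t in range(5)]
```

Under these assumptions, the only difference between the two variants is that the ratio passed to `apply_dropout` becomes a function of `t`, so less of the non-conditional input is discarded at timesteps where the schedule assigns a low ratio.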

Samples from VBD Dataset

[Four audio samples, each presented with audio and waveform for: Clean, Noisy, DOSE, DOSE+]

Samples from CHiME-4 Dataset

[Four audio samples, each presented with audio and waveform for: Clean, Noisy, DOSE, DOSE+]