Research | Audio Signal and Information Processing Lab

CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR

Abstract

We propose CleanMel, a single-channel Mel-spectrogram denoising and dereverberation network that improves both speech quality and automatic speech recognition (ASR) performance. The network takes a noisy, reverberant microphone recording as input and predicts the corresponding clean Mel-spectrogram. The enhanced Mel-spectrogram can either be converted to a speech waveform with a neural vocoder or used directly for ASR. The network interleaves cross-band and narrow-band processing in the Mel-frequency domain to learn the full-band spectral pattern and the narrow-band properties of signals, respectively. Compared with linear-frequency-domain or time-domain speech enhancement, the key advantage of Mel-spectrogram enhancement is that the Mel-frequency representation of speech is more compact and therefore easier to learn, which benefits both speech quality and ASR. Experiments on four English datasets and one Chinese dataset demonstrate significant improvements in both speech quality and ASR performance.
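
To make the compactness argument concrete, the following is a minimal NumPy sketch of a log-Mel front end. It is not the CleanMel implementation: the sample rate, FFT size, hop size, number of Mel bands, and the helper names (`mel_filterbank`, `log_mel_spectrogram`) are illustrative assumptions. It only shows how 257 linear-frequency bins per frame collapse into 80 Mel bands, which is the representation an enhancement network of this kind would predict.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    edges = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    )
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sample_rate=16000, n_fft=512, hop=128,
                        n_mels=80, eps=1e-5):
    """Log-Mel features with shape (n_frames, n_mels). Settings are illustrative."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(np.maximum(mel, eps))

# One second of white noise stands in for a microphone recording.
rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)
feats = log_mel_spectrogram(wave)
print(feats.shape)  # (122, 80): 257 linear bins per frame reduced to 80 Mel bands
```

Under these assumed settings, each frame shrinks from 257 values to 80, so a network that predicts the Mel-spectrogram has a smaller output to learn than one that predicts the full linear-frequency spectrum.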

Demos

Please open this page in Edge or Chrome; audio playback is unreliable in Firefox.

Mode	Method	CHiME (real)	CHiME (simu.)	REVERB (real)	REVERB (simu.)	DNS (w/o reverb.)	DNS (w/ reverb.)	EARS	RealMAN static

	Noisy
	Clean
online
	onlineFullSubNet

	Demucs(dns64)
	LiSenNet
	oSpatialNet
	CleanMel-S-map (prop.)
	CleanMel-S-mask (prop.)
offline
	ConvTasNet
	offlineFullSubNet
	VoiceFixer
	CDiffuse
	CMGAN
	StoRM
	UNIVERSE++
	SpatialNet
	SpatialNet-Mamba
	CleanMel-S-map (prop.)
	CleanMel-S-mask (prop.)
	CleanMel-L-map (prop.)
	CleanMel-L-mask (prop.)

Citation

@misc{shao2025cleanmel,
    title={CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR}, 
    author={Nian Shao and Rui Zhou and Pengyu Wang and Xian Li and Ying Fang and Yujie Yang and Xiaofei Li},
    year={2025},
    eprint={2502.20040},
    archivePrefix={arXiv},
    primaryClass={eess.AS},
    url={https://arxiv.org/abs/2502.20040}
}