McNet - Xiaofei Li Lab

McNet: Fuse Multiple Cues for Multichannel Speech Enhancement

Yujie Yang, Changsheng Quan, Xiaofei Li [ PDF ]

Abstract

In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise. How to fully exploit these two types of information and their temporal dynamics remains an open research problem. To address it, this paper proposes a multi-cue fusion network named McNet, which cascades four modules that exploit, respectively, full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral information. Experiments show that each module in the proposed network makes a unique contribution and that the network as a whole notably outperforms other state-of-the-art methods.
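To make the terms "full-band", "narrow-band", and "sub-band" concrete, the sketch below shows one common way to rearrange a multichannel STFT into these three kinds of input. It is an illustrative assumption, not the McNet implementation: the (channels, frequencies, frames) layout, the function names, the `n_neighbors` parameter, and the reflection padding at the band edges are all hypothetical choices made here. The released code is the authoritative reference for what the authors actually do.

```python
import numpy as np


def narrow_band_view(X):
    """Narrow-band view (illustrative): one independent sequence per frequency bin.

    X: complex STFT of shape (C, F, T) = (channels, frequency bins, frames).
    Returns shape (F, T, C): for each bin, a length-T sequence of C-channel
    vectors, so a model sees inter-channel (spatial) cues at a single frequency.
    """
    return np.transpose(X, (1, 2, 0))


def full_band_view(X):
    """Full-band view (illustrative): all frequencies of each frame together.

    X: (C, F, T). Returns shape (T, F, C): for each frame, every frequency bin
    of every channel is available at once.
    """
    return np.transpose(X, (2, 1, 0))


def sub_band_view(mag, n_neighbors=2):
    """Sub-band view (illustrative): each bin plus n_neighbors bins on each side.

    mag: single-channel magnitude spectrogram of shape (F, T), with F > n_neighbors.
    Returns shape (F, T, 2 * n_neighbors + 1), where slot k at bin f holds
    frequency f - n_neighbors + k. Edge bins use reflection padding
    (a hypothetical choice made for this sketch).
    """
    F, _ = mag.shape
    padded = np.pad(mag, ((n_neighbors, n_neighbors), (0, 0)), mode="reflect")
    # Stack shifted copies of the padded spectrogram along a new last axis
    bands = [padded[k:k + F, :] for k in range(2 * n_neighbors + 1)]
    return np.stack(bands, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 6-channel STFT with 257 frequency bins and 100 frames
    X = rng.standard_normal((6, 257, 100)) + 1j * rng.standard_normal((6, 257, 100))
    print(narrow_band_view(X).shape)          # (257, 100, 6)
    print(full_band_view(X).shape)            # (100, 257, 6)
    print(sub_band_view(np.abs(X[0])).shape)  # (257, 100, 5)
```

In a cascade like the one described above, a narrow-band module would process each of the F sequences independently, while a full-band module sees all frequencies of a frame jointly; how McNet orders and connects its four modules is specified in the paper, not in this sketch.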

Examples

Please open this page in Edge or Chrome rather than Firefox; audio playback is problematic in Firefox.

scene	noisy & clean	Oracle MVDR [1]	Narrow-band Net [2]	FT-JNF [3]	McNet (prop.)
BUS	noisy clean	offline	offline online	offline online	offline online
BUS	noisy clean	offline	offline online	offline online	offline online
CAF	noisy clean	offline	offline online	offline online	offline online
CAF	noisy clean	offline	offline online	offline online	offline online
CAF	noisy clean	offline	offline online	offline online	offline online
PED	noisy clean	offline	offline online	offline online	offline online
PED	noisy clean	offline	offline online	offline online	offline online
PED	noisy clean	offline	offline online	offline online	offline online
STR	noisy clean	offline	offline online	offline online	offline online
STR	noisy clean	offline	offline online	offline online	offline online

[1] Oracle MVDR beamformer implementation, https://github.com/Enny1991/beamformers.
[2] Xiaofei Li and Radu Horaud, “Narrow-band deep filtering for multichannel speech enhancement,” arXiv preprint arXiv:1911.10791, 2019.
[3] Kristina Tesch and Timo Gerkmann, “Insights into deep non-linear filters for improved multi-channel speech enhancement,” arXiv preprint arXiv:2206.13310, 2022.

Source Code

The source code is openly available on GitHub; see [ code ].