McNet: Fuse Multiple Cues for Multichannel Speech Enhancement


Abstract

Yujie Yang, Changsheng Quan, Xiaofei Li [ PDF ]

In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise. How to fully exploit these two types of information and their temporal dynamics remains an open research problem. As a solution, this paper proposes a multi-cue fusion network, McNet, which cascades four modules to exploit full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral information, respectively. Experiments show that each module makes a unique contribution and that, as a whole, the proposed network notably outperforms other state-of-the-art methods.
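To make the cascade concrete, below is a minimal PyTorch sketch of a four-stage pipeline in the spirit of the abstract: full-band modules run a recurrent layer across frequency within each frame, while narrow-band and sub-band modules run one across time within each frequency. All module internals, feature shapes, and hidden sizes here are illustrative assumptions, not the authors' implementation (see the source code linked below for the real one).

import torch
import torch.nn as nn


class AxisLSTM(nn.Module):
    """BiLSTM over one axis of a (batch, channel, freq, time) feature map:
    axis='freq' sequences over frequencies per frame (full-band cues),
    axis='time' sequences over frames per frequency (narrow-/sub-band cues).
    """

    def __init__(self, in_dim, hidden, out_dim, axis):
        super().__init__()
        self.axis = axis
        self.rnn = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, out_dim)

    def forward(self, x):                       # x: (B, C, F, T)
        B, C, F, T = x.shape
        if self.axis == 'freq':
            seq = x.permute(0, 3, 2, 1).reshape(B * T, F, C)
        else:
            seq = x.permute(0, 2, 3, 1).reshape(B * F, T, C)
        y = self.proj(self.rnn(seq)[0])
        if self.axis == 'freq':
            return y.reshape(B, T, F, -1).permute(0, 3, 2, 1)
        return y.reshape(B, F, T, -1).permute(0, 3, 1, 2)


class McNetSketch(nn.Module):
    """Cascade of the four cue modules; each later stage also sees the raw
    multichannel spectrogram via concatenation (a simple skip scheme)."""

    def __init__(self, mics=6, hidden=64, width=32):
        super().__init__()
        in_dim = 2 * mics                       # real + imag per microphone
        self.stages = nn.ModuleList([
            AxisLSTM(in_dim, hidden, width, 'freq'),          # full-band spatial
            AxisLSTM(width + in_dim, hidden, width, 'time'),  # narrow-band spatial
            AxisLSTM(width + in_dim, hidden, width, 'time'),  # sub-band spectral
            AxisLSTM(width + in_dim, hidden, 2, 'freq'),      # full-band spectral
        ])

    def forward(self, spec):                    # spec: (B, 2*mics, F, T)
        x = self.stages[0](spec)
        for stage in self.stages[1:]:
            x = stage(torch.cat([x, spec], dim=1))
        return x                                # (B, 2, F, T): real/imag estimate


# Usage: a 6-microphone STFT with 257 frequency bins and 100 frames.
net = McNetSketch()
out = net(torch.randn(2, 12, 257, 100))
print(out.shape)                                # torch.Size([2, 2, 257, 100])

The bidirectional LSTMs above correspond to offline processing; the online variants in the demos below presumably use causal (unidirectional) recurrence along the time axis instead.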

Examples

Please open this page with Edge or Chrome rather than Firefox; audio playback is problematic in Firefox.

[Audio examples: for each noise scene (BUS ×2, CAF ×3, PED ×3, STR ×2), the noisy and clean recordings are paired with the offline output of Oracle MVDR [1] and the offline and online outputs of Narrow-band Net [2], FT-JNF [3], and the proposed McNet.]

[1] https://github.com/Enny1991/beamformers
[2] Xiaofei Li and Radu Horaud, “Narrow-band deep filtering for multichannel speech enhancement,” arXiv preprint arXiv:1911.10791, 2019.
[3] Kristina Tesch and Timo Gerkmann, “Insights into deep non-linear filters for improved multi-channel speech enhancement,” arXiv preprint arXiv:2206.13310, 2022.

Source Code

This work is open-sourced on GitHub; see [ code ].