Yujie Yang, Changsheng Quan, Xiaofei Li [ PDF ]
In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise. How to fully exploit these two types of information and their temporal dynamics remains an interesting research problem. As a solution to this problem, this paper proposes a multi-cue fusion network named McNet, which cascades four modules to respectively exploit the full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral information. Experiments show that each module of the proposed network makes a unique contribution, and that the network as a whole notably outperforms other state-of-the-art methods.
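The paragraph above names the cue each module exploits but not how the cascade is wired. As a rough illustration, the NumPy sketch below routes a multichannel STFT through four stages that alternate between frequency-axis (full-band) and time-axis (narrow-band and sub-band) sequence processing, each stage concatenating its own cue to the previous stage's output. This is not the authors' implementation: `toy_sequence_layer` is a hypothetical stand-in for a learned recurrent layer, and the input features, the sub-band width `n_sub`, and the complex-mask output are assumptions. The `causal` flag only mimics an offline/online distinction by restricting time-axis context to past frames.

```python
import numpy as np


def toy_sequence_layer(x, out_dim, rng, causal=False):
    """Hypothetical stand-in for a learned sequence model (e.g. an LSTM).

    x: (batch, seq_len, in_dim) -> (batch, seq_len, out_dim).
    Each step is combined with a running (causal) or global (non-causal)
    mean over the sequence, then passed through a random projection + tanh.
    """
    if causal:
        # Cumulative mean: step s only sees steps 0..s (online-style).
        counts = np.arange(1, x.shape[1] + 1)[None, :, None]
        context = np.cumsum(x, axis=1) / counts
    else:
        # Global mean: every step sees the whole sequence (offline-style).
        context = np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape)
    h = np.concatenate([x, context], axis=-1)
    w = rng.standard_normal((h.shape[-1], out_dim)) / np.sqrt(h.shape[-1])
    return np.tanh(h @ w)


def subband_context(mag, n=3):
    """Stack bins f-n..f+n of a (F, T) magnitude map into (F, T, 2n+1), edge-padded."""
    num_f = mag.shape[0]
    padded = np.pad(mag, ((n, n), (0, 0)), mode="edge")
    return np.stack([padded[k:k + num_f] for k in range(2 * n + 1)], axis=-1)


def mcnet_sketch(stft, hidden=16, n_sub=3, causal=False, seed=0):
    """Illustrative four-stage cascade (assumed structure, untrained weights).

    stft: complex multichannel STFT of shape (F, T, M); channel 0 is taken
    as the reference channel. Returns an (F, T) complex estimate.
    """
    rng = np.random.default_rng(seed)
    spatial = np.concatenate([stft.real, stft.imag], axis=-1)  # (F, T, 2M)
    ref_mag = np.abs(stft[..., 0])                             # (F, T)

    def along_freq(x, out_dim):
        # Full-band view: one sequence over frequency for each frame.
        return toy_sequence_layer(x.transpose(1, 0, 2), out_dim, rng).transpose(1, 0, 2)

    def along_time(x, out_dim):
        # Narrow-/sub-band view: one sequence over time for each frequency.
        return toy_sequence_layer(x, out_dim, rng, causal=causal)

    # 1) Full-band spatial: multichannel STFT, sequence over frequency.
    h = along_freq(spatial, hidden)
    # 2) Narrow-band spatial: previous output + multichannel STFT, sequence over time.
    h = along_time(np.concatenate([h, spatial], axis=-1), hidden)
    # 3) Sub-band spectral: previous output + neighbouring reference magnitudes.
    h = along_time(np.concatenate([h, subband_context(ref_mag, n_sub)], axis=-1), hidden)
    # 4) Full-band spectral: previous output + reference magnitude, over frequency.
    h = along_freq(np.concatenate([h, ref_mag[..., None]], axis=-1), 2)
    # Hypothetical output: real/imag parts of a complex mask on the reference channel.
    mask = h[..., 0] + 1j * h[..., 1]
    return mask * stft[..., 0]
```

For example, `mcnet_sketch(X, causal=True)` on an array `X` of shape `(F, T, M)` returns an `(F, T)` estimate in which frame t depends only on frames up to t. The point the sketch tries to make visible is that the modules differ mainly in which axis they treat as the sequence and which cue they add to the previous module's output.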
| scene | noisy & clean | Oracle MVDR [1] | Narrow-band Net [2] | FT-JNF [3] | McNet (proposed) |
|---|---|---|---|---|---|
| BUS | noisy, clean | offline | offline, online | offline, online | offline, online |
| BUS | noisy, clean | offline | offline, online | offline, online | offline, online |
| CAF | noisy, clean | offline | offline, online | offline, online | offline, online |
| CAF | noisy, clean | offline | offline, online | offline, online | offline, online |
| CAF | noisy, clean | offline | offline, online | offline, online | offline, online |
| PED | noisy, clean | offline | offline, online | offline, online | offline, online |
| PED | noisy, clean | offline | offline, online | offline, online | offline, online |
| PED | noisy, clean | offline | offline, online | offline, online | offline, online |
| STR | noisy, clean | offline | offline, online | offline, online | offline, online |
| STR | noisy, clean | offline | offline, online | offline, online | offline, online |
[1] Enny1991, “beamformers,” GitHub repository, https://github.com/Enny1991/beamformers.
[2] Xiaofei Li and Radu Horaud, “Narrow-band deep filtering for multichannel speech enhancement,” arXiv preprint arXiv:1911.10791, 2019.
[3] Kristina Tesch and Timo Gerkmann, “Insights into deep non-linear filters for improved multi-channel speech enhancement,” arXiv preprint arXiv:2206.13310, 2022.
This work is open-sourced on GitHub; see [ code ].