SpatialNet - Xiaofei Li Lab

SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation

Abstract

This work proposes a neural network, named SpatialNet, to extensively exploit spatial information for multichannel joint speech separation, denoising and dereverberation. The proposed network performs end-to-end speech enhancement in the short-time Fourier transform (STFT) domain. It is mainly composed of interleaved narrow-band and cross-band blocks, which exploit narrow-band and cross-band spatial information, respectively. The narrow-band blocks process frequencies independently, and use a self-attention mechanism and temporal convolutional layers to perform spatial-feature-based speaker clustering and temporal smoothing/filtering, respectively. The cross-band blocks process frames independently, and use a full-band linear layer and frequency convolutional layers to learn the correlation between all frequencies and between adjacent frequencies, respectively. Experiments are conducted on various simulated and real datasets, and the results show that 1) the proposed network achieves state-of-the-art performance on almost all tasks; 2) the proposed network suffers little from the spectral generalization problem; and 3) the proposed network does perform speaker clustering (as demonstrated by attention maps).
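The split between the two block types can be illustrated with a minimal NumPy sketch. It is not the released implementation: the sizes `F`, `T`, `H`, the random weights, and the function names `narrow_band_attention` and `cross_band_linear` are all hypothetical, and the temporal and frequency convolutions, normalization, and multi-head attention are left out. It only shows that the narrow-band step attends over time within each frequency, while the cross-band step mixes frequencies within each frame.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def narrow_band_attention(x, wq, wk, wv):
    """Single-head self-attention over time, run separately for each frequency.

    x: [F, T, H] hidden features; wq, wk, wv: [H, H] weights shared by all frequencies.
    """
    q, k, v = x @ wq, x @ wk, x @ wv                           # each [F, T, H]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])   # [F, T, T]
    return x + softmax(scores, axis=-1) @ v                    # residual connection

def cross_band_linear(x, wf):
    """Linear layer across all frequencies, run separately for each frame.

    x: [F, T, H] hidden features; wf: [F, F] full-band mixing weights (assumed shape).
    """
    return x + np.einsum("gf,fth->gth", wf, x)                 # residual connection

# Hypothetical sizes: F frequencies, T frames, H hidden units.
rng = np.random.default_rng(0)
F, T, H = 6, 10, 8
x = rng.standard_normal((F, T, H))
wq, wk, wv = (rng.standard_normal((H, H)) * 0.1 for _ in range(3))
wf = rng.standard_normal((F, F)) * 0.1

# One interleaved narrow-band / cross-band pass.
y = cross_band_linear(narrow_band_attention(x, wq, wk, wv), wf)
```

In this sketch, changing the input at one frequency leaves the narrow-band output at every other frequency unchanged, and changing one frame leaves the cross-band output at every other frame unchanged, which is the independence property described in the abstract.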

Examples of real-recorded data

Dataset	Unproc.	WPE	WPE+BeamformIt	WPD	SpatialNet-small (prop.)	SpatialNet-large (prop.)
Reverb/far
Reverb/near
LibriCSS

unproc. spk1 spk2

Source Code

This work is open-sourced on GitHub; see [Code]. If you find this work useful and would like to cite it, please use:

@article{quan_spatialnet_2023,
    title = {SpatialNet: {Extensively} {Learning} {Spatial} {Information} for {Multichannel} {Joint} {Speech} {Separation}, {Denoising} and {Dereverberation}},
    journal = {arXiv preprint arXiv:2307.16516},
    author = {Quan, Changsheng and Li, Xiaofei},
    year = {2023},
}