Rui Zhou, Wenye Zhu, Xiaofei Li [ PDF ]
This work proposes a new learning target based on reverberation time shortening (RTS) for speech dereverberation. The learning target for dereverberation is usually set to the direct-path speech, optionally with some early reflections. This type of target abruptly truncates the reverberation, and thus may not be suitable for network training. The proposed RTS target suppresses reverberation while maintaining its exponentially decaying property, which eases network training and thus reduces the signal distortion caused by prediction error. Moreover, this work experimentally studies how to adapt our previously proposed FullSubNet speech denoising network to speech dereverberation. Experiments show that RTS is a more suitable learning target than direct-path speech and early reflections, in that it better suppresses both reverberation and signal distortion. FullSubNet is able to achieve outstanding dereverberation performance.
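To make the idea of a target that shortens reverberation yet keeps an exponential decay more concrete, here is a minimal sketch. It is not taken from the released code. It assumes the RTS target is obtained by multiplying the room impulse response (RIR) by an exponentially decaying window that starts at the direct path, so that an RIR with reverberation time `rt60_orig` decays as if its reverberation time were `rt60_target`. The function names, the use of the largest-magnitude tap as the direct-path position, and the idealized decay model are all illustrative assumptions.

```python
import numpy as np


def rts_window(num_samples, fs, rt60_orig, rt60_target, direct_idx=0):
    """Exponential window that shortens an RIR's decay (hypothetical helper).

    Assumes the RIR amplitude envelope decays as 10 ** (-3 * t / rt60),
    i.e. 60 dB of energy decay after rt60 seconds.
    """
    if rt60_target > rt60_orig:
        raise ValueError("rt60_target must not exceed rt60_orig")
    # Extra decay rate needed to go from rt60_orig to rt60_target.
    q = 3.0 * (1.0 / rt60_target - 1.0 / rt60_orig)
    # Time measured from the direct path; earlier samples are left untouched.
    t = np.maximum((np.arange(num_samples) - direct_idx) / fs, 0.0)
    return 10.0 ** (-q * t)


def shorten_rir(rir, fs, rt60_orig, rt60_target):
    """Return an RIR whose decay matches rt60_target (assumed construction)."""
    rir = np.asarray(rir, dtype=float)
    # Assumption: the direct path is the largest-magnitude tap.
    direct_idx = int(np.argmax(np.abs(rir)))
    window = rts_window(len(rir), fs, rt60_orig, rt60_target, direct_idx)
    return rir * window


def make_rts_target(clean, rir, fs, rt60_orig, rt60_target=0.15):
    """Convolve clean speech with the shortened RIR to get a training target."""
    target_rir = shorten_rir(rir, fs, rt60_orig, rt60_target)
    return np.convolve(clean, target_rir)[: len(clean)]
```

Unlike a direct-path or early-reflection target, which zeroes the RIR after a cutoff, the window in this sketch never truncates the tail; it only makes it decay faster. The default `rt60_target=0.15` mirrors the "RTS 0.15" label used for the audio samples on this page, on the assumption that the label denotes a target reverberation time of 0.15 s.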
Please open this page in Edge or Chrome rather than Firefox; audio playback is problematic in Firefox.
SNR (dB) | RT (s) | unpro. | TCN + SA [1] | SubNet [2] | FullSubNet [3] |
---|---|---|---|---|---|
20 | 0.7 | | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 |
5 | 0.7 | | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 |
20 | 0.7 | | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 |
5 | 0.7 | | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 |
20 | 0.5 | | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 |
5 | 0.5 | | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 | direct-path, early, RTS 0.15 |

[1] Yan Zhao, DeLiang Wang, Buye Xu, and Tao Zhang, “Monaural speech dereverberation using temporal convolutional networks with self attention,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1598–1607, 2020.

[2] Xiaofei Li and Radu Horaud, “Online monaural speech enhancement using delayed subband LSTM,” in Proc. Interspeech, 2020, pp. 2462–2466.

[3] Xiang Hao, Xiangdong Su, Radu Horaud, and Xiaofei Li, “FullSubNet: A full-band and sub-band fusion model for real-time single-channel speech enhancement,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6633–6637.

This work is open-sourced on GitHub; see [ code ].