GitHub Page
https://github.com/Audio-WestlakeU
Preprints
-
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction [pdf] [code]
Di Liang, Xiaofei Li
-
Mamba for Streaming ASR Combined with Unimodal Aggregation [pdf] [code]
Ying Fang, Xiaofei Li
-
IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source Localization [pdf] [code]
Yabo Wang, Bing Yang, Xiaofei Li
-
Mel-FullSubNet: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR [pdf] [research page]
Rui Zhou, Xian Li, Ying Fang, Xiaofei Li
-
Narrow-band Deep Filtering for Multichannel Speech Enhancement [pdf] [research page] [code]
Xiaofei Li, Radu Horaud
2024
-
RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization [pdf] [code]
Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li
NeurIPS 2024.
-
Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer [pdf] [code]
Bing Yang, Xiaofei Li
IEEE/ACM Transactions on Audio, Speech, and Language Processing.
-
Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers [pdf] [code]
Changsheng Quan, Xiaofei Li
IEEE Signal Processing Letters.
-
SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation [pdf] [code]
Changsheng Quan, Xiaofei Li
IEEE/ACM Transactions on Audio, Speech, and Language Processing.
-
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks [pdf] [code]
Xian Li, Nian Shao, Xiaofei Li
IEEE/ACM Transactions on Audio, Speech, and Language Processing.
-
RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutive transfer function [pdf] [code]
Pengyu Wang, Xiaofei Li
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024.
-
Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors [pdf] [code]
Di Liang, Nian Shao, Xiaofei Li
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024.
-
Fine-tune the pretrained ATST model for sound event detection [pdf] [code]
Nian Shao, Xian Li, Xiaofei Li
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024.
-
Unimodal Aggregation for CTC-based Speech Recognition [pdf] [code]
Ying Fang, Xiaofei Li
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024.
2023
-
McNet: Fuse Multiple Cues for Multichannel Speech Enhancement [pdf] [research page] [code]
Yujie Yang, Changsheng Quan, Xiaofei Li
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.
-
Speech Dereverberation with a Reverberation Time Shortening Target [pdf] [code]
Rui Zhou, Wenye Zhu, Xiaofei Li
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.
-
DVQVC: An Unsupervised Zero-Shot Voice Conversion Framework [pdf] [demo]
Dayong Li, Xian Li, Xiaofei Li
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.
-
FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization [pdf] [code]
Yabo Wang, Bing Yang, Xiaofei Li
Interspeech 2023.
2022
-
Multichannel Speech Separation with Narrow-band Conformer [pdf] [research page] [code]
Changsheng Quan, Xiaofei Li
Interspeech, 2022.
-
ATST: Audio Representation Learning with Teacher-Student Transformer [pdf] [code]
Xian Li, Xiaofei Li
Interspeech, 2022.
-
RCT: Random Consistency Training for Semi-supervised Sound Event Detection [pdf] [code]
Nian Shao, Erfan Loweimi, Xiaofei Li
Interspeech, 2022.
-
Spectro-Temporal SubNet for Real-Time Monaural Speech Denoising and Dereverberation
Feifei Xiong, Weiguang Chen, Pengyu Wang, Xiaofei Li and Jinwei Feng
Interspeech, 2022.
-
Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training [pdf] [research page] [code]
Changsheng Quan, Xiaofei Li
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022.
-
SRP-DNN: Learning Direct-path Phase Difference For Multiple Moving Sound Source Localization [pdf] [code]
Bing Yang, Hong Liu, Xiaofei Li
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022.
-
Connecting the Dots in Self-Supervised Learning: A Brief Survey for Beginners [pdf]
Peng Fei Fang, Xian Li, Yang Yan, Shuai Zhang, Qi Yue Kang, Xiao Fei Li, Zhen Zhong Lan
Journal of Computer Science and Technology, 37 (3), pp. 507–526, May 2022.
2021
-
Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization [pdf] [code]
Bing Yang, Hong Liu, Xiaofei Li
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, pp. 3491–3503, 2021.
-
Microphone Array Generalization for Multichannel Narrowband Deep Speech Enhancement [pdf] [code]
Siyuan Zhang and Xiaofei Li
Interspeech, 2021.
-
AcousticFusion: Fusing Sound Source Localization to Visual SLAM in dynamic environments [pdf]
Tianwei Zhang, Huayan Zhang, Xiaofei Li, Junfeng Chen, Tin Lun Lam, Sethu Vijayakumar
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.
-
FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement [pdf] [research page] [code]
Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021.
-
Supervised Direct-Path Relative Transfer Function Learning for Binaural Sound Source Localization [pdf]
Bing Yang, Xiaofei Li, Hong Liu
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021.
-
Enhancing Direct‐path Relative Transfer Function using Deep Neural Network for Robust Sound Source Localization [pdf]
Bing Yang, Runwei Ding, Yutong Ban, Xiaofei Li, Hong Liu
CAAI Transactions on Intelligence Technology, 2021.
2020
-
A Covert Ultrasonic Phone-to-Phone Communication Scheme
Liming Shi, Limin Yu, Kaizhu Huang, Xu Zhu, Zhi Wang, Xiaofei Li, Wenwu Wang, Xinheng Wang
International Conference on Collaborative Computing: Networking, Applications and Worksharing, pp. 36-48, 2020.
-
Online Monaural Speech Enhancement Using Delayed Subband LSTM [pdf] [research page] [audio examples]
Xiaofei Li and Radu Horaud
Interspeech 2020.
-
Sub-band Knowledge Distillation Framework for Speech Enhancement [pdf]
Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li
Interspeech 2020.
2019
- Multichannel Online Dereverberation based on Spectral Magnitude Inverse Filtering [pdf] [audio examples] [matlab code]
Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud
IEEE/ACM Transactions on Audio, Speech and Language Processing, 27 (9), pp. 1365 – 1377, 2019.
- Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments [pdf] [research page] [matlab code]
Xiaofei Li, Yutong Ban, Laurent Girin, Xavier Alameda-Pineda, Radu Horaud
IEEE Journal of Selected Topics in Signal Processing, 13 (1), pp. 88 – 103, 2019.
- Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function [pdf] [matlab code]
Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud
IEEE/ACM Transactions on Audio, Speech and Language Processing, 27 (3), pp. 645 – 659, 2019.
- Audio-noise Power Spectral Density Estimation Using Long Short-term Memory [pdf] [test python code and data]
Xiaofei Li, Simon Leglaive, Laurent Girin, Radu Horaud
IEEE Signal Processing Letters, 26 (6), pp. 918 – 922, 2019.
- Expectation-Maximization for Speech Source Separation using Convolutive Transfer Function [pdf] [matlab code]
Xiaofei Li, Laurent Girin, Radu Horaud
CAAI Transactions on Intelligence Technology, 4 (1), pp. 47 – 53, 2019.
- Multiple Sound Source Counting and Localization Based on TF-Wise Spatial Spectrum Clustering
Bing Yang, Hong Liu, Cheng Pang, Xiaofei Li
IEEE/ACM Transactions on Audio, Speech and Language Processing, 27 (8), pp. 1241 – 1255, 2019.
- Multitask Learning of Time-Frequency CNN for Sound Source Localization [pdf]
Cheng Pang, Hong Liu, Xiaofei Li
IEEE Access, vol.7, pp. 40725 – 40737, 2019.
- Multichannel Speech Enhancement Based on Time-frequency Masking Using Subband Long Short-Term Memory [pdf] [audio examples] [code]
Xiaofei Li and Radu Horaud
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2019, New Paltz, NY, United States.
- Audio-Visual Variational Fusion for Multi-Person Tracking with Robots [pdf]
Xavier Alameda-Pineda, Soraya Arias, Yutong Ban, Guillaume Delorme, Laurent Girin, Radu Horaud, Xiaofei Li, Bastien Mourgue, Guillaume Sarrazin
ACMMM 2019 – 27th ACM International Conference on Multimedia, Oct 2019, Nice, France, pp. 1059-1061.
2018
- Audio source separation into the wild [pdf]
Laurent Girin, Sharon Gannot, Xiaofei Li
Multimodal Behavior Analysis in the Wild, Academic Press (Elsevier), Computer Vision and Pattern Recognition, 〈10.1016/B978-0-12-814601-9.00022-5〉, pp. 53-78, 2018.
- Multichannel Identification and Nonnegative Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function [pdf] [audio examples] [matlab code]
Xiaofei Li, Sharon Gannot, Laurent Girin, Radu Horaud
IEEE/ACM Transactions on Audio, Speech and Language Processing, 26 (10), pp. 1755 – 1768, 2018.
- Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion [research page]
Israel D. Gebru, Silèye Ba, Xiaofei Li, Radu Horaud
IEEE Transactions on Pattern Analysis and Machine Intelligence, 40 (5), pp. 1086 – 1099, 2018.
- Online Localization of Multiple Moving Speakers in Reverberant Environments [pdf] [matlab code]
Xiaofei Li, Bastien Mourgue, Laurent Girin, Sharon Gannot and Radu Horaud
IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), July 2018, Sheffield, UK.
- Multisource MINT Using the Convolutive Transfer Function [pdf] [matlab code]
Xiaofei Li, Sharon Gannot, Laurent Girin, Radu Horaud
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Canada.
- A Cascaded Multiple-Speaker Localization and Tracking System [pdf] [research page] [matlab code]
Xiaofei Li, Yutong Ban, Laurent Girin, Xavier Alameda-Pineda, Radu Horaud
Proceedings of the LOCATA Challenge Workshop – a satellite event of IWAENC 2018, Sep 2018, Tokyo, Japan, pp. 1-5.
- Accounting for Room Acoustics in Audio-Visual Multi-Speaker Tracking
Yutong Ban, Xiaofei Li, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Canada.
2017
- Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization with Spatial Sparsity Regularization [pdf] [research page]
Xiaofei Li, Laurent Girin, Radu Horaud and Sharon Gannot
IEEE/ACM Transactions on Audio, Speech and Language Processing, 25 (10), pp. 1997 – 2012, 2017.
- Binaural Sound Localization Based on Reverberation Weighting and Generalized Parametric Mapping
Cheng Pang, Hong Liu, Jie Zhang and Xiaofei Li
IEEE/ACM Transactions on Audio, Speech and Language Processing, 25 (8), pp. 1618 – 1632, 2017.
- An EM algorithm for audio source separation based on the convolutive transfer function [pdf] [matlab code]
Xiaofei Li, Laurent Girin, Radu Horaud
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2017, New Paltz, NY, United States.
- Audio Source Separation based on Convolutive Transfer Function and Frequency-Domain Lasso Optimization [pdf] [matlab code]
Xiaofei Li, Laurent Girin, Radu Horaud
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar 2017, New Orleans, United States.
2016
- Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization [pdf] [matlab code] [research page]
Xiaofei Li, Laurent Girin, Radu Horaud, Sharon Gannot
IEEE/ACM Transactions on Audio, Speech and Language Processing, 24 (11), pp. 2171 – 2186, 2016.
- A Novel Lip Descriptor for Audio-Visual Keyword Spotting Based on Adaptive Decision Fusion [pdf]
Pingping Wu, Hong Liu, Xiaofei Li, Ting Fan, Xuewu Zhang
IEEE Transactions on Multimedia, 18 (3), pp. 326-338, 2016.
- Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function [pdf]
Xiaofei Li, Laurent Girin, Fabien Badeig, Radu Horaud
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct 2016, Daejeon, South Korea.
- Voice Activity Detection Based on Statistical Likelihood Ratio With Adaptive Thresholding [pdf]
Xiaofei Li, Radu Horaud, Laurent Girin, Sharon Gannot
International Workshop on Acoustic Signal Enhancement (IWAENC), Sep 2016, Xi’an, China.
- Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics [pdf] [matlab code]
Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar 2016, Shanghai, China.
2015
-
A Distributed Architecture for Interacting with NAO [pdf]
Fabien Badeig, Quentin Pelorson, Soraya Arias, Vincent Drouard, Israel Dejene Gebru, Xiaofei Li, Georgios Evangelidis, Radu Horaud
International Conference on Multimodal Interaction (ICMI), Nov 2015, Seattle, WA, United States.
-
Local Relative Transfer Function for Sound Source Localization [pdf]
Xiaofei Li, Radu Horaud, Laurent Girin, Sharon Gannot
The European Signal Processing Conference (Eusipco), Aug 2015, Nice, France.
-
Estimation of Relative Transfer Function in the Presence of Stationary Noise Based on Segmental Power Spectral Density Matrix Subtraction [pdf] [matlab code]
Xiaofei Li, Laurent Girin, Radu Horaud, Sharon Gannot
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2015, Brisbane, Australia.
Before 2015
-
Sound Source Localization for HRI Using FOC-based Time Difference Feature and Spatial Grid Matching [pdf]
Xiaofei Li and Hong Liu
IEEE Transactions on Cybernetics, 43 (4), pp. 1199-1212, 2013.
-
Real-time Sound Source Localization for Mobile Robot Based on Guided Spectral-Temporal Position Method [pdf]
Xiaofei Li, Miao Shen, Wenmin Wang and Hong Liu
International Journal of Advanced Robotic Systems, 2012, vol.9, 78:2012.
-
A survey of sound source localization for robot audition
Xiaofei Li and Hong Liu
CAAI Transactions on Intelligent Systems, 7 (1), pp. 9-20, 2012. (in Chinese)
-
A Two-Layer Probabilistic Model Based on Time-Delay Compensation for Binaural Sound Localization [pdf]
Hong Liu, Zhuo Fu and Xiaofei Li
IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 6-10 May 2013.
-
Time Delay Estimation for Speech Signal Based on FOC-Spectrum [pdf]
Hong Liu and Xiaofei Li
International Conference on INTERSPEECH, Portland, Oregon, USA, 2012:1732-1735.
-
Sound Source Localization for Human-Robot Interaction Based on Spatial Distribution of Time Difference Feature and Grid Matching [pdf]
Xiaofei Li, Hong Liu and Xuesong Yang
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011.
-
A Selection Method of Speech Vocabulary for Human-Robot Speech Interaction [pdf]
Hong Liu and Xiaofei Li
IEEE International Conference on Systems, Man and Cybernetics (SMC), Istanbul, Turkey, 2010:2243-2248.