Wang Shuai

Associate Professor

Nanjing University

Research Field

Speech signal processing, speech synthesis and conversion, speaker modeling, target speech extraction, large speech models, and applications of audio processing in healthcare (e.g., pathological speech recognition and generation)

shuaiwang@slai.edu.cn

Biography

Dr. Shuai Wang is an Associate Professor, Distinguished Researcher, and Ph.D. Supervisor at Nanjing University, with a joint appointment at Hetao College. He specializes in intelligent audio signal processing across multimodal acoustic signals (speech, audio events, and music). He holds a Ph.D. from Shanghai Jiao Tong University and previously served as a Senior Researcher at Tencent Photon Studio. He has published more than 40 papers as first or corresponding author in premier venues such as ICASSP and Interspeech, and holds more than 10 patents. Dr. Wang won first place in the VoxSRC 2019 and DIHARD 2019 challenges, and received Best Paper awards at ISCSLP 2024. His open-source project WeSpeaker has more than 10 million monthly downloads on HuggingFace and is widely adopted in academia and industry. He welcomes research collaborations, as well as outstanding students interested in speech processing, multimodal AI, and large models.

Academic Publications

Wang S, Chen Z, Han B, Wang H, Liang C, Zhang B, Xiang X, Ding W, Rohdin J, Silnova A, et al. Advancing Speaker Embedding Learning: Wespeaker Toolkit for Research and Production. Speech Communication, 2024.
Wu W, Chen X, Wang S*, Wang J, Meng L, Wu X, Meng H, Li H. C2AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction. IEEE Journal of Selected Topics in Signal Processing, 2025.
Ma Y, Wang S*, Liu T, Li H. PhiNet: Speaker Verification with Phonetic Interpretability. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025.
Yang C, Wang S*, Chen H, Tan W, Yu J, Li H. SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement. NeurIPS, 2025.
Wang W, Pan Z, Li X, Wang S, Li H. Speech Separation with Pretrained Frontend to Minimize Domain Mismatch. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024.
Wang S, Yang Y, Wu Z, Qian Y, Yu K. Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020.