收录:
摘要:
Deep neural network (DNN) has become a popular means for separating target speech from noisy speech due to its good performance for learning a mapping relationship between the training target and noisy speech. For the DNN-based methods, the time-frequency (T-F) mask commonly used as the training target has a significant impact on the performance of speech restoration. However, the T-F mask generally modifies magnitude spectrum of noisy speech and leaves phase spectrum unchanged in enhancing process. The recent studies have revealed that incorporating phase spectrum information into the T-F mask can effectively improve perceptual quality of the enhanced speech. So, in this paper, we present two T-F masks to simultaneously enhance magnitude and phase of speech spectrum based on non-correlation assumption of real part and imaginary part about speech spectrum, and use them as the training target of the DNN model. Experimental results show that, in comparison with the reference methods, the proposed method can obtain an effective improvement in speech quality for different signal to noise ratio (SNR) conditions.
关键词:
通讯作者信息:
来源 :
INTERSPEECH 2019
ISSN: 2308-457X
年份: 2019
页码: 3188-3192
归属院系: