Abstract:
Deep neural networks (DNNs) have become a popular means of separating target speech from noisy speech because they learn the mapping between noisy speech and the training target well. In DNN-based methods, the time-frequency (T-F) mask commonly used as the training target has a significant impact on the quality of the restored speech. However, a T-F mask generally modifies only the magnitude spectrum of the noisy speech and leaves the phase spectrum unchanged during enhancement. Recent studies have shown that incorporating phase spectrum information into the T-F mask can effectively improve the perceptual quality of the enhanced speech. In this paper, we therefore present two T-F masks that simultaneously enhance the magnitude and phase of the speech spectrum, based on the assumption that the real and imaginary parts of the speech spectrum are uncorrelated, and use them as the training targets of the DNN model. Experimental results show that, compared with the reference methods, the proposed method effectively improves speech quality under different signal-to-noise ratio (SNR) conditions.
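The limitation noted in the abstract, that a magnitude-only mask leaves the noisy phase untouched, can be illustrated with a minimal NumPy sketch. This is not the pair of masks proposed in the paper, whose definitions are not given in this record. It assumes a generic ideal amplitude mask and a generic complex ratio mask, applied to hypothetical random values standing in for STFT coefficients; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complex STFT coefficients for one frame (8 frequency bins).
clean = rng.normal(size=8) + 1j * rng.normal(size=8)
noise = 0.5 * (rng.normal(size=8) + 1j * rng.normal(size=8))
noisy = clean + noise

# Magnitude-only mask: a real, non-negative gain per T-F bin.
# The ideal ratio |S| / |Y| stands in here for a DNN's estimate.
mag_mask = np.abs(clean) / np.abs(noisy)
enhanced_mag = mag_mask * noisy  # magnitude corrected, phase still the noisy one

# Complex mask: a complex gain per bin, so both magnitude and phase change.
# Assumption: a generic complex ratio mask, not the masks from the paper.
complex_mask = clean / noisy
enhanced_complex = complex_mask * noisy

# Residual phase error relative to the clean spectrum, in radians.
phase_err_mag = np.angle(enhanced_mag * np.conj(clean))
phase_err_complex = np.angle(enhanced_complex * np.conj(clean))

print("max |phase error|, magnitude-only mask:", np.max(np.abs(phase_err_mag)))
print("max |phase error|, complex mask:", np.max(np.abs(phase_err_complex)))
```

With ideal masks, the magnitude-only result matches the clean magnitude but keeps the noisy phase, while the complex mask recovers the clean coefficients up to rounding. In practice a DNN only estimates such masks, so neither recovery is exact.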
Source:
INTERSPEECH 2019
ISSN: 2308-457X
Year: 2019
Page: 3188-3192
SCOPUS Cited Count: 2
ESI Highly Cited Papers on the List: 0