Abstract:
Taking into account the temporal correlation of speech features, this paper proposes a novel convolutional auto-encoder (CAE) structure in which the historical output of the CAE is fed recurrently into a CAE stack. We name this structure the Recurrent Stack Convolutional Auto Encoder (RS-CAE). In the training stage, the input feature maps of the RS-CAE comprise the log power spectrum (LPS) of the noisy speech and an additional feature map derived from the LPS of the previously enhanced speech. In this way, the temporal correlation is incorporated into the RS-CAE as fully as possible. The training target is a concatenated vector of the auto-regressive (AR) model parameters of speech and noise. At the online stage, the LPS of the noisy speech and the LPS of the previously enhanced speech together make up the input feature maps. The outputs of the RS-CAE are the AR model parameters of speech and noise, which are used to construct the AR-Wiener filter. Because the estimated AR model parameters are not completely accurate and some harmonics may be lost in the enhanced speech, a codebook-based harmonic recovery technique is proposed to reconstruct the harmonic structure of the enhanced speech. The test results confirm that the proposed method achieves better performance than some existing approaches.
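As an illustration of the final step described in the abstract, the following minimal sketch shows one common way to build a Wiener gain from AR model parameters of speech and noise. It is not taken from the paper: the function names `ar_psd` and `ar_wiener_gain`, the excitation gains, the FFT size, and the sign convention A(z) = 1 + sum_k a_k z^-k are all assumptions made for this example.

```python
import numpy as np

def ar_psd(ar_coeffs, gain, n_fft=512):
    """Power spectrum of an AR model: gain / |A(e^{jw})|^2.

    Assumed convention: A(z) = 1 + sum_k a_k z^-k, so ar_coeffs = [a_1, ..., a_p].
    """
    a = np.concatenate(([1.0], np.asarray(ar_coeffs, dtype=float)))
    # Zero-padded FFT of the coefficient vector samples A(e^{jw}) on [0, pi].
    A = np.fft.rfft(a, n=n_fft)
    # Small floor avoids division by zero if a pole lies on the unit circle.
    return gain / np.maximum(np.abs(A) ** 2, 1e-12)

def ar_wiener_gain(speech_ar, speech_gain, noise_ar, noise_gain, n_fft=512):
    """Wiener gain H = P_s / (P_s + P_n) from AR spectra of speech and noise."""
    p_s = ar_psd(speech_ar, speech_gain, n_fft)
    p_n = ar_psd(noise_ar, noise_gain, n_fft)
    return p_s / (p_s + p_n)

# Hypothetical parameters: a low-pass first-order "speech" model in white noise.
gain = ar_wiener_gain([-0.9], 1.0, [], 0.5, n_fft=512)
```

Under these assumptions, the enhanced spectrum of a frame would be obtained by multiplying the noisy spectrum (one value per `rfft` bin) by `gain`. In the setting of the abstract, the AR coefficients would come from the RS-CAE output rather than being fixed by hand.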
Source:
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
ISSN: 2329-9290
Year: 2019
Issue: 11
Volume: 27
Page: 1752-1762
5.400 (JCR@2022)
ESI Discipline: ENGINEERING
ESI HC Threshold: 136
JCR Journal Grade: 1
Cited Count:
WoS CC Cited Count: 6
SCOPUS Cited Count: 11
ESI Highly Cited Papers on the List: 0