Authors:

Xiang, Yang | Bao, Changchun (scholar: 鲍长春)

Indexed in:

EI; Scopus; SCIE

Abstract:

Recently, deep neural networks (DNNs) have become the mainstream strategy for the speech enhancement task because they can achieve higher speech quality and intelligibility than traditional methods. However, these DNN-based methods always need a large parallel corpus of clean speech and noise to produce noisy data for training the DNN in order to improve the generalization of the network. This implies that many noisy speech signals collected in real environments cannot be used to train the DNN because the corresponding clean speech and noise are lacking. Additionally, noise varies with time and scenario, so parallel speech and noise cannot be obtained given infinite noise data and limited speech data. Thus, network training with non-parallel speech and noise data is essential for the generalization of the network. To address this problem, we propose a novel parallel-data-free speech enhancement method that employs the cycle-consistent generative adversarial network (CycleGAN) and multi-objective learning. Our method is also able to make the best use of the benefits of multi-objective learning. In the training stage, we utilize two different encoders to encode the features of clean speech and noisy speech, respectively. Then, two forward generators are used to predict the ideal time-frequency (T-F) mask and log-power spectrum (LPS) of clean speech. Two inverse generators are applied to map the magnitude spectrum (MS) and LPS of noisy speech, respectively. In addition, four discriminators are used to distinguish the real speech features from the generated features. The two encoders, four generators and four discriminators are trained simultaneously using adversarial, identity-mapping, latent similarity and cycle-consistent losses. In the test stage, we directly utilize the forward generators and encoders to obtain the enhanced speech. The experimental results indicate that the proposed approach achieves better speech enhancement performance than the reference methods. Moreover, the proposed method is also effective in improving speech quality and intelligibility when the networks are trained with parallel data.
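
As a rough illustration of two of the loss terms named in the abstract, the following minimal sketch computes a log-power spectrum feature and the cycle-consistent and identity-mapping losses for a pair of mappings. It is not the authors' implementation: the function names, the L1 distance, the toy `forward` and `inverse` mappings, and the feature shapes are all assumptions made for illustration, and the adversarial and latent similarity terms, the encoders, and the discriminators are omitted.

```python
import numpy as np


def log_power_spectrum(magnitude, eps=1e-8):
    """LPS feature: natural log of the power spectrum (eps avoids log(0))."""
    return np.log(magnitude ** 2 + eps)


def l1(a, b):
    """Mean absolute error between two feature arrays (assumed distance)."""
    return float(np.mean(np.abs(a - b)))


def cycle_and_identity_losses(noisy, clean, forward, inverse):
    """forward maps noisy -> clean domain; inverse maps clean -> noisy domain."""
    # Cycle consistency: mapping to the other domain and back should
    # reproduce the input, so no paired clean/noisy data is required.
    cycle = l1(inverse(forward(noisy)), noisy) + l1(forward(inverse(clean)), clean)
    # Identity mapping: a generator fed a sample that is already in its
    # target domain should leave it unchanged.
    identity = l1(forward(clean), clean) + l1(inverse(noisy), noisy)
    return cycle, identity


# Hypothetical stand-ins for trained generators: constant shifts in the
# LPS domain that are exact inverses of each other.
forward = lambda x: x - 1.0
inverse = lambda y: y + 1.0

rng = np.random.default_rng(0)
noisy_lps = log_power_spectrum(rng.random((4, 257)) + 1.0)  # 4 frames, 257 bins
clean_lps = log_power_spectrum(rng.random((4, 257)) + 0.5)  # unpaired with noisy_lps

cycle, identity = cycle_and_identity_losses(noisy_lps, clean_lps, forward, inverse)
print(f"cycle loss: {cycle:.6f}, identity loss: {identity:.6f}")
```

Because the toy mappings invert each other, the cycle term is close to zero while the identity term is not, which shows that the two losses constrain the generators in different ways.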

Keywords:

speech enhancement; Noise measurement; non-parallel data; Generative adversarial networks; Gallium nitride; Cycle-consistent adversarial network; Task analysis; deep neural networks; multi-objective learning; Generators

Author affiliations:

  • [ 1 ] [Xiang, Yang]Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China
  • [ 2 ] [Bao, Changchun]Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China

Corresponding author:

  • 鲍长春

    [Bao, Changchun]Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

ISSN: 2329-9290

Year: 2020

Volume: 28

Pages: 1826-1838

Impact factor: 5.400 (JCR@2022)

ESI subject: ENGINEERING

ESI highly cited threshold: 115

Citation counts:

WoS Core Collection citations: 32

Scopus citations: 39

ESI highly cited paper listings: 0

Views in the last 30 days: 2
