Your search:
Scholar name: Bao Changchun (鲍长春)
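Abstract:
In real-time voice over IP communication, speech quality degrades severely when data packets are lost. To recover the speech information lost during transmission, this paper proposes a packet loss concealment (PLC) method based on instantaneous phase deviation (IPD) and a deep neural network (DNN). In the training stage, the log power spectrum (LPS) and the IPD of the speech are used as input features for training the DNN, which learns the mapping from received packets to lost packets. In the reconstruction stage, the speech packets received before the loss are fed into the trained DNN to recover the speech of the lost packets. Experimental results show that, under different packet loss rates, the proposed method...
Keywords:
Phase features; Packet loss concealment; Deep neural network
Citation: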
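GB/T 7714 | Huang Jinwei , Bao Changchun . Packet loss concealment method based on instantaneous phase deviation and deep learning [J]. | Journal of Signal Processing (信号处理) , 2021 , 37 (10) : 1791-1798 . |
MLA | Huang Jinwei et al. "Packet loss concealment method based on instantaneous phase deviation and deep learning" . | Journal of Signal Processing (信号处理) 37 . 10 (2021) : 1791-1798 . |
APA | Huang Jinwei , Bao Changchun . Packet loss concealment method based on instantaneous phase deviation and deep learning . | Journal of Signal Processing (信号处理) , 2021 , 37 (10) , 1791-1798 . |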
Abstract:
Multiple sound source localization is a hot issue of concern in recent years. The Single Source Zone (SSZ) based localization methods achieve good performance due to the detection and utilization of the Time-Frequency (T-F) zone where only one source is dominant. However, some T-F points consisting of components from multiple sources are also included in the detected SSZ sometimes. Once a T-F point in SSZ is contributed by multiple components, this point is defined as an outlier. The existence of outliers within the detected SSZ is usually an unavoidable problem for SSZ-based methods. To solve this problem, a multi-source localization by using offset residual weight is proposed in this paper. In this method, an assumption is developed: the direction estimated by all the T-F points within the detected SSZ has a difference along with the actual direction of sources. But this difference is much smaller than the difference between the directions estimated by the outliers along with the actual source localization. After verifying this assumption experimentally, Point Offset Residual Weight (PORW) and Source Offset Residual Weight (SORW) are proposed to reduce the influence of outliers on the localization results. Then, a composite weight is formed by combining PORW and SORW, which can effectively distinguish the outliers and desired points. After that, the outliers are removed by composite weight. Finally, a statistical histogram of DOA estimation with outliers removed is used for multi-source localization. The objective evaluation of the proposed method is conducted in various simulated environments. The results show that the proposed method achieves a better performance compared with the reference methods in sources localization.
Keywords:
Multiple sound sources localization; Direction of arrival estimation; Soundfield microphone; Reverberation
Citation:
GB/T 7714 | Jia, Maoshen , Gao, Shang , Bao, Changchun . Multi-source localization by using offset residual weight [J]. | EURASIP Journal on Audio, Speech, and Music Processing , 2021 , 2021 (1) . |
MLA | Jia, Maoshen et al. "Multi-source localization by using offset residual weight" . | EURASIP Journal on Audio, Speech, and Music Processing 2021 . 1 (2021) . |
APA | Jia, Maoshen , Gao, Shang , Bao, Changchun . Multi-source localization by using offset residual weight . | EURASIP Journal on Audio, Speech, and Music Processing , 2021 , 2021 (1) . |
Abstract:
The auto-regressive (AR) model is an effective method to describe the correlation of time series. The classic AR coefficient estimation method makes a simple assumption about the residual signal. It is a challenge to accurately estimate the AR coefficients in a complex environment such as noise or interference. Even though the deep neural network (DNN) based AR (DNN-AR) coefficient estimation method can estimate the AR coefficients in a complex environment, the DNN-AR method is easily affected by the numerical stability of the Levinson-Durbin recursion (LDR) approach during the training stage. The main target is to improve the stability and overall performance of the DNN-AR based method. In this paper, the precision transform method is utilized to improve computational efficiency while keeping system stability, and the generalized analysis-by-synthesis combining DNN (GABS-DNN) model is proposed for improving the accuracy of AR coefficient estimation and the stability of DNN training in noisy environments. The GABS-DNN model consists of three main parts: the spectrum enhancement network in the modifier, DNN preprocessing and LDR parameter estimation at the encoder, and the conversion from AR coefficients to power spectrum at the decoder. In the process of optimizing the objective function, the error between the enhanced spectrum and the observed spectrum is added to reduce the influence of the gradient of the LDR on the enhancement network during back-propagation, which results in a stable estimation of the AR coefficients of noisy speech. © 2021, Chinese Institute of Electronics. All rights reserved.
Keywords:
Backpropagation; Deep neural networks; Numerical methods; System stability; Complex networks; Computational efficiency
Citation:
GB/T 7714 | Cui, Zi-Hao , Bao, Chang-Chun . Auto-Regressive Coefficient Estimation Based on the GABS and DNN [J]. | Acta Electronica Sinica , 2021 , 49 (1) : 29-39 . |
MLA | Cui, Zi-Hao et al. "Auto-Regressive Coefficient Estimation Based on the GABS and DNN" . | Acta Electronica Sinica 49 . 1 (2021) : 29-39 . |
APA | Cui, Zi-Hao , Bao, Chang-Chun . Auto-Regressive Coefficient Estimation Based on the GABS and DNN . | Acta Electronica Sinica , 2021 , 49 (1) , 29-39 . |
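The Levinson-Durbin recursion (LDR) named in the abstract above is a standard algorithm for obtaining AR coefficients from an autocorrelation sequence. The following is a minimal NumPy sketch of the textbook recursion only; it is not the GABS-DNN model from the paper, and the function name `levinson_durbin` and its sign convention are assumptions made for illustration.

```python
import numpy as np

def levinson_durbin(r, order):
    """Textbook Levinson-Durbin recursion (illustrative sketch).

    r     : autocorrelation values r[0], ..., r[order]
    order : AR model order
    Returns (a, err), where a[0] == 1 and the model is
    x[n] + a[1]*x[n-1] + ... + a[order]*x[n-order] = e[n],
    and err is the final prediction-error power.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current predictor and lag i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        # Reflection coefficient; the division by err is where the
        # recursion becomes sensitive when err gets close to zero.
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

For an AR(1) process with autocorrelation proportional to 0.5 to the power of the lag, `levinson_durbin([1.0, 0.5, 0.25], 2)` yields a second-order fit whose second coefficient vanishes, as expected for a first-order process.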
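Abstract:
The auto-regressive (AR) model is an effective way to describe the correlation of time series. Classic AR coefficient estimation methods make a simple assumption about the residual signal and struggle to estimate the AR coefficients accurately in complex scenarios such as noise and interference, while the deep neural network (DNN) based AR (DNN-AR) coefficient estimation method is easily affected during training by the numerical stability of the Levinson-Durbin recursion (LDR) solution. To improve the training stability and overall performance of DNN-AR coefficient estimation while preserving system stability, this paper uses precision conversion to speed up computation and proposes a method for improving the deep network structure based on the generalized analysis-by-synthesis (GABS) model, which improves the accuracy of AR coefficient estimation in noisy environments and the stability of network training. The GABS model combined with a DNN (GABS-DNN) consists of three main parts: the spectrum enhancement network of the modifier, the encoder's...
Keywords:
Deep neural network; Generalized analysis-by-synthesis; AR coefficients; Levinson-Durbin recursion
Citation: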
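GB/T 7714 | Cui Zihao , Bao Changchun . Auto-regressive coefficient estimation method based on generalized analysis-by-synthesis and deep neural network [J]. | Acta Electronica Sinica (电子学报) , 2021 , 49 (01) : 29-39 . |
MLA | Cui Zihao et al. "Auto-regressive coefficient estimation method based on generalized analysis-by-synthesis and deep neural network" . | Acta Electronica Sinica (电子学报) 49 . 01 (2021) : 29-39 . |
APA | Cui Zihao , Bao Changchun . Auto-regressive coefficient estimation method based on generalized analysis-by-synthesis and deep neural network . | Acta Electronica Sinica (电子学报) , 2021 , 49 (01) , 29-39 . |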
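Abstract:
To address the poor performance of time-frequency masking based separation methods in multi-source scenarios, this paper proposes an ideal ratio mask multi-source separation method that uses a probabilistic mixture model. First, the von Mises distribution is used to fit the azimuth estimates at the time-frequency points, and the Laplace distribution is used to fit the normalized sound pressure gradient signal vectors, which together establish the probabilistic mixture model. Second, the expectation-maximization algorithm is used to solve for the model parameters and to estimate the ideal ratio mask of each source. Finally, the estimated ideal ratio masks are used to separate each source signal from the microphone recordings. Experimental results show that, compared with existing time-frequency masking based multi-source separation methods, the proposed method achieves better separation in underdetermined scenarios.
Keywords:
Probabilistic mixture model; Multiple sound source separation; Ideal ratio mask
Citation: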
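GB/T 7714 | Jia Yitian , Yang Qishan , Jia Maoshen et al. Ideal ratio mask multi-source separation method using a probabilistic mixture model [J]. | Journal of Signal Processing (信号处理) , 2021 , 37 (10) : 1806-1815 . |
MLA | Jia Yitian et al. "Ideal ratio mask multi-source separation method using a probabilistic mixture model" . | Journal of Signal Processing (信号处理) 37 . 10 (2021) : 1806-1815 . |
APA | Jia Yitian , Yang Qishan , Jia Maoshen , Xu Wenjie , Bao Changchun . Ideal ratio mask multi-source separation method using a probabilistic mixture model . | Journal of Signal Processing (信号处理) , 2021 , 37 (10) , 1806-1815 . |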
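Abstract:
Auto-regressive coefficient estimation method based on generalized analysis-by-synthesis and deep neural network
Keywords:
AR coefficients; Generalized analysis-by-synthesis; Deep neural network; Levinson-Durbin recursion
Citation: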
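GB/T 7714 | Cui Zihao , Bao Changchun . Auto-regressive coefficient estimation method based on generalized analysis-by-synthesis and deep neural network [J]. | Acta Electronica Sinica (电子学报) , 2021 , 49 (1) : 29-39 . |
MLA | Cui Zihao et al. "Auto-regressive coefficient estimation method based on generalized analysis-by-synthesis and deep neural network" . | Acta Electronica Sinica (电子学报) 49 . 1 (2021) : 29-39 . |
APA | Cui Zihao , Bao Changchun . Auto-regressive coefficient estimation method based on generalized analysis-by-synthesis and deep neural network . | Acta Electronica Sinica (电子学报) , 2021 , 49 (1) , 29-39 . |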
Abstract:
In this article, the direction of arrival (DOA) estimation of multiple speech sources in reverberant environments is investigated based on the recording of a soundfield microphone. First, the recordings are analyzed in the time-frequency (T-F) domain to detect both 'points' (single T-F points) and 'regions' (multiple, adjacent T-F points) corresponding to a single source with low reverberation (known as low-reverberant-single-source (LRSS) points). Then, a LRSS point detection algorithm is proposed based on a joint dominance measure and instantaneous single-source point (SSP) identification. Following this, initial DOA estimates obtained for the detected LRSS points are analyzed using a Gaussian Mixture Model (GMM) derived by the Expectation-Maximization (EM) algorithm to cluster components into sources or outliers using a rule-based method. Finally, the DOA of each actual source is obtained from the estimated source components. Experiments on both simulated data and data recorded in an actual acoustic chamber demonstrate that the proposed algorithm exhibits improved performance for the DOA estimation in reverberant environments when compared to several existing approaches. © 2014 IEEE.
Keywords:
Audio recordings; Clustering algorithms; Direction of arrival; Frequency domain analysis; Frequency estimation; Gaussian distribution; Image segmentation; Maximum principle; Reverberation
Citation:
GB/T 7714 | Jia, Maoshen , Wu, Yuxuan , Bao, Changchun et al. Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points [J]. | IEEE/ACM Transactions on Audio, Speech, and Language Processing , 2021 , 29 : 379-392 . |
MLA | Jia, Maoshen et al. "Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points" . | IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021) : 379-392 . |
APA | Jia, Maoshen , Wu, Yuxuan , Bao, Changchun , Ritz, Christian . Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points . | IEEE/ACM Transactions on Audio, Speech, and Language Processing , 2021 , 29 , 379-392 . |
Abstract:
In this letter, a novel weighted mean square error (WMSE) is proposed to improve the DNN-based mask approximation method for speech enhancement, in which the weighting is closely related to the power exponent applied to the noisy spectrum amplitude (NSA) as the base. The power exponents 0 and 2 correspond, respectively, to ideal amplitude masking (IAM) without any clipping and to indirect mapping (IM) on the short-time spectral amplitude (STSA), and tests show that the exponent is highly related to the enhanced spectrum and to the performance of the enhanced signal. The experimental results also show that the best weighting for phase-unaware masking is the noisy spectrum base with power exponent 1, which results in better harmonic structure restoration. Compared with MSE-based mask approximation methods, the objective function with the WMSE on the NSA (WMSE-NSA) improves the perceptual evaluation of speech quality (PESQ) score by 0.1 and the short-time objective intelligibility (STOI) by 1.7% on average. © 1994-2012 IEEE.
Keywords:
Approximation theory; Deep neural networks; Mean square error; Photomapping; Speech enhancement; Speech intelligibility
Citation:
GB/T 7714 | Cui, Zihao , Bao, Changchun . Power Exponent Based Weighting Criterion for DNN-Based Mask Approximation in Speech Enhancement [J]. | IEEE Signal Processing Letters , 2021 , 28 : 618-622 . |
MLA | Cui, Zihao et al. "Power Exponent Based Weighting Criterion for DNN-Based Mask Approximation in Speech Enhancement" . | IEEE Signal Processing Letters 28 (2021) : 618-622 . |
APA | Cui, Zihao , Bao, Changchun . Power Exponent Based Weighting Criterion for DNN-Based Mask Approximation in Speech Enhancement . | IEEE Signal Processing Letters , 2021 , 28 , 618-622 . |
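The relationship between the power exponent and the two limiting cases in the abstract above can be illustrated numerically. The sketch below assumes the weight is the noisy spectrum amplitude raised to a power `p`, applied to the squared mask error; the abstract does not give the exact loss, so this form, the function name `wmse_nsa`, and its parameters are assumptions for illustration rather than the authors' definition.

```python
import numpy as np

def wmse_nsa(mask_est, mask_ref, noisy_amp, p=1.0):
    """Weighted MSE between an estimated and a reference mask.

    Assumed form: mean(noisy_amp**p * (mask_est - mask_ref)**2).
    mask_est, mask_ref, noisy_amp : arrays of the same shape,
    e.g. (frames, frequency bins); noisy_amp must be positive.
    """
    weight = noisy_amp ** p
    return float(np.mean(weight * (mask_est - mask_ref) ** 2))
```

Under this assumed form, `p = 0` reduces to the plain MSE on the mask. With `p = 2` and an unclipped amplitude mask defined as clean amplitude divided by noisy amplitude, the loss equals the MSE between the masked noisy amplitude and the clean amplitude, which matches the indirect-mapping case.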
Abstract:
In this paper, a multi-channel speech coding method is proposed that combines down-mixing with inter-channel amplitude ratio (ICAR) decoding based on a generative adversarial network (GAN). First, the spatial parameter inter-channel time difference (ICTD) is extracted. In the short-time Fourier transform (STFT) domain, the amplitude of the down-mixed mono signal is obtained by averaging the amplitudes of the multi-channel speech signals, and its phase is taken from the reference channel, which gives the STFT of the down-mixed mono signal. Then, the inverse STFT is used to obtain the down-mixed mono signal. The amplitude ratio between the multi-channel speech signals and the down-mixed signal (ICAR) is extracted. The down-mixed mono signal is coded by the Speex codec, and the ICTD is quantized by a uniform scalar quantizer. The ICAR does not need to be encoded; it is decoded from a well-trained GAN at the decoder based on the decoded mono signal. Finally, the decoded multi-channel speech signals are recovered by using the decoded down-mixed mono signal, the decoded ICTD and the decoded ICAR. The experimental results show that the proposed multi-channel speech coding method can recover multi-channel speech signals with spatial information. © 2021 IEEE.
Keywords:
Decoding; Inverse problems; Signal reconstruction; Speech coding; Speech communication
Citation:
GB/T 7714 | Zhu, Jinru , Bao, Changchun . GAN-Based Inter-Channel Amplitude Ratio Decoding in Multi-Channel Speech Coding [C] . 2021 . |
MLA | Zhu, Jinru et al. "GAN-Based Inter-Channel Amplitude Ratio Decoding in Multi-Channel Speech Coding" . (2021) . |
APA | Zhu, Jinru , Bao, Changchun . GAN-Based Inter-Channel Amplitude Ratio Decoding in Multi-Channel Speech Coding . (2021) . |
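The STFT-domain down-mixing and ICAR extraction described in the abstract above can be sketched in a few lines of NumPy. This covers only the averaging of amplitudes, the reuse of the reference-channel phase, and the amplitude ratio; the ICTD extraction, the Speex coding and the GAN decoder are not shown. The function name `downmix_and_icar`, the array layout and the small `eps` guard are assumptions for illustration.

```python
import numpy as np

def downmix_and_icar(stft_channels, ref=0, eps=1e-12):
    """Down-mix multi-channel STFTs and compute amplitude ratios.

    stft_channels : complex array of shape (channels, frames, bins)
    ref           : index of the reference channel (assumed choice)
    eps           : guard against division by zero (assumption)
    Returns (mono_stft, icar) where icar has the input's shape.
    """
    amps = np.abs(stft_channels)
    # Mono amplitude: average of the per-channel amplitudes.
    mono_amp = amps.mean(axis=0)
    # Mono phase: copied from the reference channel.
    ref_phase = np.angle(stft_channels[ref])
    mono_stft = mono_amp * np.exp(1j * ref_phase)
    # ICAR: each channel's amplitude relative to the down-mix.
    icar = amps / (mono_amp + eps)
    return mono_stft, icar
```

Multiplying `icar` by the down-mix amplitude gives back the per-channel amplitudes, which is the role the decoded ICAR plays at the decoder in the scheme the abstract describes.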
Abstract:
Variations of speech content increase the difficulty of speaker verification. In this paper, to alleviate the negative effect of the variations, phoneme-unit-specific time-delay neural network (PUSTDNN) is proposed and applied to the state-of-the-art x-vector system. It models each phoneme unit with an individual time-delay neural network (TDNN). That is to say, each TDNN mainly deals with a phoneme unit. Compared with handling all phoneme units together, when handling a phoneme unit, a TDNN can extract more discriminative speaker information, thus improving the system performance. Two realizations of the PUSTDNN are proposed. The first one can retain speech temporal information. The second one further combines all the TDNNs in a PUSTDNN into a larger TDNN to reduce computational complexity. To avoid model overfitting, the phoneme units are obtained by clustering phonemes based on the phonetic knowledge and phonetic sparsity degree. The PUSTDNN is also compared with two other techniques, i.e., phonetic vector and multitask. Experiments on the Fisher, NIST SRE10, and VoxCeleb datasets show that the phonetic vector technique is most robust to the phoneme unit recognition accuracy. When the accuracy is high enough, the multitask performs better than the phonetic vector, and the PUSTDNN performs best and can achieve over 10% relative improvement compared with the x-vector baseline. © 2014 IEEE.
Keywords:
Linguistics; Neural networks; Speech recognition; Time delay; Timing circuits; Vectors
Citation:
GB/T 7714 | Chen, Xianhong , Bao, Changchun . Phoneme-Unit-Specific Time-Delay Neural Network for Speaker Verification [J]. | IEEE/ACM Transactions on Audio, Speech, and Language Processing , 2021 , 29 : 1243-1255 . |
MLA | Chen, Xianhong et al. "Phoneme-Unit-Specific Time-Delay Neural Network for Speaker Verification" . | IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021) : 1243-1255 . |
APA | Chen, Xianhong , Bao, Changchun . Phoneme-Unit-Specific Time-Delay Neural Network for Speaker Verification . | IEEE/ACM Transactions on Audio, Speech, and Language Processing , 2021 , 29 , 1243-1255 . |