Your search:
Scholar name: 鲍长春 (Bao, Changchun)
Abstract:
This study proposes a three-dimensional room transfer function (RTF) parameterization method based on multiple concentric planar circular arrays, which exhibits robustness to variations in the positions of both the receiver and source. According to the harmonic solution to the wave equation, the RTFs between two spherical regions (sound source and receiver) in a room can be expressed as a weighted sum of spherical harmonics, whose weight coefficients serve as the RTF parameters, which can be estimated by placing multiple concentric planar circular arrays composed of monopole-source pairs (MSPs) and multiple concentric planar circular arrays composed of omnidirectional-microphone pairs (OMPs) in respective source and receiver regions. We use MSP arrays to generate required outgoing soundfields originating from a source region. We derive a method to use OMP arrays to estimate RTF parameters that are concealed within the captured soundfield, which can be employed to reconstruct the RTF from any point in the source region to any point in the receiver region. The accuracy of the RTF parameterization method is validated through simulation testing.
Keywords:
Position measurement; parameterization; planar arrays; Kernel; Harmonic analysis; Room transfer function; Loudspeakers; Receivers; Transfer functions
Citation:
GB/T 7714 | Li, Lu , Jia, Maoshen , Bao, Changchun . Three-Dimensional Room Transfer Function Parameterization Based on Multiple Concentric Planar Circular Arrays [J]. | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING , 2024 , 32 : 4384-4398 . |
MLA | Li, Lu et al. "Three-Dimensional Room Transfer Function Parameterization Based on Multiple Concentric Planar Circular Arrays" . | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 32 (2024) : 4384-4398 . |
APA | Li, Lu , Jia, Maoshen , Bao, Changchun . Three-Dimensional Room Transfer Function Parameterization Based on Multiple Concentric Planar Circular Arrays . | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING , 2024 , 32 , 4384-4398 . |
Abstract:
As a research focus within the field of array signal processing, multi-source direction-of-arrival (DOA) estimation in enclosed environments has received considerable attention. Contaminated by reverberation, noise, and inter-source interference, DOA estimation becomes challenging. Hence it is essential to identify time-frequency (TF) points dominated by only one source to alleviate these issues. This paper proposes a TF point selection method for DOA estimation based on the first-order relative harmonic coefficient (RHC). The first-order RHC is first analyzed at the "point" level from two perspectives, and we design an adaptive single-source dominant zone (SSDZ) detection method. Subsequently, the relationship between first- and zero-order RHC magnitudes of different types of TF points is explored, and we develop a simple but useful rule to further select TF points in the detected SSDZs. Finally, we adopt two-dimensional (2-D) kernel density estimation (KDE) and peak search to estimate the DOAs of sources after calculating the angles of the detected TF points. The effectiveness and robustness of the proposed method are verified and compared with the reference methods through experiments with both simulated and real-world recordings.
Keywords:
DOA estimation; multiple sources; SSDZ; first-order RHC
Citation:
GB/T 7714 | Tao, Liang , Jia, Maoshen , Bao, Changchun et al. First-Order Relative Harmonic Coefficient-Based Time-Frequency Points Selection for Multi-Source DOA Estimation [J]. | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING , 2024 , 32 : 3200-3212 . |
MLA | Tao, Liang et al. "First-Order Relative Harmonic Coefficient-Based Time-Frequency Points Selection for Multi-Source DOA Estimation" . | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 32 (2024) : 3200-3212 . |
APA | Tao, Liang , Jia, Maoshen , Bao, Changchun , Xiong, Wenmeng . First-Order Relative Harmonic Coefficient-Based Time-Frequency Points Selection for Multi-Source DOA Estimation . | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING , 2024 , 32 , 3200-3212 . |
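The final step described in this abstract (2-D kernel density estimation followed by peak search) can be illustrated with a small sketch. It is not the authors' implementation: the Gaussian kernel, the 5-degree grid, the bandwidth, and the function name `kde_peak_search` are all assumptions made for illustration, and azimuth wrap-around at 0/360 degrees is ignored for brevity.

```python
import math

def kde_peak_search(points, n_sources, bandwidth=5.0, step=5):
    """Estimate source directions from (azimuth, elevation) pairs in degrees.

    Evaluates a Gaussian KDE (an assumed kernel) on a regular grid and
    returns the n_sources strongest strict local maxima, strongest first.
    """
    az_grid = list(range(0, 360, step))
    el_grid = list(range(-90, 91, step))

    def density(az, el):
        # Unnormalised sum of isotropic Gaussian kernels centred on the samples.
        return sum(
            math.exp(-((az - a) ** 2 + (el - e) ** 2) / (2.0 * bandwidth ** 2))
            for a, e in points
        )

    dens = [[density(az, el) for el in el_grid] for az in az_grid]

    peaks = []
    for i in range(len(az_grid)):
        for j in range(len(el_grid)):
            d = dens[i][j]
            neighbours = [
                dens[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di or dj)
                and 0 <= i + di < len(az_grid)
                and 0 <= j + dj < len(el_grid)
            ]
            # Keep grid points that are strictly higher than all neighbours.
            if d > 0.0 and all(d > n for n in neighbours):
                peaks.append((d, az_grid[i], el_grid[j]))

    peaks.sort(reverse=True)
    return [(az, el) for _, az, el in peaks[:n_sources]]
```

For example, calling `kde_peak_search(angles, n_sources=2)` on the angles of the selected TF points would return two grid directions; the grid step bounds the angular resolution of this sketch.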
Abstract:
Automatic music transcription (AMT) aims to transcribe music audio into note symbol representations. Concurrent notes overlapping in the frequency and time domains still hinder the performance of polyphonic piano transcription in current studies. In this work, we develop an attention-based method for piano transcription, where we propose a harmonic-aware attention to capture the musical frequency structure, and a local time attention to model temporal dependencies. The harmonic-aware frequency attention not only emphasizes the relationship between the obvious harmonics, but also extracts the correlation in the residual non-harmonic component. The time attention mechanism is improved using learnable attention range masks to model frame-wise short-term dependencies on different subtasks. Experiments on the MAESTRO dataset demonstrate that the proposed system achieves state-of-the-art transcription performance on both frame-wise and note-wise F1 metrics. Considering the influence of the piano pedals' dynamic behavior on note duration, a note duration modification method is also proposed. With a more accurate annotation of the offset on MAESTRO, the transcription performance is further improved.
Keywords:
harmonic mask; Piano transcription; time attention; piano pedal; frequency attention
Citation:
GB/T 7714 | Wang, Qi , Liu, Mingkuan , Bao, Changchun et al. Harmonic-Aware Frequency and Time Attention for Automatic Piano Transcription [J]. | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING , 2024 , 32 : 3492-3506 . |
MLA | Wang, Qi et al. "Harmonic-Aware Frequency and Time Attention for Automatic Piano Transcription" . | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 32 (2024) : 3492-3506 . |
APA | Wang, Qi , Liu, Mingkuan , Bao, Changchun , Jia, Maoshen . Harmonic-Aware Frequency and Time Attention for Automatic Piano Transcription . | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING , 2024 , 32 , 3492-3506 . |
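Abstract:
A speech recognition method based on a CTC multi-layer loss, belonging to the fields of pattern recognition and acoustics. The method regularizes the outputs of different layers of the speech recognition network so that each layer's output is as close as possible to the desired recognition result, thereby improving recognition performance. The method comprises a model training stage and a model testing stage. In the training stage, the preprocessed training set is fed into the constructed multi-layer speech recognition network; the losses and the weights of the different layers are computed; the per-layer losses are combined in a weighted sum to obtain the multi-layer loss; and the loss computation and network parameter updates are repeated until convergence. In the testing stage, the preprocessed test set is fed into the trained multi-layer speech recognition network, which outputs the recognition results. The invention changes only the loss function used in the training stage of the CTC speech recognition model and does not change the model's structure or its recognition process, improving recognition accuracy with low complexity and low overhead.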
Citation:
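GB/T 7714 | 陈仙红 , 罗德雨 , 鲍长春 . 一种基于CTC多层损失的语音识别方法 [A speech recognition method based on CTC multi-layer loss] : CN202210619908.5[P]. | 2022-06-02 . |
MLA | 陈仙红 et al. "一种基于CTC多层损失的语音识别方法 [A speech recognition method based on CTC multi-layer loss]" : CN202210619908.5. | 2022-06-02 . |
APA | 陈仙红 , 罗德雨 , 鲍长春 . 一种基于CTC多层损失的语音识别方法 [A speech recognition method based on CTC multi-layer loss] : CN202210619908.5. | 2022-06-02 . |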
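The weighted combination of per-layer losses described in this abstract can be sketched as follows. The abstract does not say how the layer weights are computed, so this sketch simply takes them as inputs and normalizes them to sum to one; that normalization and the function name `multi_layer_loss` are assumptions, and the per-layer values stand in for CTC losses computed elsewhere.

```python
def multi_layer_loss(layer_losses, layer_weights):
    """Combine per-layer losses into a single training loss.

    layer_losses:  one loss value per supervised layer (e.g. CTC losses).
    layer_weights: one non-negative weight per layer; normalised here so
                   that the weights sum to one (an assumption of this sketch).
    """
    if len(layer_losses) != len(layer_weights):
        raise ValueError("need exactly one weight per layer loss")
    total_weight = sum(layer_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted sum of the per-layer losses.
    return sum(w / total_weight * loss
               for w, loss in zip(layer_weights, layer_losses))
```

Because only the training loss changes, a function like this would replace the single-layer loss in the training loop while inference stays untouched, which matches the low-overhead claim in the abstract.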
Abstract:
Multiple sound source localization has attracted considerable attention in recent years. Single Source Zone (SSZ) based localization methods achieve good performance by detecting and exploiting the Time-Frequency (T-F) zones where only one source is dominant. However, the detected SSZ sometimes also includes T-F points containing components from multiple sources. A T-F point in an SSZ that receives contributions from multiple sources is defined as an outlier. The existence of outliers within the detected SSZ is usually an unavoidable problem for SSZ-based methods. To solve this problem, a multi-source localization method using offset residual weights is proposed in this paper. The method rests on an assumption: the direction estimated from all the T-F points within the detected SSZ deviates from the actual source direction, but this deviation is much smaller than the deviation between the directions estimated from the outliers and the actual source direction. After verifying this assumption experimentally, Point Offset Residual Weight (PORW) and Source Offset Residual Weight (SORW) are proposed to reduce the influence of outliers on the localization results. Then, a composite weight is formed by combining PORW and SORW, which can effectively distinguish the outliers from the desired points. After that, the outliers are removed using the composite weight. Finally, a statistical histogram of the DOA estimates with outliers removed is used for multi-source localization. The proposed method is evaluated objectively in various simulated environments. The results show that the proposed method outperforms the reference methods in source localization.
Keywords:
Multiple sound sources localization; Direction of arrival estimation; Soundfield microphone; Reverberation
Citation:
GB/T 7714 | Jia, Maoshen , Gao, Shang , Bao, Changchun . Multi-source localization by using offset residual weight [J]. | EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING , 2021 , 2021 (1) . |
MLA | Jia, Maoshen et al. "Multi-source localization by using offset residual weight" . | EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2021 . 1 (2021) . |
APA | Jia, Maoshen , Gao, Shang , Bao, Changchun . Multi-source localization by using offset residual weight . | EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING , 2021 , 2021 (1) . |
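The last step of this abstract, locating sources from a statistical histogram of DOA estimates, can be sketched as below. The sketch covers only the histogram and peak-picking stage and assumes the outliers have already been removed; it does not implement PORW or SORW. The 5-degree bin width and the function name `histogram_doa_peaks` are illustrative assumptions.

```python
def histogram_doa_peaks(doas_deg, n_sources, bin_width=5):
    """Pick source directions from a circular histogram of azimuth estimates.

    doas_deg:  per-point DOA estimates in degrees (any real values).
    n_sources: number of histogram peaks to return.
    Returns the centres (in degrees) of the strongest peaks, sorted ascending.
    """
    if 360 % bin_width != 0:
        raise ValueError("bin_width must divide 360")
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for d in doas_deg:
        counts[int((d % 360.0) // bin_width) % n_bins] += 1

    peaks = []
    for b in range(n_bins):
        left = counts[(b - 1) % n_bins]   # wrap around at 0/360 degrees
        right = counts[(b + 1) % n_bins]
        if counts[b] > 0 and counts[b] > left and counts[b] >= right:
            peaks.append((counts[b], b))

    peaks.sort(reverse=True)
    return sorted((b + 0.5) * bin_width for _, b in peaks[:n_sources])
```

A few residual outliers only create low isolated bins, so they rank below the true source peaks as long as `n_sources` is set correctly.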
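Abstract:
The autoregressive (AR) model is an effective way to describe correlations in time series. Classical AR coefficient estimation methods make simple assumptions about the residual signal and struggle to estimate AR coefficients accurately in complex scenarios such as noise interference, while deep neural network (DNN)-based AR (DNN-AR) coefficient estimation is easily affected during training by the numerical stability of the Levinson-Durbin recursion (LDR) solution. To improve the training stability and overall performance of DNN-AR coefficient estimation while guaranteeing system stability, this paper adopts the idea of using precision conversion to increase computation speed and proposes a method for improving the deep network structure based on the generalized analysis-by-synthesis (GABS) model, which improves both the accuracy of AR coefficient estimation in noisy environments and the stability of network training. The GABS model combined with a DNN (GABS-DNN) consists of three main parts: the spectral enhancement network of the modifier, the encoder's...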
Keywords:
Deep neural network; generalized analysis-by-synthesis; AR coefficients; Levinson-Durbin recursion solution
Citation:
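GB/T 7714 | 崔子豪 , 鲍长春 . 基于广义合成分析和深度神经网络的自回归系数估计方法 [Autoregressive coefficient estimation method based on generalized analysis-by-synthesis and deep neural networks] [J]. | 电子学报 (Acta Electronica Sinica) , 2021 , 49 (01) : 29-39 . |
MLA | 崔子豪 et al. "基于广义合成分析和深度神经网络的自回归系数估计方法 [Autoregressive coefficient estimation method based on generalized analysis-by-synthesis and deep neural networks]" . | 电子学报 (Acta Electronica Sinica) 49 . 01 (2021) : 29-39 . |
APA | 崔子豪 , 鲍长春 . 基于广义合成分析和深度神经网络的自回归系数估计方法 [Autoregressive coefficient estimation method based on generalized analysis-by-synthesis and deep neural networks] . | 电子学报 (Acta Electronica Sinica) , 2021 , 49 (01) , 29-39 . |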
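This abstract refers to the Levinson-Durbin recursion (LDR) and its numerical stability. For reference, here is a minimal sketch of the classical recursion in plain Python. It is not the paper's GABS-DNN model; the function name `levinson_durbin` and the sign convention (prediction-error filter A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p) are choices made for this illustration.

```python
def levinson_durbin(r, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    r:     autocorrelation values r[0], ..., r[order].
    order: AR model order p.
    Returns (a, err): a[0] == 1.0 followed by the p coefficients of the
    prediction-error filter, and the final prediction-error power.
    """
    if len(r) < order + 1:
        raise ValueError("need autocorrelation lags 0..order")
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            # The recursion divides by err, so a non-positive value means
            # the autocorrelation sequence is degenerate or ill-conditioned.
            raise ValueError("non-positive prediction error")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

The division by `err` in each step is where the sensitivity arises: when the prediction error approaches zero, small errors in the autocorrelation estimates are strongly amplified.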
Abstract:
In this article, the direction of arrival (DOA) estimation of multiple speech sources in reverberant environments is investigated based on the recording of a soundfield microphone. First, the recordings are analyzed in the time-frequency (T-F) domain to detect both "points" (single T-F points) and "regions" (multiple, adjacent T-F points) corresponding to a single source with low reverberation (known as low-reverberant-single-source (LRSS) points). Then, an LRSS point detection algorithm is proposed based on a joint dominance measure and instantaneous single-source point (SSP) identification. Following this, initial DOA estimates obtained for the detected LRSS points are analyzed using a Gaussian Mixture Model (GMM) derived by the Expectation-Maximization (EM) algorithm to cluster components into sources or outliers using a rule-based method. Finally, the DOA of each actual source is obtained from the estimated source components. Experiments on both simulated data and data recorded in an actual acoustic chamber demonstrate that the proposed algorithm exhibits improved performance for DOA estimation in reverberant environments when compared to several existing approaches.
Keywords:
LRSS point; Reverberation; Reflection; reverberant environments; Speech processing; DOA estimation; Microphone arrays; Time-frequency analysis; Estimation; Direction-of-arrival estimation
Citation:
GB/T 7714 | Jia, Maoshen , Wu, Yuxuan , Bao, Changchun et al. Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points [J]. | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING , 2021 , 29 : 379-392 . |
MLA | Jia, Maoshen et al. "Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points" . | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 29 (2021) : 379-392 . |
APA | Jia, Maoshen , Wu, Yuxuan , Bao, Changchun , Ritz, Christian . Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points . | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING , 2021 , 29 , 379-392 . |
Abstract:
Beamforming can effectively remove background noise, even in complex environments, so it is widely used in speech enhancement. We propose a novel Generalized Eigenvalue (GEV) beamforming method with Blind Analytic Normalization (BAN). In this method, the GEV beamformer coefficients are constructed from an estimate of the logarithmic power spectrum (LPS) and used to filter the multichannel speech signals, and a post-filter is used to further remove noise from the beamformed signals. First, to estimate the LPS of the speech signal in each channel, we train a deep neural network (DNN) model in a data-driven manner. Then, we use the trained DNN model to estimate the LPS, which is used to calculate the power spectral density (PSD) matrix of the speech and, in turn, the coefficients of the GEV beamformer. Since the GEV beamformer causes speech distortion, BAN is employed to post-process the beamformed signal. Furthermore, single-channel speech enhancement is used to reduce residual noise. Our experiment is conducted on an 8-channel simulated data set. The experimental results show that, compared with some existing speech enhancement methods, the proposed method can effectively remove background noise and achieve better speech enhancement.
Keywords:
Blind Analytic Normalization; Post-filtering; Generalized Eigenvalue beamforming; Deep Neural Network
Citation:
GB/T 7714 | Deng, Shuhao , Bao, Changchun , Cheng, Rui . GEV Beamforming with BAN Integrating LPS Estimation and Post-filtering [C] . 2020 . |
MLA | Deng, Shuhao et al. "GEV Beamforming with BAN Integrating LPS Estimation and Post-filtering" . (2020) . |
APA | Deng, Shuhao , Bao, Changchun , Cheng, Rui . GEV Beamforming with BAN Integrating LPS Estimation and Post-filtering . (2020) . |
Abstract:
Deep neural network (DNN) based ideal ratio mask (IRM) estimation methods have yielded good performance in monaural speech enhancement. Meanwhile, these methods have also shown considerable potential for beamforming and multichannel speech enhancement. For the minimum variance distortionless response (MVDR) beamformer, it is crucial to estimate the covariance matrices of the speech and noise accurately, and the accuracy of the estimated time-frequency (T-F) mask has a significant impact on the estimation of these covariance matrices. Therefore, in this paper, a complex real and imaginary ratio mask (CRIRM) based MVDR beamformer for speech enhancement using a residual network is proposed. First, the real and imaginary masks of speech and noise are estimated with a residual neural network. After that, the estimates of speech and noise are obtained using the estimated masks. Finally, the covariance matrices of speech and noise are estimated and applied to the MVDR beamformer. In addition, to further reduce residual noise, the output of the MVDR beamformer is processed by an end-to-end monaural speech enhancement module. Experiments show that the proposed method improves the quality and intelligibility of the enhanced speech.
Keywords:
residual neural network; postfilter; speech enhancement; beamforming; real and imaginary masks
Citation:
GB/T 7714 | Wang, Dujuan , Bao, Changchun . Multi-channel Speech Enhancement Based on the MVDR Beamformer and Postfilter [C] . 2020 . |
MLA | Wang, Dujuan et al. "Multi-channel Speech Enhancement Based on the MVDR Beamformer and Postfilter" . (2020) . |
APA | Wang, Dujuan , Bao, Changchun . Multi-channel Speech Enhancement Based on the MVDR Beamformer and Postfilter . (2020) . |