GAN-Based Inter-Channel Amplitude Ratio Decoding in Multi-Channel Speech Coding

Author：

Zhu, Jinru | Bao, Changchun (鲍长春)

Abstract：

This paper proposes a multi-channel speech coding method based on down-mixing and on decoding the inter-channel amplitude ratio (ICAR) with a generative adversarial network (GAN). First, the spatial parameter inter-channel time difference (ICTD) is extracted. In the short-time Fourier transform (STFT) domain, the amplitude of the down-mixed mono signal is obtained by averaging the amplitudes of the multi-channel speech signals, and its phase is taken from the reference channel; together these give the STFT of the down-mixed mono signal. The inverse STFT then yields the down-mixed mono signal. The amplitude ratio between the multi-channel speech signals and the down-mixed signal (ICAR) is extracted. The down-mixed mono signal is coded with the Speex codec, and the ICTD is quantized with a uniform scalar quantizer. The ICAR does not need to be encoded: at the decoder, it is decoded by a well-trained GAN from the decoded mono signal. Finally, the multi-channel speech signals are recovered from the decoded down-mixed mono signal, the decoded ICTD, and the decoded ICAR. The experimental results show that the proposed method can recover multi-channel speech signals with spatial information. © 2021 IEEE.
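The down-mixing and ICAR extraction steps described in the abstract can be illustrated for a single STFT frame. This is a minimal sketch, not the authors' implementation: the function names `downmix_frame`, `extract_icar`, and `restore_amplitudes`, the `ref` and `eps` parameters, and the choice to represent a frame as a list of complex bins per channel are all assumptions made for illustration. The GAN that predicts the ICAR at the decoder, the Speex coding, and the way the ICTD is applied are not modeled here.

```python
import cmath


def downmix_frame(channels, ref=0):
    """Down-mix one STFT frame.

    channels: list of channels, each a list of complex STFT bins.
    The mono amplitude is the mean amplitude across channels; the mono
    phase is copied from the reference channel `ref` (assumed index).
    """
    n_channels = len(channels)
    n_bins = len(channels[0])
    mono = []
    for k in range(n_bins):
        mean_amp = sum(abs(ch[k]) for ch in channels) / n_channels
        ref_phase = cmath.phase(channels[ref][k])
        mono.append(cmath.rect(mean_amp, ref_phase))
    return mono


def extract_icar(channels, mono, eps=1e-12):
    """Per-bin amplitude ratio of each channel to the down-mixed signal.

    `eps` is an assumed guard against division by zero in silent bins.
    """
    return [[abs(ch[k]) / (abs(mono[k]) + eps) for k in range(len(mono))]
            for ch in channels]


def restore_amplitudes(mono, icar):
    """Scale the mono amplitude by each channel's ICAR (amplitude only)."""
    return [[ratio * abs(mono[k]) for k, ratio in enumerate(ch_ratios)]
            for ch_ratios in icar]


# Two channels, two frequency bins (made-up values for illustration).
frame = [[3 + 0j, 0 + 2j],
         [0 + 1j, 4 + 0j]]
mono = downmix_frame(frame, ref=0)
icar = extract_icar(frame, mono)
amps = restore_amplitudes(mono, icar)
```

Because the mono amplitude is the channel average, the ICAR values of all channels average to 1 in every bin with nonzero energy. In the proposed method the decoder would obtain the ICAR from the GAN rather than from `extract_icar`, which needs the original channels.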

Keyword：

Signal reconstruction; Speech communication; Decoding; Inverse problems; Speech coding

Author Affiliations：

[ 1 ] [Zhu, Jinru] Beijing University of Technology, Faculty of Information Technology, Speech and Audio Signal Processing Laboratory, Beijing, China
[ 2 ] [Bao, Changchun] Beijing University of Technology, Faculty of Information Technology, Speech and Audio Signal Processing Laboratory, Beijing, China
