Abstract:
The paper introduces a speech-driven visual speech synthesis system. A loosely-coupled mapping scheme is proposed to establish the correspondence between acoustic speech classes and visual speech classes, and the mapping is learned from recorded video with a data-driven method. To strengthen the correlation between the acoustic and the visual speech, an articulatory-lip-correlated speech feature is extracted using a genetic algorithm. The results show that the extracted feature gives the corresponding lip-image classes good clustering performance. In the synthesis phase, a smooth sequence of lip images is retrieved by a search procedure driven by the input speech. Experiments show that the synthesized visual speech compares well with the original video. Further work on the synthesis phase is needed to correct the remaining jerkiness.
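The abstract does not give the authors' fitness criterion or GA configuration; the following is a minimal, hypothetical sketch of genetic-algorithm feature selection in the spirit described: evolve binary masks over acoustic feature dimensions so that frames sharing a lip-image class cluster tightly. The scatter-ratio fitness, the operators, and all names are illustrative assumptions, not the paper's method.

```python
# Hypothetical GA feature selection (not the authors' implementation):
# pick a subset of acoustic features that maximizes total-to-within-class
# scatter, i.e. good clustering of the corresponding lip-image classes.
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, labels):
    """Score a feature subset: higher when within-class scatter is small
    relative to total scatter (a simple clustering criterion)."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    total = Xs.var(axis=0).sum() + 1e-9
    within = np.mean([Xs[labels == c].var(axis=0).sum()
                      for c in np.unique(labels)])
    return total / (within + 1e-9)

def ga_select(X, labels, pop=30, gens=50, p_mut=0.05):
    """Evolve binary masks over feature indices; return the best mask."""
    n = X.shape[1]
    population = rng.random((pop, n)) < 0.5
    for _ in range(gens):
        scores = np.array([fitness(m, X, labels) for m in population])
        order = np.argsort(scores)[::-1]
        parents = population[order[: pop // 2]]        # truncation selection
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                   # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut             # bit-flip mutation
            children.append(child)
        population = np.vstack([parents, children])
    scores = np.array([fitness(m, X, labels) for m in population])
    return population[scores.argmax()]

# Toy usage: 200 frames of 12-dim acoustic features, 4 lip-image classes.
X = rng.normal(size=(200, 12))
labels = rng.integers(0, 4, size=200)
best = ga_select(X, labels)
print("selected feature indices:", np.flatnonzero(best))
```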