Abstract:
The paper introduces a speech-driven visual speech synthesis system. A loosely-coupled mapping scheme is proposed to establish the correspondence between acoustic speech classes and visual speech classes, and the mapping is learned from recorded video with a data-driven method. To strengthen the correlation between the acoustic and the visual speech, an articulatory-lip-correlated speech feature is extracted using a genetic algorithm. The results show that the extracted feature gives the corresponding lip-image classes good clustering performance. In the synthesis phase, a smooth sequence of lip images is retrieved by a search procedure driven by the input speech. Experiments show that the synthesized visual speech compares well with the original video. Further work on the synthesis phase is needed to correct the remaining jerkiness.
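The abstract does not give the authors' fitness criterion or GA configuration; the following is a minimal, hypothetical sketch of genetic-algorithm feature selection in the spirit described: evolve binary masks over acoustic feature dimensions so that frames sharing a lip-image class cluster tightly. The scatter-ratio fitness, the operators, and all names are illustrative assumptions, not the paper's method.

```python
# Hypothetical GA feature selection (not the authors' implementation):
# pick a subset of acoustic features that maximizes total-to-within-class
# scatter, i.e. good clustering of the corresponding lip-image classes.
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, labels):
    """Score a feature subset: higher when within-class scatter is small
    relative to total scatter (a simple clustering criterion)."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    total = Xs.var(axis=0).sum() + 1e-9
    within = np.mean([Xs[labels == c].var(axis=0).sum()
                      for c in np.unique(labels)])
    return total / (within + 1e-9)

def ga_select(X, labels, pop=30, gens=50, p_mut=0.05):
    """Evolve binary masks over feature indices; return the best mask."""
    n = X.shape[1]
    population = rng.random((pop, n)) < 0.5
    for _ in range(gens):
        scores = np.array([fitness(m, X, labels) for m in population])
        order = np.argsort(scores)[::-1]
        parents = population[order[: pop // 2]]        # truncation selection
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                   # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut             # bit-flip mutation
            children.append(child)
        population = np.vstack([parents, children])
    scores = np.array([fitness(m, X, labels) for m in population])
    return population[scores.argmax()]

# Toy usage: 200 frames of 12-dim acoustic features, 4 lip-image classes.
X = rng.normal(size=(200, 12))
labels = rng.integers(0, 4, size=200)
best = ga_select(X, labels)
print("selected feature indices:", np.flatnonzero(best))
```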