Indexed in:
Abstract:
To meet the need of virtual pedagogy applications to evaluate the pronunciation quality of English learners, this paper proposes an automatic assessment method based on a bimodal fusion decision algorithm. The learner's pronunciation is scored separately on the audio and the video speech signals by measuring their similarity to the corresponding signals of a standard (reference) speaker. The final pronunciation score is obtained by fusing these two scores through a linear weighted combination. Because visual speech is known to aid audio in human speech perception, especially in noisy environments, the paper proposes a noise-adaptive weighting strategy for the fusion step. To handle mismatched utterance lengths caused by differing speaking rates, the paper uses the dynamic time warping (DTW) algorithm to align the test utterances with the standard ones in time. Data selected from the Australian audio-visual speech corpus (AVOZES) are used to evaluate the performance of the automatic assessment system. Experimental results show that, compared with evaluation based on audio speech alone, fusing audio and visual speech makes the automatic pronunciation assessment more reasonable by exploiting the correlated and complementary information between acoustic and visual speech. © 2012 AICIT.
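The pipeline described in the abstract (DTW alignment, per-modality similarity scoring, noise-adaptive linear fusion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: each utterance is assumed to be a sequence of per-frame feature vectors (for example acoustic features for audio and lip-shape parameters for video), and the distance-to-score mapping (`similarity_score`), the SNR thresholds and the weight range in `audio_weight` are all hypothetical choices, since the abstract does not specify them.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(test: Sequence[Frame], ref: Sequence[Frame]) -> float:
    """Classic DTW: minimum cumulative frame distance over all monotonic
    alignments, normalized by n + m so utterances of different length compare."""
    n, m = len(test), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(test[i - 1], ref[j - 1])
            # Step from the insertion, deletion or diagonal predecessor.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def similarity_score(distance: float, scale: float = 1.0) -> float:
    """Hypothetical mapping from DTW distance to a 0-100 score."""
    return 100.0 * math.exp(-distance / scale)


def audio_weight(snr_db: float, lo_db: float = 0.0, hi_db: float = 30.0,
                 w_min: float = 0.3, w_max: float = 0.9) -> float:
    """Hypothetical noise-adaptive audio weight: w_max at or above hi_db SNR,
    w_min at or below lo_db, linear in between. Visual gets 1 - weight."""
    t = min(1.0, max(0.0, (snr_db - lo_db) / (hi_db - lo_db)))
    return w_max * t + w_min * (1.0 - t)


def fused_score(audio_score: float, visual_score: float, snr_db: float) -> float:
    """Linear weighted combination of the two modality scores."""
    w = audio_weight(snr_db)
    return w * audio_score + (1.0 - w) * visual_score
```

As a usage example, a learner who says the same frames more slowly than the reference (`dtw_distance([[0], [0], [1], [1]], [[0], [1]])`) still gets distance 0, because DTW absorbs the difference in speaking rate. Lowering the SNR passed to `fused_score` shifts weight from the audio score toward the visual score, which mirrors the noise-adaptive idea in the abstract.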
Keywords:
Corresponding author information:
Email address: