Abstract:
To improve the efficiency of multimodal fusion in human-robot interaction (HRI), an improved technique is proposed to fuse visual and auditory data. The robotic auditory system acquires sound with a microphone array, estimates the azimuth of the sound source with the MUSIC algorithm, and recognizes speech with an end-to-end gated CNN. The visual system uses a two-layer neural network to detect and recognize dynamic gestures. An improved D-S evidence theory algorithm based on a rule intention voter is designed to fuse the outputs of the two modules and determine the intention of the current interactive object. Experimental results validate the efficiency and accuracy of the multimodal fusion system. © 2021 IEEE.
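The fusion step builds on classical Dempster-Shafer combination, which the paper extends with its rule intention voter. As a minimal sketch of the underlying combination rule only, the Python snippet below fuses two hypothetical mass functions, one from the speech module and one from the gesture module, over a small intent frame; the intent labels and mass values are illustrative assumptions, not the paper's data.

from itertools import product

def combine(m1, m2):
    """Fuse two mass functions (dict: frozenset -> mass) via Dempster's rule."""
    fused = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            fused[inter] = fused.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc  # mass assigned to the empty set (conflict K)
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    # Redistribute conflicting mass by normalizing with 1 - K
    return {a: m / (1.0 - conflict) for a, m in fused.items()}

# Hypothetical example: each module emits belief over intents {grasp, release}.
speech = {frozenset({"grasp"}): 0.7,
          frozenset({"grasp", "release"}): 0.3}
gesture = {frozenset({"grasp"}): 0.5,
           frozenset({"release"}): 0.4,
           frozenset({"grasp", "release"}): 0.1}
print(combine(speech, gesture))  # belief concentrates on "grasp"

A known weakness of the plain rule is its behavior when the conflict K approaches 1, which is the kind of case that rule-based modifications such as the paper's voter typically target.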
ISSN: 2689-6621
Year: 2021
Page: 2290-2295
Language: English
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 7
ESI Highly Cited Papers on the List: 0