Indexed in:
Abstract:
Q-learning is a model-free, iterative reinforcement learning algorithm that is widely used for mobile robot navigation in unstructured environments. However, the trade-off between exploring the environment and exploiting what has already been learned limits how quickly Q-learning converges in this setting. Drawing on the fact that rodents use olfactory cues for spatial orientation and navigation, this study develops a Q-learning environmental cognition strategy based on odor-reward shaping. The algorithm reduces useless exploration of the environment by improving the Q-learning action selection strategy. Environmental odor information is integrated into the algorithm through an olfactory factor that weights the Q-learning term and the odor-reward shaping term in the action selection strategy. The effectiveness of the algorithm is evaluated in a simulation of movement through the maze environment used in the Tolman mouse experiment. The results show that Q-learning with odor-reward shaping reduces useless exploration of the environment, enhances cognitive learning of the environment, and improves the convergence speed of the algorithm. © 2021, Tsinghua University Press. All rights reserved.
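The abstract does not give the action selection formula, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical 5x5 open grid instead of the Tolman maze, an odor intensity that decays with Manhattan distance to the goal, and a selection score of the form (1 - w) * Q(s, a) + w * odor(s'), where w stands in for the olfactory factor. All names (`odor`, `select_action`, `olfactory_factor`, `train`) are assumptions introduced for illustration.

```python
import random

# Hypothetical 5x5 open grid; the paper uses the Tolman maze instead.
SIZE = 5
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, action):
    """Move within the grid bounds; reward 1 on reaching the goal."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (r, c)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def odor(state):
    """Assumed odor intensity: stronger closer to the goal, 1.0 at the goal."""
    dist = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    return 1.0 / (1.0 + dist)


def select_action(q, state, olfactory_factor, epsilon, rng):
    """Epsilon-greedy over a weighted mix of Q-value and odor at the next cell."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    scores = [
        (1 - olfactory_factor) * q[(state, a)]
        + olfactory_factor * odor(step(state, ACTIONS[a])[0])
        for a in range(len(ACTIONS))
    ]
    return max(range(len(ACTIONS)), key=scores.__getitem__)


def train(episodes=200, alpha=0.5, gamma=0.9, olfactory_factor=0.5,
          epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning; returns the Q-table and steps taken per episode."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE)
         for a in range(len(ACTIONS))}
    steps_per_episode = []
    for _ in range(episodes):
        state = START
        for t in range(1, max_steps + 1):
            a = select_action(q, state, olfactory_factor, epsilon, rng)
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(
                q[(nxt, b)] for b in range(len(ACTIONS)))
            # Standard Q-learning update; the odor only biases action choice.
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
        steps_per_episode.append(t)
    return q, steps_per_episode
```

Calling `train(olfactory_factor=0.0)` reduces the sketch to plain epsilon-greedy Q-learning, so comparing its per-episode step counts with those from a nonzero factor gives a rough feel for how an odor bias can cut down early exploration in this toy setting.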
Keywords:
Corresponding author information:
Email address: