Indexed:
Abstract:
An agent cannot reach a goal efficiently until it has sufficiently explored the environment or constructed a cognitive model of the world; the essential question is how to generate goal-driven behavior. Organisms spontaneously explore environments with sparse or deceptive rewards and build map-like representations that support subsequent actions such as finding food, shelter, or mates. We ask whether a robot can imitate this cognitive mechanism to complete navigation tasks. Because relying on high-precision sensors to recover the structure of the environment is impractical in the real world, we perceive the state space and learn a control policy from visual inputs, and we use deep learning to cope with the curse of dimensionality. Navigation systems in robotics typically fall into two classes: map-based approaches, which reach the goal by encoding the structure of the environment and can fuse information from multiple sensors into high-quality maps, and map-less approaches, which learn a control policy and use it to complete goal-reaching tasks; each has its pros and cons. In this paper, we propose a visual navigation method that learns goal-driven behavior and encodes spatial structure simultaneously. First, to learn a control policy from raw visual information, we adopt deep reinforcement learning as the basic navigation framework; it provides an end-to-end pipeline that predicts control signals directly from high-dimensional sensory inputs. Because the environment offers a much wider variety of possible training signals, an auxiliary collision-prediction task is added to the model. Second, during exploration the agent traverses the environment many times and observes a large number of states, most of which are repetitive; a temporal correlation network removes these redundant observations and searches for waypoints. Because the agent's viewpoint varies, we compute the similarity between states with temporal distance, which depends only on environment steps, rather than with hand-designed features. Inspired by research on the cognitive mechanisms of animals, which shows that many mammals can use a single observation, especially one containing landmarks, to represent a neighboring region of state space, we use waypoints, discovered in exploration sequences and each representing the adjacent states within a certain temporal distance, to describe the structure of the environment incrementally and efficiently. Finally, the resulting topological map is integrated into the model as a path-planning module and combined with the locomotion network to obtain a more general navigation method. Experiments were conducted in the 3D simulation environment DMLab. The results show that this method learns goal-driven behavior from visual inputs, yields more efficient learning and navigation policies in all test environments, and reduces the amount of data required to build the map. Furthermore, when placed in a dynamically blocked environment, the agent can exploit the topological map to produce detour behavior and complete navigation tasks, demonstrating better environmental adaptability. © 2021, Science Press. All rights reserved.
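The abstract does not give implementation details for the locomotion network or the auxiliary task. As a minimal sketch, assuming a PyTorch actor-critic agent trained on RGB frames, the collision-prediction head might share the visual encoder with the policy and value heads as follows (all class names, layer sizes, and the loss weighting are hypothetical):

```python
import torch
import torch.nn as nn

class NavNet(nn.Module):
    """Shared visual encoder with policy, value, and an auxiliary
    collision-prediction head, as one plausible reading of the model."""
    def __init__(self, num_actions: int):
        super().__init__()
        # Encoder for 84x84 RGB frames (sizes are illustrative only).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        )
        self.policy = nn.Linear(256, num_actions)   # action logits
        self.value = nn.Linear(256, 1)              # state-value estimate
        self.collision = nn.Linear(256, 1)          # auxiliary: collision logit

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        return self.policy(h), self.value(h), self.collision(h)

def auxiliary_loss(collision_logit, collided):
    # Binary cross-entropy against collision labels gathered during
    # exploration; it would be added to the RL loss with a small weight.
    return nn.functional.binary_cross_entropy_with_logits(
        collision_logit.squeeze(-1), collided.float())
```

The auxiliary head only shapes the shared representation during training; at test time the agent acts from the policy head alone.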
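Temporal distance, as described in the abstract, labels a pair of observations by how many environment steps separate them, so training data for the temporal correlation network can be generated without any hand-designed features. A minimal sketch, assuming a learned predicate `is_near(a, b)` produced by that network and a hypothetical threshold `k` (both names are illustrative):

```python
import random

def make_training_pairs(trajectory, k, n_pairs):
    """Self-supervised labels from time alone: a pair is 'near' (label 1)
    if the two frames are at most k environment steps apart, else 'far'."""
    pairs, T = [], len(trajectory)
    for _ in range(n_pairs):
        i = random.randrange(T)
        if random.random() < 0.5:                  # sample a close frame
            j = min(T - 1, i + random.randint(0, k))
        else:                                      # sample an arbitrary frame
            j = random.randrange(T)
        pairs.append((trajectory[i], trajectory[j], int(abs(i - j) <= k)))
    return pairs

def extract_waypoints(trajectory, is_near):
    """Keep an observation as a new waypoint only if no existing waypoint
    already covers it; 'near' observations are redundant and dropped."""
    waypoints = []
    for obs in trajectory:
        if all(not is_near(obs, w) for w in waypoints):
            waypoints.append(obs)
    return waypoints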
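Once waypoints and their temporal-distance links form a topological map, path planning reduces to graph search. A minimal sketch under that assumption, with `edges` a hypothetical adjacency mapping over waypoint identifiers:

```python
from collections import deque

def plan_path(edges, start, goal):
    """Breadth-first search over the waypoint graph; edges[w] lists the
    waypoints reachable from w within the temporal-distance threshold."""
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:                 # reconstruct path back to start
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in edges.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable in the current map
```

This also suggests how the reported detour behavior could arise: when a dynamically blocked passage invalidates an edge, removing that edge and replanning yields an alternative route, after which the locomotion network drives the agent between consecutive waypoints on the new path.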
Keywords:
Corresponding author information:
Email address: