Abstract:
This paper proposes a robotic imitation learning method that integrates deterministic off-policy reinforcement learning with a generative adversarial network. The method allows a robot to learn a grasping task rapidly by learning the reward function from demonstration data. First, the discriminator learns the reward function from demonstrations, and this reward guides the generator toward completing the grasping task. Second, the deep deterministic policy gradient (DDPG) method serves as the generator and learns the action policy from the discriminator's output. In particular, the demonstration data is also fed into the generator to ensure its performance. Finally, three experiments on the Push and Pick-and-Place tasks are conducted in the Gym robotics environment. Results show that the proposed method learns much faster than the stochastic GAIL method and that it can train effectively from demonstration data collected in different states of the task. The proposed method completes the robot grasping task quickly without an environmental reward and improves the stability of the training process. © 2018 IEEE
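The first step described in the abstract, a discriminator that supplies the reward, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the logistic-regression discriminator stands in for a neural network, the reward form -log(1 - D) is one common choice in adversarial imitation learning and is not confirmed by the abstract, and the synthetic Gaussian data replaces real state-action pairs. The names `LinearDiscriminator`, `prob_expert`, and `reward` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LinearDiscriminator:
    """Logistic-regression stand-in for the discriminator (hypothetical)."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, x):
        # Probability that each state-action vector came from the demonstrations.
        return sigmoid(x @ self.w + self.b)

    def update(self, expert_x, policy_x):
        # One gradient step on binary cross-entropy: expert = 1, policy = 0.
        x = np.vstack([expert_x, policy_x])
        y = np.concatenate([np.ones(len(expert_x)), np.zeros(len(policy_x))])
        grad = self.prob_expert(x) - y
        self.w -= self.lr * (x.T @ grad) / len(x)
        self.b -= self.lr * grad.mean()

    def reward(self, x):
        # Assumed reward form: -log(1 - D(s, a)); higher when D believes "expert".
        p = np.clip(self.prob_expert(x), 1e-8, 1.0 - 1e-8)
        return -np.log(1.0 - p)


# Synthetic stand-ins for concatenated (state, action) vectors.
rng = np.random.default_rng(0)
expert = rng.normal(loc=1.0, scale=0.5, size=(256, 4))
policy = rng.normal(loc=-1.0, scale=0.5, size=(256, 4))

disc = LinearDiscriminator(dim=4)
for _ in range(200):
    disc.update(expert, policy)

mean_expert_reward = disc.reward(expert).mean()
mean_policy_reward = disc.reward(policy).mean()
```

In a full training loop, a reward of this kind would replace the environment reward in the transitions stored for the off-policy learner, which is what would let a DDPG-style generator train without an environmental reward signal.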