Indexed in:
Abstract:
Imitation learning algorithms for robotics applications require sufficient optimal data to learn well-performing policies. State-of-the-art approaches rely on pre-labeled data or interaction with the environment to filter out suboptimal data, which is time-consuming and laborious in practice. In this paper, we propose a new approach that avoids both manual labeling and environment interaction. We add a discriminator to the behavioral cloning approach that distinguishes optimal from suboptimal data, so that it can influence policy learning and steer the policy away from suboptimal behaviors. Within this framework, we design a new imitation learning algorithm that uses the output of the discriminator as weights to learn efficiently from datasets containing suboptimal data. We evaluate the proposed method in four environments and compare it with three benchmark methods. The results show that our method performs better on datasets containing suboptimal data. The proposed method can identify higher-value data in the dataset and enables the agent to learn a high-performance policy from imperfect demonstrations or a small amount of data. © 2024 IEEE.
Keywords:
Corresponding author:
Email address:
Source:
Year: 2024
Pages: 5566-5571
Language: English
Affiliated department:
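
The abstract describes using a discriminator's output as per-sample weights in behavioral cloning. The following is a minimal sketch of that weighting idea only, not the authors' algorithm: it assumes a linear policy fitted by weighted least squares, and it replaces the learned discriminator with a hand-set stand-in array (`d_out`). All names (`weighted_bc_fit`, `weighted_bc_loss`, `K_expert`, `K_bad`) and the synthetic data are hypothetical and chosen for illustration.

```python
import numpy as np


def weighted_bc_fit(states, actions, weights, ridge=1e-6):
    """Fit a linear policy a = s @ K by weighted least squares.

    Each sample's squared action error is scaled by its weight, so
    low-weight (presumed suboptimal) samples barely affect the fit.
    """
    sw = states * weights[:, None]                      # weight each row
    gram = states.T @ sw + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, sw.T @ actions)


def weighted_bc_loss(K, states, actions, weights):
    """Weighted mean squared action error of the linear policy K."""
    err = np.sum((states @ K - actions) ** 2, axis=1)
    return float(np.sum(weights * err) / np.sum(weights))


# Synthetic dataset (assumption): 50 expert samples, 150 suboptimal ones.
rng = np.random.default_rng(0)
K_expert = np.array([[1.0], [-2.0]])   # hypothetical expert policy
K_bad = np.array([[-1.0], [0.5]])      # hypothetical suboptimal policy
s_good = rng.normal(size=(50, 2))
s_bad = rng.normal(size=(150, 2))
states = np.vstack([s_good, s_bad])
actions = np.vstack([s_good @ K_expert, s_bad @ K_bad])

# Stand-in for discriminator outputs in [0, 1] (assumption): in the
# described method these would come from a learned discriminator.
d_out = np.concatenate([np.full(50, 0.95), np.full(150, 0.05)])

K_plain = weighted_bc_fit(states, actions, np.ones(len(states)))
K_weighted = weighted_bc_fit(states, actions, d_out)

err_plain = np.linalg.norm(K_plain - K_expert)
err_weighted = np.linalg.norm(K_weighted - K_expert)
print(f"plain BC error:    {err_plain:.3f}")
print(f"weighted BC error: {err_weighted:.3f}")
```

In this toy setting the weighted fit lands closer to the expert policy than the unweighted one, because the suboptimal samples contribute little to the objective. How the weights are obtained without labels or environment interaction is the contribution of the paper itself and is not reproduced here.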