Abstract:
Imitation learning algorithms for robotics applications need a sufficient amount of optimal data to learn well-performing strategies. State-of-the-art approaches rely on pre-labeled data or interaction with the environment to filter out suboptimal data, which is time-consuming and laborious in practice. In this paper, we propose a new approach that requires neither manual labeling nor environment interaction. We add a discriminator to the behavioral cloning approach to distinguish optimal from suboptimal data, so that it can influence policy learning and steer the policy away from suboptimal behaviors. Within this framework, we design a new imitation learning algorithm that uses the discriminator's output as weights to learn efficiently from datasets containing suboptimal data. We evaluate the proposed method in four environments and compare it with three benchmark methods. The results show that our method performs better on datasets containing suboptimal data. The proposed method can identify the higher-value data in a dataset and enables the agent to learn a high-performance policy from imperfect demonstrations or from a small amount of data. © 2024 IEEE.
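The core idea in the abstract, using discriminator outputs as per-sample weights in behavioral cloning, can be illustrated with a minimal sketch. This is not the paper's algorithm: the linear policy, the closed-form weighted least-squares fit, the hand-set discriminator scores, and the function names `weighted_bc_loss` and `fit_weighted_linear_policy` are all assumptions made for illustration. The abstract does not say how the discriminator is trained, so that step is left out.

```python
import numpy as np


def weighted_bc_loss(pred_actions, demo_actions, disc_scores):
    """Behavioral cloning loss with per-sample weights.

    disc_scores are assumed to lie in [0, 1], with higher values meaning
    "more likely optimal". They are normalized to sum to 1.
    """
    per_sample = np.sum((pred_actions - demo_actions) ** 2, axis=1)
    weights = disc_scores / (disc_scores.sum() + 1e-8)
    return float(np.sum(weights * per_sample))


def fit_weighted_linear_policy(states, actions, disc_scores, ridge=1e-6):
    """Fit actions ~ states @ W by weighted least squares (hypothetical policy class)."""
    w = disc_scores[:, None]
    gram = states.T @ (w * states) + ridge * np.eye(states.shape[1])
    target = states.T @ (w * actions)
    return np.linalg.solve(gram, target)


# Toy dataset: the first half follows an "expert" linear policy,
# the second half has random (suboptimal) actions.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 3))
W_expert = np.array([[1.0], [-2.0], [0.5]])
actions = states @ W_expert
actions[100:] = 3.0 * rng.normal(size=(100, 1))

# Stand-in for discriminator outputs (assumed, not learned here).
disc_scores = np.concatenate([np.full(100, 0.95), np.full(100, 0.05)])

W_weighted = fit_weighted_linear_policy(states, actions, disc_scores)
W_plain = fit_weighted_linear_policy(states, actions, np.ones(200))

err_weighted = np.linalg.norm(W_weighted - W_expert)
err_plain = np.linalg.norm(W_plain - W_expert)
print(f"weighted BC error: {err_weighted:.4f}, plain BC error: {err_plain:.4f}")
```

In this toy setting, down-weighting the suboptimal half moves the fitted policy closer to the expert than unweighted cloning does. It only illustrates the weighting mechanism and does not reproduce the paper's results.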
Year: 2024
Page: 5566-5571
Language: English
ESI Highly Cited Papers on the List: 0