Indexed in:
Abstract:
The aim of generative adversarial imitation learning (GAIL) is to allow an agent to learn an optimal policy from demonstrations via an adversarial training process. However, previous works have not considered a realistic setting for complex continuous control tasks such as robot manipulation, in which the available demonstrations are imperfect and may originate from different policies. Such a setting poses significant challenges for GAIL-related methods. This paper proposes a novel imitation learning (IL) algorithm, MD2-GAIL, which enables an agent to learn effectively from imperfect demonstrations provided by multiple demonstrators. Instead of training the policy from scratch, unsupervised pretraining is used to speed up the adversarial learning process. Confidence scores representing the quality of the demonstrations are used to reconstruct the objective function for off-policy adversarial training, making the policy match the optimal occupancy measure. Building on the Soft Actor-Critic (SAC) algorithm, MD2-GAIL incorporates the maximum-entropy principle into the optimization of the objective function. Meanwhile, a reshaped reward function is adopted to update the agent's policy and avoid falling into local optima. Experiments were conducted on robotic simulation tasks, and the results show that our method learns efficiently from the available demonstrations and achieves better performance than other state-of-the-art methods. (c) 2021 Elsevier B.V. All rights reserved.
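The abstract does not give the paper's exact objective, but the core idea of weighting imperfect demonstrations by confidence scores can be illustrated with a minimal sketch. The function below is a hypothetical confidence-weighted binary cross-entropy discriminator loss (the function name and argument layout are assumptions, not the paper's formulation): each expert sample's contribution to the "expert" target is scaled by a quality score in [0, 1], so low-confidence demonstrations pull the discriminator (and hence the learned reward) toward the expert distribution less strongly.

```python
import math

def weighted_discriminator_loss(d_expert, d_agent, confidence):
    """Confidence-weighted GAIL-style discriminator loss (illustrative sketch).

    d_expert   -- discriminator outputs D(s, a) in (0, 1) on demonstration samples
    d_agent    -- discriminator outputs D(s, a) in (0, 1) on agent rollout samples
    confidence -- per-demonstration quality scores in [0, 1]
    """
    # Expert term: standard -log D(s, a), but scaled per sample by its
    # confidence score, so imperfect demonstrations count for less.
    expert_term = -sum(c * math.log(d) for c, d in zip(confidence, d_expert)) / len(d_expert)
    # Agent term: unchanged -log(1 - D(s, a)) over the policy's own samples.
    agent_term = -sum(math.log(1.0 - d) for d in d_agent) / len(d_agent)
    return expert_term + agent_term
```

With all confidence scores equal to 1 this reduces to the ordinary GAIL discriminator loss; lowering a demonstration's score shrinks its gradient contribution without discarding it outright.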
Keywords:
Corresponding author:
Email address: