Indexed by:
Abstract:
This paper introduces n-step heuristic dynamic programming (NSHDP), which combines regular temporal difference (TD) learning with TD(λ) learning to solve optimal control problems. First, the implementation of the basic value iteration algorithm is presented. Then, building on the traditional HDP algorithm, the architecture of the NSHDP(λ) algorithm is described. Most importantly, a stability condition for the NSHDP(λ) algorithm is established. Furthermore, the one-step critic network, the n-step critic network, and the action network are each designed. Finally, the effectiveness of the proposed algorithm is verified by a simulation experiment. © 2022 Technical Committee on Control Theory, Chinese Association of Automation.
Keywords:
Corresponding author:
Email address:
Source:
ISSN: 1934-1768
Year: 2022
Volume: 2022-July
Pages: 2242-2247
Language: English
Affiliated department: