Abstract:
This paper introduces n-step heuristic dynamic programming (NSHDP), which combines regular temporal difference (TD) learning with TD(λ) learning to solve optimal control problems. First, the implementation of the basic value iteration algorithm is presented. Then, building on the traditional HDP algorithm, the architecture of the NSHDP(λ) algorithm is described. Most importantly, a stability condition for the NSHDP(λ) algorithm is established. Furthermore, the one-step critic network, the n-step critic network, and the action network are designed. Finally, the effectiveness of the proposed algorithm is verified by simulation. © 2022 Technical Committee on Control Theory, Chinese Association of Automation.
ISSN: 1934-1768
Year: 2022
Volume: 2022-July
Page: 2242-2247
Language: English
ESI Highly Cited Papers on the List: 0