Indexed by:
Abstract:
In this paper, a value-iteration-based off-policy Q-learning algorithm is developed. The proposed algorithm solves the optimal regulation problem for nonlinear systems with unknown dynamics. Under the off-policy mechanism, the algorithm uses a behavioral policy for full exploration, which helps prevent the target policy from converging to a local optimum. In addition, a relaxation factor is introduced to adjust the convergence rate of the cost function sequence. To implement the algorithm, a critic network and an action network are used to approximate the optimal Q-function and the optimal control policy, respectively. Finally, a simulation example is presented to demonstrate the effectiveness of the proposed algorithm. © 2024 IEEE.
Keywords:
Corresponding author:
Email:
Source:
Year: 2024
Pages: 2717-2722
Language: English
Affiliated department:
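The abstract describes a value-iteration-style Q-learning update whose convergence rate is tuned by a relaxation factor. A minimal sketch of that idea on a toy finite MDP is shown below; the two-state model, the reward values, and the specific relaxed update rule (a convex combination of successive Q-iterates weighted by a factor eta) are illustrative assumptions, not the paper's actual formulation, which targets nonlinear systems with unknown dynamics via critic and action networks.

```python
import numpy as np

def relaxed_q_value_iteration(P, R, gamma=0.9, eta=0.8, iters=500):
    """Value-iteration Q-learning with a relaxation factor (sketch).

    P : (n_s, n_a, n_s) transition probabilities (assumed known here,
        unlike the model-free setting in the paper)
    R : (n_s, n_a) one-step rewards
    eta in (0, 1] blends the old and new Q-iterates, adjusting the
    convergence rate of the Q-sequence.
    """
    n_s, n_a = R.shape
    Q = np.zeros((n_s, n_a))
    for _ in range(iters):
        # Standard Bellman optimality backup
        Q_new = R + gamma * (P @ Q.max(axis=1))
        # Relaxed update: eta = 1 recovers plain value iteration;
        # smaller eta slows (damps) the iteration.
        Q = (1.0 - eta) * Q + eta * Q_new
    return Q

# Toy 2-state, 2-action MDP (values chosen only for illustration)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
Q = relaxed_q_value_iteration(P, R)
policy = Q.argmax(axis=1)  # greedy target policy from the learned Q
```

The effective contraction factor of the relaxed iteration is (1 - eta) + eta * gamma, so eta directly trades off per-step progress against damping, matching the abstract's claim that the relaxation factor adjusts the convergence rate of the cost function sequence.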