Indexed in:
Abstract:
This paper presents a theoretical analysis of value iteration Q-learning with non-discounted costs. The analysis focuses on two main aspects: the convergence of the iterative Q-function and the stability of the system under the final iterative control policy. Unlike previous theoretical results on Q-learning, our analysis accounts for the effect of approximation errors, leading to a more comprehensive investigation. We first discuss the effect of approximation errors on the iterative Q-function update. Then, considering the presence of approximation errors in each iteration, we analyze the convergence of the iterative Q-function. Furthermore, we establish a sufficient condition, also accounting for approximation errors, that ensures the stability of the system under the final iterative control policy. Finally, two simulation cases validate the presented convergence and stability results.
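For orientation, the iterative update the abstract refers to can be sketched in a few lines. The Python snippet below is a minimal illustration under stated assumptions, not the paper's implementation: the chain dynamics F, the stage cost U, the absorbing goal state, and the error bound `eps` are all hypothetical choices made so the undiscounted iteration stays bounded, and a bounded random perturbation stands in for the per-iteration approximation error the analysis accounts for.

```python
import numpy as np

# Minimal sketch (assumed example, not the paper's method): tabular
# value-iteration Q-learning with a non-discounted (undiscounted) cost,
#   Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(F(x, u), u') + error.

n_states, n_actions = 6, 3
rng = np.random.default_rng(0)

# Assumed deterministic dynamics: action u moves u + 1 steps toward the
# absorbing, cost-free goal state 0, so every policy reaches the goal and
# the undiscounted Q-function stays bounded.
F = np.maximum(np.arange(n_states)[:, None]
               - (np.arange(n_actions)[None, :] + 1), 0)

U = rng.uniform(0.1, 1.0, size=(n_states, n_actions))  # assumed positive stage cost
U[0, :] = 0.0                                          # goal state is cost-free

Q = np.zeros((n_states, n_actions))                    # Q_0 = 0 initialization
eps = 1e-3                                             # stand-in approximation-error bound

for i in range(200):
    # Exact update: Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(F(x, u), u')
    Q_next = U + Q[F].min(axis=2)
    # Emulate function-approximation error with a bounded perturbation.
    Q_next += rng.uniform(-eps, eps, size=Q.shape)
    # Heuristic stop: successive iterates differ only at the error level,
    # i.e. the iteration has entered a neighborhood of the fixed point.
    if np.max(np.abs(Q_next - Q)) <= 10 * eps:
        Q = Q_next
        break
    Q = Q_next

# Final iterative control policy: u(x) = argmin_u Q(x, u)
policy = Q.argmin(axis=1)
print("iterations:", i + 1, "greedy policy:", policy)
```

Consistent with the abstract's setting, the iterates here converge only to a neighborhood of the error-free Q-function whose size scales with `eps`, which is why the stopping test compares successive iterates against the error level rather than against zero.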
Keywords:
Corresponding author information:
Email address:
Source:
NEUROCOMPUTING
ISSN: 0925-2312
Year: 2024
Volume: 606
Impact Factor: 6.000 (JCR@2022)
Affiliated department: