收录:
摘要:
In this paper, a novel value iteration adaptive dynamic programming (ADP) algorithm is presented, which is called an improved value iteration ADP algorithm, to obtain the optimal policy for discrete stochastic processes. In the improved value iteration ADP algorithm, for the first time we propose a new criteria to verify whether the obtained policy is stable or not for stochastic processes. By analyzing the convergence properties of the proposed algorithm, it is shown that the iterative value functions can converge to the optimum. In addition, our algorithm allows the initial value function to be an arbitrary positive semi-definite function. Finally, two simulation examples are presented to validate the effectiveness of the developed method. (C) 2020 Elsevier Ltd. All rights reserved.
关键词:
通讯作者信息: