Indexed by:
Abstract:
Combining policy-based reinforcement learning (RL) with value-based RL, the actor-critic (AC) learning structure is an effective framework. However, the cost function in this AC framework suffers from large variance, which makes it difficult to achieve the optimization objective. Based on the discounted generalized value iteration method with ℓ1-regularization, a regularized AC (RAC) framework is developed to solve optimal regulation problems and make the cost function converge faster. Two neural networks are constructed to update the cost function and the policy gradient, respectively. The ℓ1-regularization is applied to both the policy gradient and the cost function during value iteration. The cost function is proven to converge to the optimal cost function in a monotonically decreasing manner. Finally, the effectiveness of RAC is demonstrated through two experiments. © 2023 IEEE.
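The abstract describes a critic that approximates the cost function and an actor that approximates the policy, with ℓ1-regularization applied in the value-iteration update and in the policy-gradient step. The following Python sketch illustrates that general idea on a toy 1-D linear regulation task; it is not the authors' implementation, and the linear feature models, penalty weight LAMBDA, step sizes, and finite-difference policy gradient are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
a, b, gamma = 0.9, 0.5, 0.95        # toy dynamics x_{k+1} = a*x + b*u, discount factor
LAMBDA = 1e-3                       # l1 penalty weight (assumed value)
alpha_c, alpha_a = 0.05, 0.01       # critic / actor step sizes (assumed values)

w_c = np.zeros(2)                   # critic weights: V(x) ~ w_c . [x^2, |x|]
w_a = np.zeros(2)                   # actor weights:  u(x) = w_a . [x, x^3]

def feats_c(x): return np.array([x * x, abs(x)])
def feats_a(x): return np.array([x, x ** 3])

def q_estimate(w_pol, x):
    # One-step cost plus discounted critic value under policy weights w_pol.
    u = float(w_pol @ feats_a(x))
    x_next = a * x + b * u
    return x * x + u * u + gamma * float(w_c @ feats_c(x_next))

for _ in range(5000):
    x = float(rng.uniform(-2.0, 2.0))
    u = float(w_a @ feats_a(x)) + 0.1 * rng.standard_normal()   # exploratory action
    cost = x * x + u * u
    x_next = a * x + b * u

    # Critic: value-iteration-style TD step plus an l1 subgradient term.
    target = cost + gamma * float(w_c @ feats_c(x_next))
    delta = target - float(w_c @ feats_c(x))
    w_c += alpha_c * delta * feats_c(x) - alpha_c * LAMBDA * np.sign(w_c)

    # Actor: finite-difference gradient of the estimated cost-to-go,
    # again with an l1 penalty on the policy weights.
    eps = 1e-4
    q_now = q_estimate(w_a, x)
    grad = np.zeros_like(w_a)
    for i in range(len(w_a)):
        w_try = w_a.copy()
        w_try[i] += eps
        grad[i] = (q_estimate(w_try, x) - q_now) / eps
    w_a -= alpha_a * grad + alpha_a * LAMBDA * np.sign(w_a)

print("critic weights:", w_c)
print("actor weights:", w_a)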
Keywords:
Corresponding author:
Email address:
Source:
Year: 2023
Pages: 105-110
Language: English
Affiliated department: