Indexed by:
Abstract:
When facing large amounts of data, optimizing policies using all of the data at once is a challenging task. In this paper, a data-driven Q-learning scheme with parallel multi-step deduction is developed to improve learning efficiency with small-batch data for discrete-time nonlinear control. Specifically, a data-driven model is first established from all of the available data. The proposed algorithm then performs multi-step deduction on the small-batch data in parallel, which effectively accelerates the learning process. Furthermore, the step size of the multi-step deduction can be adjusted to balance the utilization of data and model. The near-optimal policy is ultimately obtained from hybrid data drawn from the real system and the data-driven model. Finally, a torsional pendulum plant is used to demonstrate the effectiveness of the proposed method. © 2024 IEEE.
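The paper itself is not reproduced here, but the idea the abstract describes, learning a data-driven model from real interactions and then deducing additional K-step updates inside that model to accelerate Q-learning, can be illustrated with a minimal Dyna-style tabular sketch. This is an assumption-laden analogue, not the authors' algorithm: the chain environment, the tabular `Q`, the dictionary `model`, and the deduction depth `K` are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic MDP: 5 states in a chain, actions 0 = left, 1 = right.
# Reaching the last state yields reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

Q = np.zeros((N_STATES, N_ACTIONS))
model = {}                 # data-driven model: (s, a) -> (s', r), fitted from real data
alpha, gamma, eps = 0.5, 0.9, 0.1
K = 3                      # deduction depth: model-only rollout length per real step

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action on the real system
        a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)                 # one real interaction
        model[(s, a)] = (s2, r)            # update the learned model from real data
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

        # K-step deduction: continue the trajectory inside the model only,
        # so each real sample yields several extra (hybrid-data) updates.
        ms = s2
        for _ in range(K):
            ma = int(np.argmax(Q[ms]))
            if (ms, ma) not in model:
                break                      # model has no data for this pair yet
            ms2, mr = model[(ms, ma)]
            Q[ms, ma] += alpha * (mr + gamma * Q[ms2].max() - Q[ms, ma])
            ms = ms2
        s = s2

# Greedy policy should move right in every non-goal state.
print([int(np.argmax(Q[s])) for s in range(N_STATES - 1)])
```

Setting `K = 0` recovers plain Q-learning on real data only, while larger `K` leans more on the learned model, mirroring the abstract's point that the deduction step size trades off data use against model use.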
Keywords:
Corresponding author information:
Email address:
Source:
Year: 2024
Pages: 739-744
Language: English
Affiliated department: