Indexed in:
Abstract:
RoboCup offers a set of challenges for machine learning researchers because it is a dynamic, nondeterministic problem with delayed goals and a continuous state space. Reinforcement learning (RL), a method for learning an optimal control policy in sequential decision-making problems, is often used for strategy learning in RoboCup. However, RL is difficult to apply to continuous state space problems because the number of states grows exponentially with the number of state variables. An effective remedy is to combine RL with function approximation, but this combination sometimes diverges. In this paper, we analyze the main reason for the non-convergence of current approximate RL algorithms and propose an optimal strategy learning method. The two processes of RL, value evaluation and policy improvement, are separated. The policy search process is strictly constrained to move in the direction of improving performance, according to the evaluation provided by the value function. We successfully apply this algorithm to Keepaway, a standard RoboCup sub-problem. Experimental results verify the effectiveness of the method and show that the algorithm converges to a locally optimal policy. ©2006 IEEE.
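The separation of value evaluation from policy improvement described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy 1-D control task, the linear policy parameter `theta`, the Monte Carlo evaluator (used here in place of a learned value function approximator), and the names `evaluate` and `improve` are all assumptions made for illustration. The only idea taken from the abstract is that a candidate policy is accepted strictly when its evaluated performance improves.

```python
import random


def evaluate(theta, episodes=200, horizon=20, seed=0):
    """Value evaluation (assumed toy setup): Monte Carlo estimate of the
    average return of the linear policy a = -theta * x.

    A fixed seed gives every candidate the same start states and noise,
    so comparisons between policies are deterministic.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)           # continuous start state
        ret = 0.0
        for _ in range(horizon):
            action = -theta * x              # hypothetical linear policy
            x = x + action + rng.gauss(0.0, 0.05)
            x = max(-2.0, min(2.0, x))       # keep the state bounded
            ret += -x * x                    # reward: stay near the origin
        total += ret
    return total / episodes


def improve(theta, steps=50, step_size=0.1, seed=1):
    """Policy improvement: accept a perturbed policy only if the separate
    evaluation step reports a strictly higher value."""
    rng = random.Random(seed)
    best = evaluate(theta)
    history = [best]
    for _ in range(steps):
        candidate = theta + rng.uniform(-step_size, step_size)
        value = evaluate(candidate)
        if value > best:                     # move only in an improving direction
            theta, best = candidate, value
        history.append(best)
    return theta, history


if __name__ == "__main__":
    theta, history = improve(0.0)
    print("final theta:", theta)
    print("initial value:", history[0], "final value:", history[-1])
```

Because a candidate is kept only when its evaluation is higher, the recorded values never decrease, which is the sense in which such a search can settle at a local optimum rather than diverge. In this sketch that guarantee applies to the estimated value under fixed random numbers, not to the true expected return.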
Keywords:
Corresponding author information:
Email address:
Source:
Year: 2006
Volume: 2006
Pages: 301-305
Language: English