Simulation optimization algorithm for SMDPs with parameterized randomized stationary policies

Author：

Dai, Gui-Ping | Tang, Hao | Xi, Hong-Sheng

Indexed by：

EI Scopus PKU CSCD

Abstract：

Based on the theory of performance potentials and the method of the equivalent Markov process, the performance optimization problem is discussed for a class of semi-Markov decision processes (SMDPs) with parameterized randomized stationary policies, and a simulation optimization algorithm is proposed. First, a uniformized Markov chain is defined through the equivalent Markov process. Second, the gradient of the average-cost performance with respect to the policy parameters is estimated by simulating a single sample path of the uniformized Markov chain, so that an optimal (or suboptimal) randomized stationary policy can be found by iterating on the parameters. The derived algorithm can meet the performance optimization requirements of many different systems with large-scale state spaces. An artificial neural network is also used to approximate the parameterized randomized stationary policies and avoid the curse of dimensionality. Finally, convergence of the algorithm with probability one on an infinite sample path is considered, and a numerical example is provided to illustrate the application of the algorithm.
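
The uniformization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows the standard construction P = I + Q / rate for a continuous-time generator Q, and checks that the resulting discrete-time chain shares the stationary distribution of Q. The 3-state generator `Q`, the rate 4.0, and the function names `uniformize` and `stationary` are hypothetical choices made for illustration.

```python
def uniformize(Q, rate=None):
    """Return P = I + Q / rate for a generator matrix Q (rows sum to zero)."""
    n = len(Q)
    max_exit = max(-Q[i][i] for i in range(n))  # largest exit rate
    if rate is None:
        rate = max_exit  # smallest valid uniformization rate
    if rate <= 0 or rate < max_exit:
        raise ValueError("rate must be positive and at least max_i |q_ii|")
    return [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
            for i in range(n)]


def stationary(P, iters=1000):
    """Approximate the stationary distribution of P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical 3-state generator (assumption, not taken from the paper).
Q = [[-2.0, 1.0, 1.0],
     [1.0, -3.0, 2.0],
     [0.5, 0.5, -1.0]]

# A rate strictly above the largest exit rate gives every state a self-loop,
# so the uniformized chain is aperiodic and power iteration converges.
P = uniformize(Q, rate=4.0)
pi = stationary(P)
```

Because pi P = pi is equivalent to pi Q = 0 for any valid rate, long-run averages estimated along a sample path of the uniformized chain agree with those of the continuous-time process, which is what makes a single-sample-path gradient estimate of this kind plausible.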

Keyword：

Optimization Neural networks Markov processes Convergence of numerical methods Probability Computer simulation Dynamic programming

Author Affiliations：

[ 1 ] [Dai, Gui-Ping]College of Electronic and Control Engineering, Beijing University of Technology, Beijing 100022, China
[ 2 ] [Dai, Gui-Ping]Department of Automation, University of Science and Technology of China, Hefei 230027, China
[ 3 ] [Tang, Hao]Department of Computer, Hefei University of Technology, Hefei 230009, China
[ 4 ] [Xi, Hong-Sheng]Department of Automation, University of Science and Technology of China, Hefei 230027, China

Reprint Author's Address：

Email：

daigping@bjut.edu.cn

Related Articles：

Comparisons of the performance on transient chaotic neural network with different output functions
2004，WCICA 2004 - Fifth World Congress on Intelligent Control and Automation, Conference Proceedings
A modified difference hopfield neural network and its application
2005，2005 International Conference on Neural Networks and Brain Proceedings, ICNNB'05
Research on reliability evaluation of series systems with optimization algorithm
2005，International Conference on Machine Learning and Cybernetics, ICMLC 2005
Threshold-based adaptive call admission control in hierarchical mobile IPv6
2006，Journal of Software

Source ：

Control Theory and Applications

ISSN： 1000-8152

Year： 2006

Issue： 4

Volume： 23

Page： 547-551

Cited Count：

WoS CC Cited Count： 0

SCOPUS Cited Count：

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

Affiliated Colleges：

Faculty of Information Technology