Query:
Scholar name: Wang Ding
Abstract :
This article develops a novel data-driven safe Q-learning method to design the safe optimal controller which can guarantee constrained states of nonlinear systems always stay in the safe region while providing an optimal performance. First, we design an augmented utility function consisting of an adjustable positive definite control obstacle function and a quadratic form of the next state to ensure the safety and optimality. Second, by exploiting a pre-designed admissible policy for initialization, an off-policy stabilizing value iteration Q-learning (SVIQL) algorithm is presented to seek the safe optimal policy by using offline data within the safe region rather than the mathematical model. Third, the monotonicity, safety, and optimality of the SVIQL algorithm are theoretically proven. To obtain the initial admissible policy for SVIQL, an offline VIQL algorithm with zero initialization is constructed and a new admissibility criterion is established for immature iterative policies. Moreover, the critic and action networks with precise approximation ability are established to promote the operation of VIQL and SVIQL algorithms. Finally, three simulation experiments are conducted to demonstrate the virtue and superiority of the developed safe Q-learning method.
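As a rough illustration of the augmented utility described above, the sketch below combines a log-type barrier on an asymmetrically bounded state with quadratic next-state and control terms. Everything here is assumed for the example (the bounds, the barrier shape, and the weights k, q, r); it is not the paper's exact formulation.

```python
import numpy as np

# Hypothetical asymmetric safe region: the state must stay in (LOW, HIGH).
LOW, HIGH = -1.0, 2.0

def barrier(x, k=0.1):
    """Adjustable positive definite barrier term: zero at the center of the
    safe region and unbounded as x approaches either boundary (k tunes its
    weight in the utility)."""
    return k * (np.log((HIGH - LOW) ** 2 / 4.0) - np.log((HIGH - x) * (x - LOW)))

def utility(x, u, x_next, q=1.0, r=1.0):
    """Augmented utility: barrier on the current state plus quadratic terms
    in the next state and the control."""
    return barrier(x) + q * x_next ** 2 + r * u ** 2
```

Since minimizing the accumulated utility penalizes states near either bound ever more heavily, a policy that keeps the utility finite necessarily keeps the state inside the safe region.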
Keyword :
Adaptive critic control; Optimal control; Safety; Mathematical models; stabilizing value iteration Q-learning (SVIQL); Heuristic algorithms; Learning systems; adaptive dynamic programming (ADP); control barrier functions (CBF); state constraints; Q-learning; Iterative methods
Cite:
GB/T 7714: Zhao, Mingming, Wang, Ding, Song, Shijie, et al. Safe Q-Learning for Data-Driven Nonlinear Optimal Control with Asymmetric State Constraints [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11(12): 2408-2422.
MLA: Zhao, Mingming, et al. "Safe Q-Learning for Data-Driven Nonlinear Optimal Control with Asymmetric State Constraints." IEEE-CAA JOURNAL OF AUTOMATICA SINICA 11.12 (2024): 2408-2422.
APA: Zhao, Mingming, Wang, Ding, Song, Shijie, & Qiao, Junfei (2024). Safe Q-Learning for Data-Driven Nonlinear Optimal Control with Asymmetric State Constraints. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 11(12), 2408-2422.
Abstract :
In this paper, an adjustable Q-learning scheme is developed to solve the discrete-time nonlinear zero-sum game problem, which can accelerate the convergence rate of the iterative Q-function sequence. First, the monotonicity and convergence of the iterative Q-function sequence are analyzed under some conditions. Moreover, by employing neural networks, the model-free tracking control problem can be overcome for zero-sum games. Second, two practical algorithms are designed to guarantee convergence with accelerated learning. In one algorithm, an adjustable acceleration phase is added to the iteration process of Q-learning, which can be adaptively terminated with a convergence guarantee. In the other algorithm, a novel acceleration function is developed, which adjusts the relaxation factor to ensure convergence. Finally, through a simulation example with a practical physical background, the excellent performance of the developed algorithm is demonstrated with neural networks.
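The relaxation-factor idea behind the accelerated algorithm can be sketched on a scalar linear-quadratic problem, where the Bellman operator has a known fixed point. The system parameters, the relaxation values, and the stopping rule below are all invented for this illustration; they are not from the paper.

```python
import math

# Illustrative scalar discrete-time system x' = a*x + b*u with stage
# cost q*x^2 + r*u^2 (all values assumed for this sketch).
a, b, q, r = 1.2, 1.0, 1.0, 1.0

def bellman(P):
    """Exact one-step value-iteration (Riccati) map for the scalar case."""
    return q + a * a * P - (a * a * b * b * P * P) / (r + b * b * P)

def relaxed_vi(eta, P=0.0, tol=1e-12, max_iter=10000):
    """Value iteration with a relaxation factor: eta = 1 is the standard
    scheme, eta > 1 takes a longer step toward the Bellman target."""
    for k in range(1, max_iter + 1):
        P_new = P + eta * (bellman(P) - P)
        if abs(P_new - P) < tol:
            return P_new, k
        P = P_new
    return P, max_iter
```

With eta = 1 this is ordinary value iteration; a factor slightly above 1 lengthens each step toward the Bellman target, which is the mechanism an adjustable scheme tunes to change the convergence rate while still reaching the same fixed point.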
Keyword :
Adaptive dynamic programming; Optimal tracking control; Neural networks; Q-learning; Zero-sum games; Convergence rate
Cite:
GB/T 7714: Wang, Yuan, Wang, Ding, Zhao, Mingming, et al. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate [J]. NEURAL NETWORKS, 2024, 175.
MLA: Wang, Yuan, et al. "Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate." NEURAL NETWORKS 175 (2024).
APA: Wang, Yuan, Wang, Ding, Zhao, Mingming, Liu, Nan, & Qiao, Junfei (2024). Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate. NEURAL NETWORKS, 175.
Abstract :
In this article, an evolution-guided value iteration (EGVI) algorithm is established to address optimal tracking problems for nonlinear nonaffine systems. Conventional adaptive dynamic programming algorithms rely on gradient information to improve the policy, which adheres to the first-order necessary condition. Nonetheless, these methods encounter limitations when gradient information is intricate or system dynamics lack differentiability. In response to this challenge, evolutionary computation is leveraged by EGVI to search for the optimal policy without requiring an action network. The competition within the policy population serves as the driving force for policy improvement. Therefore, EGVI can effectively handle complex and non-differentiable systems. Additionally, this innovative method has the potential to enhance exploration efficiency and bolster the robustness of algorithms due to its population-based characteristics. Furthermore, the convergence of the algorithm and the stability of the policy are investigated based on the EGVI framework. Finally, the effectiveness of the established method is comprehensively demonstrated through two simulation experiments.
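The gradient-free policy improvement underlying this kind of method can be imitated with a minimal evolution strategy: candidate feedback gains are scored by rollout cost alone, so no differentiability of the dynamics is needed. The linear plant, the mutation scheme, and every constant below are assumptions for this sketch, not the EGVI algorithm itself.

```python
import random

random.seed(0)

def rollout_cost(k_gain, x0=1.0, horizon=50):
    """Evaluate a linear feedback policy u = -k*x on an illustrative plant
    x' = 1.1*x + u with stage cost x^2 + u^2, by simulated rollout only."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k_gain * x
        cost += x * x + u * u
        x = 1.1 * x + u
    return cost

def evolve(pop_size=20, generations=40, sigma=0.3):
    """Population-based policy improvement: mutate around the incumbent
    gain and keep whichever candidate achieves the lowest rollout cost."""
    best = 0.0  # initial (non-stabilizing) policy
    best_cost = rollout_cost(best)
    for _ in range(generations):
        for _ in range(pop_size):
            cand = best + random.gauss(0.0, sigma)
            c = rollout_cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
    return best, best_cost
```

The selection step plays the role of policy improvement: competition within the sampled population drives the gain toward the optimum without any gradient of the dynamics.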
Keyword :
Adaptive dynamic programming; Intelligent control; Optimal tracking; Reinforcement learning; Adaptive critic designs; Evolutionary computation
Cite:
GB/T 7714: Huang, Haiming, Wang, Ding, Zhao, Mingming, et al. Evolution-guided value iteration for optimal tracking control [J]. NEUROCOMPUTING, 2024, 593.
MLA: Huang, Haiming, et al. "Evolution-guided value iteration for optimal tracking control." NEUROCOMPUTING 593 (2024).
APA: Huang, Haiming, Wang, Ding, Zhao, Mingming, & Hu, Qinna (2024). Evolution-guided value iteration for optimal tracking control. NEUROCOMPUTING, 593.
Abstract :
The wastewater treatment process (WWTP) is beneficial for maintaining sufficient water resources and recycling wastewater. A crucial link of WWTP is to ensure that the dissolved oxygen (DO) concentration is continuously maintained at the predetermined value, which can actually be considered as a tracking problem. In this article, an experience replay-based action-dependent heuristic dynamic programming (ER-ADHDP) method is developed to design the model-free tracking controller to accomplish the tracking goal of the DO concentration. First, the online ER-ADHDP controller is regarded as a supplementary controller to conduct the model-free tracking control alongside a stabilizing controller with a priori knowledge. The online ER-ADHDP method can adaptively adjust weight parameters of critic and action networks, thereby continuously ameliorating the tracking result over time. Second, the ER technique is integrated into the critic and action networks to promote the data utilization efficiency and accelerate the learning process. Third, a rational stability result is provided to theoretically ensure the usefulness of the ER-ADHDP tracking design. Finally, simulation experiments including different reference trajectories are conducted to show the superb tracking performance and excellent adaptability of the proposed ER-ADHDP method.
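The experience-replay mechanism referred to above can be sketched as a bounded transition store with uniform resampling. This is a generic design, not the paper's exact ER-ADHDP implementation; the capacity and the synthetic transitions are assumptions.

```python
import random
from collections import deque

random.seed(1)

class ReplayBuffer:
    """Minimal experience-replay store: old transitions are retained and
    resampled in later updates, raising data utilization efficiency."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # evicts the oldest when full

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        """Uniform minibatch for a critic/action-network update."""
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

buffer = ReplayBuffer(capacity=100)
for t in range(150):
    # Synthetic (state, action, reward, next_state) tuples for illustration.
    buffer.push((t, -0.1 * t, 1.0, t + 1))
```

Each stored transition can serve many weight updates instead of one, which is how replay accelerates learning relative to purely online adaptive critic training.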
Keyword :
Wastewater treatment applications; tracking control; action-dependent heuristic dynamic programming (ADHDP); adaptive dynamic programming (ADP); adaptive critic control
Cite:
GB/T 7714: Qiao, Junfei, Zhao, Mingming, Wang, Ding, et al. Action-Dependent Heuristic Dynamic Programming With Experience Replay for Wastewater Treatment Processes [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20(4): 6257-6265.
MLA: Qiao, Junfei, et al. "Action-Dependent Heuristic Dynamic Programming With Experience Replay for Wastewater Treatment Processes." IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 20.4 (2024): 6257-6265.
APA: Qiao, Junfei, Zhao, Mingming, Wang, Ding, & Li, Menghua (2024). Action-Dependent Heuristic Dynamic Programming With Experience Replay for Wastewater Treatment Processes. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 20(4), 6257-6265.
Abstract :
The core of the optimal tracking control problem for nonlinear systems is how to ensure that the controlled system tracks the desired trajectory. The utility functions in previous studies have different properties which affect the final tracking effect of the intelligent critic algorithm. In this paper, we introduce a novel utility function and propose a Q-function based policy iteration algorithm to eliminate the final tracking error. In addition, neural networks are used as function approximators for the performance index and the control policy. Considering the impact of the approximation error on the tracking performance, an approximation error bound for each iteration of the novel Q-function is established. Under the given conditions, the approximate Q-function converges to a finite neighborhood of the optimal value. Moreover, it is proved that the weight estimation errors of the neural networks are uniformly ultimately bounded. Finally, the effectiveness of the algorithm is verified by a simulation example.
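A Q-function-based policy iteration of the kind described can be demonstrated on a tiny tabular stand-in: evaluate the Q-function of the current policy, then improve greedily until the policy is stable. The two-state MDP below is invented for illustration and does not reproduce the paper's tracking utility or neural approximators.

```python
import numpy as np

# Tiny deterministic MDP (invented): NEXT[s][a] is the successor state,
# R[s][a] the reward for taking action a in state s.
NEXT = [[0, 1], [0, 1]]
R = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.9

def evaluate(policy, sweeps=500):
    """Iterative evaluation of the Q-function for a fixed policy."""
    Q = np.zeros((2, 2))
    for _ in range(sweeps):
        for s in range(2):
            for a in range(2):
                s2 = NEXT[s][a]
                Q[s, a] = R[s][a] + GAMMA * Q[s2, policy[s2]]
    return Q

def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0, 0]
    while True:
        Q = evaluate(policy)
        improved = [int(np.argmax(Q[s])) for s in range(2)]
        if improved == policy:
            return policy, Q
        policy = improved
```

Because improvement is greedy with respect to the evaluated Q-function, each iterate is at least as good as the last, mirroring the monotonicity that the paper's error-bound analysis refines under approximation.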
Keyword :
Optimal tracking control; Policy iteration; Neural networks; Approximation errors; Model-free control; Adaptive dynamic programming
Cite:
GB/T 7714: Gao, Ning, Wang, Ding, Zhao, Mingming, et al. Model-free intelligent critic design with error analysis for neural tracking control [J]. NEUROCOMPUTING, 2024, 572.
MLA: Gao, Ning, et al. "Model-free intelligent critic design with error analysis for neural tracking control." NEUROCOMPUTING 572 (2024).
APA: Gao, Ning, Wang, Ding, Zhao, Mingming, & Hu, Lingzhi (2024). Model-free intelligent critic design with error analysis for neural tracking control. NEUROCOMPUTING, 572.
Abstract :
This paper focuses on the prescribed performance adaptive containment control problem for a class of nonlinear nonstrict-feedback multiagent systems (MASs) with unknown disturbances and full-state constraints. First, the radial basis function neural network (RBF NN) technique is employed to approximate the unknown nonlinear functions in the system, and the "explosion of complexity" problem caused by repeated differentiation of virtual controls is solved by using the dynamic surface control (DSC) technique. Then, nonlinear disturbance observers are designed to estimate the external disturbances, and barrier Lyapunov functions (BLFs) are combined with the prescribed performance function (PPF) to achieve the control objective of prescribed performance without violating the full-state constraints. The theoretical result shows that all signals in the closed-loop system are semiglobally uniformly ultimately bounded (SGUUB), and the local neighborhood containment errors can converge to the specified boundary. Finally, two simulation examples show the effectiveness of the proposed method. Note to Practitioners: The containment control problem is a hot topic in the field of control and plays an important role in practical engineering. Especially for nonlinear MASs, accurate mathematical models are difficult to obtain. This paper investigates the prescribed performance adaptive containment control problem for nonlinear nonstrict-feedback MASs, whose model can be extended to more complex engineering applications, such as unmanned aerial vehicle formations and intelligent traffic management. It is worth noting that external disturbances and state constraints often exist in practical applications. Therefore, disturbance observers are designed to compensate for the system disturbances, which can eliminate the impacts of disturbances on the system. By introducing BLFs, it is ensured that all states of the system are constrained within the specified regions. To sum up, this paper proposes a prescribed performance adaptive containment control strategy, which contributes to the development of containment control for MASs in practical applications.
Keyword :
Nonlinear systems; Complexity theory; Adaptive containment control; Consensus control; Prescribed performance; Nonlinear nonstrict-feedback MASs; Multi-agent systems; Explosions; Backstepping; Full-state constraints; Disturbance observers
Cite:
GB/T 7714: Sui, Jihang, Liu, Chao, Niu, Ben, et al. Prescribed Performance Adaptive Containment Control for Full-State Constrained Nonlinear Multiagent Systems: A Disturbance Observer-Based Design Strategy [J]. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024.
MLA: Sui, Jihang, et al. "Prescribed Performance Adaptive Containment Control for Full-State Constrained Nonlinear Multiagent Systems: A Disturbance Observer-Based Design Strategy." IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (2024).
APA: Sui, Jihang, Liu, Chao, Niu, Ben, Zhao, Xudong, Wang, Ding, & Yan, Bocheng (2024). Prescribed Performance Adaptive Containment Control for Full-State Constrained Nonlinear Multiagent Systems: A Disturbance Observer-Based Design Strategy. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING.
Abstract :
Reinforcement learning (RL) has roots in dynamic programming and is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and their applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are displayed, where the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, where event-based design, robust stabilization, and game design are reviewed. Moreover, extensions of ADP for addressing control problems in complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
Keyword :
Complex environment; optimal control; data-driven control; adaptive dynamic programming (ADP); nonlinear systems; intelligent control; advanced control; event-triggered design; reinforcement learning (RL); neural networks
Cite:
GB/T 7714: Wang, Ding, Gao, Ning, Liu, Derong, et al. Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11(1): 18-36.
MLA: Wang, Ding, et al. "Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications." IEEE-CAA JOURNAL OF AUTOMATICA SINICA 11.1 (2024): 18-36.
APA: Wang, Ding, Gao, Ning, Liu, Derong, Li, Jinna, & Lewis, Frank L. (2024). Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 11(1), 18-36.
Abstract :
In this article, an adaptive critic scheme with a novel performance index function is developed to solve the tracking control problem, which eliminates the tracking error and possesses an adjustable convergence rate in the offline learning process. Under some conditions, the convergence and monotonicity of the accelerated value function sequence can be guaranteed. Combining the advantages of the adjustable and general value iteration schemes, an integrated algorithm is proposed with fast guaranteed convergence, which involves two stages, namely the acceleration stage and the convergence stage. Moreover, an effective approach is given to adaptively determine the acceleration interval. With this operation, the fast convergence of the new value iteration scheme can be fully utilized. Finally, numerical results are presented, in comparison with general value iteration, to verify the fast convergence and tracking performance of the developed adaptive critic design.
Keyword :
Fast convergence; value iteration; adaptive critic designs; adaptive dynamic programming; nonlinear tracking control
Cite:
GB/T 7714: Wang, Ding, Wang, Yuan, Ha, Mingming, et al. Improved value iteration for nonlinear tracking control with accelerated learning [J]. INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34(6): 4112-4131.
MLA: Wang, Ding, et al. "Improved value iteration for nonlinear tracking control with accelerated learning." INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL 34.6 (2024): 4112-4131.
APA: Wang, Ding, Wang, Yuan, Ha, Mingming, Ren, Jin, & Qiao, Junfei (2024). Improved value iteration for nonlinear tracking control with accelerated learning. INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 34(6), 4112-4131.
Abstract :
With the deepening of modernization and industrialization, the issues of water pollution and scarcity have become more pressing. To address these issues, many wastewater treatment factories have been built to improve the reuse of water resources. However, the control of the wastewater treatment process (WWTP) is a complex task due to its highly nonlinear and strongly coupled nature, and it is challenging to develop accurate mechanism models of the wastewater treatment system. The improvement of the efficiency of the WWTP is crucial to safeguard the urban ecological environment. In this paper, adaptive critic with weight allocation (ACWA) is developed to address the optimal control problem in the WWTP. Different from previous methods for the WWTP, system modeling is not adopted in this paper, which meets the actual physical background of the wastewater treatment system to a great extent. In addition, the actor-critic algorithm in reinforcement learning is used as the basic structure in the ACWA. It is worth noting that a novel weighted action-value function and the advantage function are introduced in the weight updating process of the action network and the critic network. The experimental results show that the control accuracy of the ACWA is greatly improved compared with previous control methods.
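A generic actor-critic loop with an advantage-style weight (reward minus the critic's value estimate) can be sketched on a two-armed testbed. The ACWA weighted action-value function itself is not reproduced here; the rewards, learning rates, and horizon are all assumptions for this illustration.

```python
import math
import random

random.seed(0)

REWARDS = [1.0, 2.0]  # deterministic two-armed testbed (illustrative)

def softmax(h):
    """Turn action preferences into a stochastic policy."""
    m = max(h)
    e = [math.exp(x - m) for x in h]
    s = sum(e)
    return [x / s for x in e]

h = [0.0, 0.0]        # actor: action preferences
v = 0.0               # critic: baseline estimate of the value
alpha, beta = 0.1, 0.1
for _ in range(2000):
    pi = softmax(h)
    a = 0 if random.random() < pi[0] else 1
    r = REWARDS[a]
    adv = r - v                        # advantage: reward minus critic value
    for i in range(2):
        grad = (1.0 - pi[i]) if i == a else -pi[i]
        h[i] += alpha * adv * grad     # advantage-weighted actor update
    v += beta * adv                    # critic update toward observed reward
```

Weighting the actor update by the advantage rather than the raw return keeps updates centered around the critic's estimate, which is the general motivation for introducing value-based weights into actor-critic learning.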
Keyword :
Adaptive critic design; Actor-critic; Reinforcement learning; Neural networks; Wastewater treatment processes
Cite:
GB/T 7714: Wang, Ding, Ma, Hongyu, Ren, Jin, et al. Adaptive critic design with weight allocation for intelligent learning control of wastewater treatment plants [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133.
MLA: Wang, Ding, et al. "Adaptive critic design with weight allocation for intelligent learning control of wastewater treatment plants." ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE 133 (2024).
APA: Wang, Ding, Ma, Hongyu, Ren, Jin, Gao, Ning, & Qiao, Junfei (2024). Adaptive critic design with weight allocation for intelligent learning control of wastewater treatment plants. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 133.
Abstract :
In this paper, an optimal trajectory tracking control problem for general nonlinear systems is investigated, and an adaptive critic control method based on digital twin (DT) theory is developed. Unlike existing tracking control methods, the advantages of adaptive dynamic programming (ADP) and DT theory are combined in this paper, and a novel multilayer artificial system structure is constructed. The actor-critic structure is employed by each artificial system to obtain an approximate optimal control policy. The model network (MN) is built by using the actual input and output data sets of the controlled system, which removes the dependence on the system dynamics. Then, the weights of the trained action network (AN) and MN are passed to the real system to realize the optimal tracking control. The feasibility of the algorithm is proved by theoretical analysis. Finally, the algorithm is applied to a simple nonlinear torsional pendulum system and an industrial wastewater treatment system (WWTS), and its effectiveness in realizing the tracking control of nonlinear systems is verified.
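The model-network idea, learning transitions from input/output data alone, can be sketched with a least-squares fit standing in for the neural network. The plant and the data below are invented for the example; a real MN would be a network trained on a nonlinear system's operating records.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" plant, used only to generate input/output data (a linear stand-in
# for the example; the paper targets general nonlinear systems).
def plant(x, u):
    return 0.8 * x + 0.5 * u

# Collect operating data, as the digital-twin side would.
X = rng.uniform(-1, 1, size=(200, 2))        # columns: state, control
Y = np.array([plant(x, u) for x, u in X])

# Model-network stand-in: least-squares fit x_next ≈ [x, u] @ theta.
theta, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The learned model now predicts transitions without the true dynamics.
x_pred = np.array([0.3, -0.2]) @ theta
```

Once the model reproduces the plant's input/output behavior, the artificial system can train its action and critic networks against the model and transfer the resulting weights, which is the hand-off the multilayer structure relies on.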
Keyword :
Neural networks; Data-driven control; Digital twin theory; Tracking control; Wastewater treatment; Adaptive dynamic programming
Cite:
GB/T 7714: Wang, Ding, Ma, Hongyu, Qiao, Junfei. Multilayer adaptive critic design with digital twin for data-driven optimal tracking control and industrial applications [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133.
MLA: Wang, Ding, et al. "Multilayer adaptive critic design with digital twin for data-driven optimal tracking control and industrial applications." ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE 133 (2024).
APA: Wang, Ding, Ma, Hongyu, & Qiao, Junfei (2024). Multilayer adaptive critic design with digital twin for data-driven optimal tracking control and industrial applications. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 133.