Query:
Scholar name: Wang Ding (王鼎)
Abstract:
This article develops a novel data-driven safe Q-learning method to design a safe optimal controller that guarantees the constrained states of nonlinear systems always stay in the safe region while providing optimal performance. First, we design an augmented utility function, consisting of an adjustable positive definite control obstacle function and a quadratic form of the next state, to ensure safety and optimality. Second, by exploiting a pre-designed admissible policy for initialization, an off-policy stabilizing value iteration Q-learning (SVIQL) algorithm is presented to seek the safe optimal policy using offline data collected within the safe region rather than a mathematical model. Third, the monotonicity, safety, and optimality of the SVIQL algorithm are theoretically proven. To obtain the initial admissible policy for SVIQL, an offline VIQL algorithm with zero initialization is constructed and a new admissibility criterion is established for immature iterative policies. Moreover, critic and action networks with precise approximation ability are established to support the operation of the VIQL and SVIQL algorithms. Finally, three simulation experiments are conducted to demonstrate the effectiveness and superiority of the developed safe Q-learning method.
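For illustration, here is a minimal tabular sketch of the barrier-augmented value iteration Q-learning update described in the abstract. The scalar plant, the grids, the reciprocal barrier, and the added discount factor are assumptions made for this toy, not the authors' exact formulation.

```python
# Minimal tabular sketch of barrier-augmented value iteration Q-learning.
# The plant, grids, reciprocal barrier, and discount factor are
# illustrative assumptions, not the paper's exact design.
import numpy as np

X = np.linspace(-0.9, 0.9, 19)      # state grid inside the safe set |x| < 1
U = np.linspace(-1.0, 1.0, 9)       # control grid
GAMMA = 0.95                        # discount added so the toy iteration converges

def step(x, u):
    """Illustrative scalar nonlinear plant: x_next = 0.9*sin(x) + 0.5*u."""
    return 0.9 * np.sin(x) + 0.5 * u

def barrier(x, bound=1.0, weight=0.1):
    """Adjustable positive-definite barrier term that blows up near |x| = bound."""
    return weight * (1.0 / (bound**2 - x**2) - 1.0 / bound**2)

def utility(x, u, x_next):
    """Augmented utility: barrier on the state plus quadratic terms."""
    return barrier(x) + x_next**2 + 0.1 * u**2

def nearest(grid, value):
    return int(np.abs(grid - value).argmin())

Q = np.zeros((len(X), len(U)))      # zero initialization, as in the offline VIQL
for _ in range(200):                # value iteration sweeps over the safe region
    Q_new = np.empty_like(Q)
    for i, x in enumerate(X):
        for j, u in enumerate(U):
            x_next = np.clip(step(x, u), X[0], X[-1])
            Q_new[i, j] = utility(x, u, x_next) + GAMMA * Q[nearest(X, x_next)].min()
    Q = Q_new

safe_policy = U[Q.argmin(axis=1)]   # greedy policy on the grid
print(safe_policy)
```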
Keyword:
Adaptive critic control; Optimal control; Safety; Mathematical models; stabilizing value iteration Q-learning (SVIQL); Heuristic algorithms; Learning systems; adaptive dynamic programming (ADP); control barrier functions (CBF); state constraints; Q-learning; Iterative methods
Cite:
GB/T 7714 | Zhao, Mingming, Wang, Ding, Song, Shijie, et al. Safe Q-Learning for Data-Driven Nonlinear Optimal Control with Asymmetric State Constraints [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11(12): 2408-2422.
MLA | Zhao, Mingming, et al. "Safe Q-Learning for Data-Driven Nonlinear Optimal Control with Asymmetric State Constraints." IEEE-CAA JOURNAL OF AUTOMATICA SINICA 11.12 (2024): 2408-2422.
APA | Zhao, Mingming, Wang, Ding, Song, Shijie, & Qiao, Junfei. Safe Q-Learning for Data-Driven Nonlinear Optimal Control with Asymmetric State Constraints. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11(12), 2408-2422.
Abstract:
In this paper, an adjustable Q-learning scheme is developed to solve the discrete-time nonlinear zero-sum game problem, which can accelerate the convergence rate of the iterative Q-function sequence. First, the monotonicity and convergence of the iterative Q-function sequence are analyzed under some conditions. Moreover, by employing neural networks, the model-free tracking control problem can be solved for zero-sum games. Second, two practical algorithms are designed to guarantee convergence with accelerated learning. In one algorithm, an adjustable acceleration phase is added to the iteration process of Q-learning, which can be adaptively terminated with a convergence guarantee. In the other algorithm, a novel acceleration function is developed, which adjusts the relaxation factor to ensure convergence. Finally, through a simulation example with a practical physical background, the strong performance of the developed algorithms is demonstrated with neural networks.
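A minimal sketch of the relaxation-factor idea on a discretized zero-sum game follows: the Bellman operator takes a min over controls and a max over disturbances, and a factor eta > 1 speeds up value iteration, with eta = 1 recovering the standard scheme. The plant, cost, grids, and the convergence condition used below are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of relaxed value iteration for a discretized zero-sum game.
# eta > 1 accelerates the iteration; eta = 1 is standard VI. This toy
# satisfies |1 - eta| + eta*gamma < 1, which keeps the relaxed update a
# contraction (an illustrative condition, not the paper's analysis).
import numpy as np

X = np.linspace(-1.0, 1.0, 21)      # state grid
U = np.linspace(-1.0, 1.0, 5)       # control grid (minimizer)
W = np.linspace(-0.2, 0.2, 5)       # disturbance grid (maximizer)
GAMMA, ETA = 0.6, 1.2

def step(x, u, w):
    return 0.8 * x + 0.5 * u + w    # illustrative linear plant

def cost(x, u, w):
    return x**2 + u**2 - 4.0 * w**2 # zero-sum utility

def bellman(V):
    """min over u of max over w of the one-step cost-to-go."""
    V_new = np.empty_like(V)
    for i, x in enumerate(X):
        table = np.array([[cost(x, u, w)
                           + GAMMA * V[np.abs(X - np.clip(step(x, u, w), -1, 1)).argmin()]
                           for w in W] for u in U])
        V_new[i] = table.max(axis=1).min()
    return V_new

V = np.zeros(len(X))
for _ in range(100):
    TV = bellman(V)
    V = V + ETA * (TV - V)          # relaxed update: eta tunes the convergence rate
print(V)
```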
Keyword:
Adaptive dynamic programming; Optimal tracking control; Neural networks; Q-learning; Zero-sum games; Convergence rate
Cite:
GB/T 7714 | Wang, Yuan, Wang, Ding, Zhao, Mingming, et al. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate [J]. NEURAL NETWORKS, 2024, 175.
MLA | Wang, Yuan, et al. "Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate." NEURAL NETWORKS 175 (2024).
APA | Wang, Yuan, Wang, Ding, Zhao, Mingming, Liu, Nan, & Qiao, Junfei. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate. NEURAL NETWORKS, 2024, 175.
Abstract:
In this article, an evolution-guided value iteration (EGVI) algorithm is established to address optimal tracking problems for nonlinear nonaffine systems. Conventional adaptive dynamic programming algorithms rely on gradient information to improve the policy, which adheres to the first-order necessary condition. Nonetheless, these methods encounter limitations when gradient information is intricate or the system dynamics lack differentiability. In response to this challenge, EGVI leverages evolutionary computation to search for the optimal policy without requiring an action network: competition within the policy population serves as the driving force for policy improvement. Therefore, EGVI can effectively handle complex and non-differentiable systems. Additionally, this method has the potential to enhance exploration efficiency and bolster algorithmic robustness thanks to its population-based nature. Furthermore, the convergence of the algorithm and the stability of the policy are investigated within the EGVI framework. Finally, the effectiveness of the established method is comprehensively demonstrated through two simulation experiments.
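The population-based policy improvement at the heart of this idea can be sketched without any gradient information: candidate policies compete on rollout cost, and the best are kept and perturbed. The deliberately nonsmooth plant, the reference, and the scalar-gain policy class below are illustrative assumptions, not the authors' design.

```python
# Minimal sketch of gradient-free, population-based policy improvement.
# The nonsmooth plant and scalar-gain policy class are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(gain, steps=50):
    """Track r(t) = 0.5*sin(0.1*t) with u = -gain * tracking error."""
    x, total = 0.0, 0.0
    for t in range(steps):
        r = 0.5 * np.sin(0.1 * t)
        e = x - r
        u = -gain * e
        x = 0.7 * x + np.tanh(u) + 0.1 * abs(x)   # nonaffine, non-differentiable
        total += e**2 + 0.01 * u**2
    return total

population = rng.normal(0.0, 1.0, size=20)        # initial policy population
for generation in range(30):
    costs = np.array([rollout_cost(g) for g in population])
    elite = population[np.argsort(costs)[:5]]     # selection: keep the 5 best
    children = elite[rng.integers(0, 5, 15)] + rng.normal(0.0, 0.1, 15)
    population = np.concatenate([elite, children])  # elitism plus mutation

best = population[np.argmin([rollout_cost(g) for g in population])]
print("best gain:", best)
```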
Keyword:
Adaptive dynamic programming; Intelligent control; Optimal tracking; Reinforcement learning; Adaptive critic designs; Evolutionary computation
Cite:
GB/T 7714 | Huang, Haiming, Wang, Ding, Zhao, Mingming, et al. Evolution-guided value iteration for optimal tracking control [J]. NEUROCOMPUTING, 2024, 593.
MLA | Huang, Haiming, et al. "Evolution-guided value iteration for optimal tracking control." NEUROCOMPUTING 593 (2024).
APA | Huang, Haiming, Wang, Ding, Zhao, Mingming, & Hu, Qinna. Evolution-guided value iteration for optimal tracking control. NEUROCOMPUTING, 2024, 593.
Abstract:
The wastewater treatment process (WWTP) is beneficial for maintaining sufficient water resources and recycling wastewater. A crucial part of the WWTP is ensuring that the dissolved oxygen (DO) concentration is continuously maintained at a predetermined value, which can be viewed as a tracking problem. In this article, an experience replay-based action-dependent heuristic dynamic programming (ER-ADHDP) method is developed to design a model-free tracking controller that accomplishes the tracking goal for the DO concentration. First, the online ER-ADHDP controller is regarded as a supplementary controller that conducts model-free tracking control alongside a stabilizing controller with a priori knowledge. The online ER-ADHDP method can adaptively adjust the weight parameters of the critic and action networks, thereby continuously improving the tracking result over time. Second, the ER technique is integrated into the critic and action networks to improve data utilization efficiency and accelerate the learning process. Third, a rational stability result is provided to theoretically ensure the usefulness of the ER-ADHDP tracking design. Finally, simulation experiments including different reference trajectories are conducted to show the strong tracking performance and adaptability of the proposed ER-ADHDP method.
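The experience-replay mechanism can be sketched with a replay memory feeding mini-batch updates of a critic. The linear quadratic critic, the scalar plant, and all numerical choices below are illustrative stand-ins for the paper's neural-network design.

```python
# Minimal sketch of experience replay for an action-dependent critic:
# transitions are stored and revisited in mini-batches, improving data
# utilization. The linear critic and plant are illustrative stand-ins.
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
buffer = deque(maxlen=500)                  # replay memory
w = np.zeros(3)                             # critic weights on quadratic features

def features(x, u):
    return np.array([x * x, x * u, u * u])

def q_value(x, u):
    return features(x, u) @ w

x = 1.0
for t in range(300):
    u = -0.5 * x + 0.1 * rng.normal()       # behavior policy with exploration
    x_next = 0.8 * x + 0.5 * u              # illustrative plant
    r = x * x + u * u                       # utility to be minimized
    buffer.append((x, u, r, x_next))
    x = x_next if abs(x_next) > 1e-3 else 1.0   # restart when the state settles

    if len(buffer) >= 32:                   # replay: reuse stored transitions
        for i in rng.choice(len(buffer), size=32, replace=False):
            s, a, rew, s2 = buffer[int(i)]
            a2 = -0.5 * s2                  # action at the next state
            td_error = rew + 0.95 * q_value(s2, a2) - q_value(s, a)
            w += 0.01 * td_error * features(s, a)

print("critic weights:", w)
```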
Keyword:
wastewater treatment applications; tracking control; Action-dependent heuristic dynamic programming (ADHDP); adaptive dynamic programming (ADP); adaptive critic control
Cite:
GB/T 7714 | Qiao, Junfei, Zhao, Mingming, Wang, Ding, et al. Action-Dependent Heuristic Dynamic Programming With Experience Replay for Wastewater Treatment Processes [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20(4): 6257-6265.
MLA | Qiao, Junfei, et al. "Action-Dependent Heuristic Dynamic Programming With Experience Replay for Wastewater Treatment Processes." IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 20.4 (2024): 6257-6265.
APA | Qiao, Junfei, Zhao, Mingming, Wang, Ding, & Li, Menghua. Action-Dependent Heuristic Dynamic Programming With Experience Replay for Wastewater Treatment Processes. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20(4), 6257-6265.
Abstract:
The core of the optimal tracking control problem for nonlinear systems is ensuring that the controlled system tracks the desired trajectory. The utility functions in previous studies have different properties which affect the final tracking performance of the intelligent critic algorithm. In this paper, we introduce a novel utility function and propose a Q-function-based policy iteration algorithm to eliminate the final tracking error. In addition, neural networks are used as function approximators for the performance index and the control policy. Considering the impact of the approximation error on the tracking performance, an approximation error bound is established for each iteration of the novel Q-function. Under the given conditions, the approximate Q-function converges to a finite neighborhood of the optimal value. Moreover, it is proved that the weight estimation errors of the neural networks are uniformly ultimately bounded. Finally, the effectiveness of the algorithm is verified by a simulation example.
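For illustration, here is a minimal tabular sketch of Q-function-based policy iteration on a tracking-error system: the current policy's Q-function is evaluated by fixed-point sweeps, then the policy is improved greedily. The error dynamics, grids, and utility are assumptions for this toy, not the paper's formulation.

```python
# Minimal tabular sketch of Q-function-based policy iteration on a
# tracking-error system. Dynamics, grids, and utility are illustrative.
import numpy as np

E = np.linspace(-1.0, 1.0, 21)             # tracking-error grid
U = np.linspace(-1.0, 1.0, 9)              # control grid
GAMMA = 0.95

def step(e, u):
    return np.clip(0.8 * e + 0.4 * u, -1.0, 1.0)   # illustrative error dynamics

def utility(e, u):
    return e**2 + 0.1 * u**2               # vanishes at e = 0: zero final error

def idx(grid, value):
    return int(np.abs(grid - value).argmin())

policy = np.full(len(E), idx(U, 0.0), dtype=int)   # initial policy: u = 0

for _ in range(20):                        # policy iteration loop
    Q = np.zeros((len(E), len(U)))
    for _ in range(100):                   # policy evaluation sweeps for Q^pi
        for i, e in enumerate(E):
            for j, u in enumerate(U):
                k = idx(E, step(e, u))
                Q[i, j] = utility(e, u) + GAMMA * Q[k, policy[k]]
    policy = Q.argmin(axis=1)              # greedy policy improvement

print("improved feedback:", U[policy])
```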
Keyword:
Optimal tracking control; Policy iteration; Neural networks; Approximation errors; Model-free control; Adaptive dynamic programming
Cite:
GB/T 7714 | Gao, Ning, Wang, Ding, Zhao, Mingming, et al. Model-free intelligent critic design with error analysis for neural tracking control [J]. NEUROCOMPUTING, 2024, 572.
MLA | Gao, Ning, et al. "Model-free intelligent critic design with error analysis for neural tracking control." NEUROCOMPUTING 572 (2024).
APA | Gao, Ning, Wang, Ding, Zhao, Mingming, & Hu, Lingzhi. Model-free intelligent critic design with error analysis for neural tracking control. NEUROCOMPUTING, 2024, 572.
Abstract:
This paper focuses on the prescribed performance adaptive containment control problem for a class of nonlinear nonstrict-feedback multiagent systems (MASs) with unknown disturbances and full-state constraints. First, radial basis function neural networks (RBF NNs) are employed to approximate the unknown nonlinear functions in the system, and the "explosion of complexity" caused by repeated differentiation of virtual controls is avoided by using dynamic surface control (DSC). Then, nonlinear disturbance observers are designed to estimate the external disturbances, and barrier Lyapunov functions (BLFs) are combined with a prescribed performance function (PPF) to achieve prescribed performance without violating the full-state constraints. The theoretical result shows that all signals in the closed-loop system are semiglobally uniformly ultimately bounded (SGUUB), and the local neighborhood containment errors converge to the specified boundary. Finally, two simulation examples show the effectiveness of the proposed method.
Note to Practitioners: The containment control problem is an active topic in the control field and plays an important role in practical engineering. Especially for nonlinear MASs, accurate mathematical models are difficult to obtain. This paper investigates the prescribed performance adaptive containment control problem for nonlinear nonstrict-feedback MASs, whose model can be extended to more complex engineering applications, such as unmanned aerial vehicle formations and intelligent traffic management. Since external disturbances and state constraints often exist in practical applications, disturbance observers are designed to compensate for the system disturbances, eliminating their impact on the system, and BLFs ensure that all states remain within the specified regions. In summary, the paper proposes a prescribed performance adaptive containment control strategy that contributes to the development of containment control for MASs in practical applications.
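For readers unfamiliar with BLFs, a standard log-type barrier Lyapunov function for a scalar error z constrained to |z| < k_b has the following form; the specific BLF chosen in the paper may differ.

```latex
% Standard log-type BLF for |z| < k_b; the paper's exact choice may differ.
V(z) = \frac{1}{2} \ln \frac{k_b^{2}}{k_b^{2} - z^{2}}, \qquad |z| < k_b
```

Since V(z) grows unboundedly as |z| approaches k_b, keeping V bounded along closed-loop trajectories keeps z strictly inside the constraint.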
Keyword:
Nonlinear systems; Complexity theory; adaptive containment control; Consensus control; prescribed performance; disturbance observer; Nonlinear nonstrict-feedback MASs; Multi-agent systems; Explosions; Backstepping; full-state constraints; Disturbance observers
Cite:
GB/T 7714 | Sui, Jihang, Liu, Chao, Niu, Ben, et al. Prescribed Performance Adaptive Containment Control for Full-State Constrained Nonlinear Multiagent Systems: A Disturbance Observer-Based Design Strategy [J]. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024.
MLA | Sui, Jihang, et al. "Prescribed Performance Adaptive Containment Control for Full-State Constrained Nonlinear Multiagent Systems: A Disturbance Observer-Based Design Strategy." IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (2024).
APA | Sui, Jihang, Liu, Chao, Niu, Ben, Zhao, Xudong, Wang, Ding, & Yan, Bocheng. Prescribed Performance Adaptive Containment Control for Full-State Constrained Nonlinear Multiagent Systems: A Disturbance Observer-Based Design Strategy. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024.
Abstract:
This article focuses on the adaptive fuzzy practical predefined-time bipartite consensus tracking control (BCTC) problem for heterogeneous nonlinear multiagent systems (HNMASs) with actuator faults. First, fuzzy logic systems are used to approximate the unknown nonlinear functions. Then, the partial loss of effectiveness and the bias fault of the actuators are considered simultaneously in the HNMASs and are effectively handled by adaptive compensation technology. In addition, the developed bipartite consensus tracking control protocol, based on a practical predefined-time strategy, not only ensures fast convergence of the studied systems but also predetermines a convergence time that does not depend on the initial conditions. The theoretical result shows that all signals of the closed-loop system are semiglobally uniformly predefined-time bounded, and the BCTC performance is guaranteed within the predefined time. Finally, a simulation example with agents of different orders shows the validity of the obtained results.
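A common way to model the two fault types named in the abstract is the following actuator model, where v_i is the commanded control, rho_i the effectiveness factor, and b_i a bounded bias; the paper's exact notation may differ.

```latex
% Common actuator-fault model: partial loss of effectiveness plus bias.
u_i(t) = \rho_i(t)\, v_i(t) + b_i(t), \qquad 0 < \underline{\rho}_i \le \rho_i(t) \le 1
```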
Keyword:
Actuators; bipartite consensus tracking; predefined-time (PT) control; Symmetric matrices; Nonlinear systems; Fuzzy systems; Fuzzy logic; Convergence; heterogeneous nonlinear multiagent systems (HNMASs); Actuator faults; Multi-agent systems
Cite:
GB/T 7714 | Niu, Ben, Sui, Jihang, Zhao, Xudong, et al. Adaptive Fuzzy Practical Predefined-Time Bipartite Consensus Tracking Control for Heterogeneous Nonlinear MASs With Actuator Faults [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32(5): 3071-3083.
MLA | Niu, Ben, et al. "Adaptive Fuzzy Practical Predefined-Time Bipartite Consensus Tracking Control for Heterogeneous Nonlinear MASs With Actuator Faults." IEEE TRANSACTIONS ON FUZZY SYSTEMS 32.5 (2024): 3071-3083.
APA | Niu, Ben, Sui, Jihang, Zhao, Xudong, Wang, Ding, Zhao, Xinliang, & Niu, Yi. Adaptive Fuzzy Practical Predefined-Time Bipartite Consensus Tracking Control for Heterogeneous Nonlinear MASs With Actuator Faults. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32(5), 3071-3083.
Abstract:
Wastewater treatment is important for maintaining a balanced urban ecosystem. To ensure the success of wastewater treatment, the tracking error between the crucial variable concentrations and the set point needs to be minimized as much as possible. Since multiple biochemical reactions are involved, the wastewater treatment system is a nonlinear system with unknown dynamics. For this class of systems, this paper develops an online action-dependent heuristic dynamic programming (ADHDP) algorithm combined with the temporal difference with λ, i.e., TD(λ), which is called ADHDP(λ). By introducing TD(λ), future n-step information is taken into account and the learning efficiency of the ADHDP algorithm is improved. We not only give the implementation process of the ADHDP(λ) algorithm based on neural networks, but also prove the stability of the algorithm under certain conditions. Finally, the effectiveness of the ADHDP(λ) algorithm is verified through two nonlinear systems: a wastewater treatment system and a torsional pendulum system. Simulation results show that the ADHDP(λ) algorithm has higher learning efficiency than the general ADHDP algorithm.
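The TD(λ) mechanism that injects future n-step information can be sketched with an eligibility trace on a linear critic; the plant, features, evaluation policy, and step sizes are illustrative stand-ins for the paper's ADHDP networks.

```python
# Minimal sketch of TD(lambda): a decaying eligibility trace blends
# multi-step information into each critic update. The linear critic and
# scalar plant are illustrative stand-ins for the paper's networks.
import numpy as np

GAMMA, LAM, ALPHA = 0.95, 0.7, 0.05
w = np.zeros(2)                          # linear critic weights
trace = np.zeros(2)                      # eligibility trace

def phi(x):
    return np.array([x, x * x])          # critic features

x = 1.0
for t in range(500):
    u = -0.6 * x                         # fixed policy under evaluation
    x_next = 0.8 * x + 0.5 * u           # illustrative plant
    r = x * x + u * u                    # one-step utility (cost)
    delta = r + GAMMA * (phi(x_next) @ w) - phi(x) @ w   # TD error
    trace = GAMMA * LAM * trace + phi(x)                 # decay, then accumulate
    w += ALPHA * delta * trace           # TD(lambda) critic update
    x = x_next if abs(x_next) > 1e-3 else 1.0            # restart when settled

print("critic weights:", w)
```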
Keyword:
Reinforcement learning; Wastewater treatment processes; Temporal difference with λ; Action-dependent heuristic dynamic programming; Online control
Cite:
GB/T 7714 | Li, Xin, Wang, Ding, Zhao, Mingming, et al. Reinforcement learning control with n-step information for wastewater treatment systems [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133.
MLA | Li, Xin, et al. "Reinforcement learning control with n-step information for wastewater treatment systems." ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE 133 (2024).
APA | Li, Xin, Wang, Ding, Zhao, Mingming, & Qiao, Junfei. Reinforcement learning control with n-step information for wastewater treatment systems. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133.
Abstract:
Reinforcement learning (RL) has roots in dynamic programming, and it is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and their applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, surveying the main results for discrete-time and continuous-time systems, respectively. Then, research progress on adaptive critic control under the event-triggered framework and in uncertain environments is discussed, covering event-based design, robust stabilization, and game design. Moreover, extensions of ADP for addressing control problems in complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
Keyword:
complex environment; optimal control; data-driven control; Adaptive dynamic programming (ADP); nonlinear systems; intelligent control; advanced control; event-triggered design; reinforcement learning (RL); neural networks
Cite:
GB/T 7714 | Wang, Ding, Gao, Ning, Liu, Derong, et al. Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11(1): 18-36.
MLA | Wang, Ding, et al. "Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications." IEEE-CAA JOURNAL OF AUTOMATICA SINICA 11.1 (2024): 18-36.
APA | Wang, Ding, Gao, Ning, Liu, Derong, Li, Jinna, & Lewis, Frank L. Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11(1), 18-36.
Abstract:
In this article, an adaptive critic scheme with a novel performance index function is developed to solve the tracking control problem, which eliminates the tracking error and possesses an adjustable convergence rate in the offline learning process. Under some conditions, the convergence and monotonicity of the accelerated value function sequence can be guaranteed. Combining the advantages of the adjustable and general value iteration schemes, an integrated algorithm with guaranteed fast convergence is proposed, which involves two stages: the acceleration stage and the convergence stage. Moreover, an effective approach is given to adaptively determine the acceleration interval. With this operation, the fast convergence of the new value iteration scheme can be fully utilized. Finally, compared with general value iteration, numerical results are presented to verify the fast convergence and the tracking performance of the developed adaptive critic design.
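A minimal sketch of the two-stage idea follows: relaxed updates accelerate the early iterations, after which standard value iteration secures convergence. The plant, grids, and the fixed switching point below are illustrative assumptions; the paper determines the acceleration interval adaptively.

```python
# Minimal sketch of two-stage value iteration: an acceleration stage with
# a relaxation factor, then a standard convergence stage. The fixed switch
# at k = 20 replaces the paper's adaptive determination of the interval.
import numpy as np

E = np.linspace(-1.0, 1.0, 21)          # tracking-error grid
U = np.linspace(-1.0, 1.0, 9)           # control grid
GAMMA = 0.9

def bellman(V):
    V_new = np.empty_like(V)
    for i, e in enumerate(E):
        nxt = np.clip(0.8 * e + 0.4 * U, -1.0, 1.0)       # all next errors
        j = np.abs(E[:, None] - nxt).argmin(axis=0)       # nearest grid indices
        V_new[i] = np.min(e**2 + 0.1 * U**2 + GAMMA * V[j])
    return V_new

V = np.zeros(len(E))
for k in range(100):
    TV = bellman(V)
    eta = 1.05 if k < 20 else 1.0       # acceleration stage, then standard VI
    V = V + eta * (TV - V)
print(V)
```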
Keyword:
fast convergence; value iteration; adaptive critic designs; adaptive dynamic programming; nonlinear tracking control
Cite:
GB/T 7714 | Wang, Ding, Wang, Yuan, Ha, Mingming, et al. Improved value iteration for nonlinear tracking control with accelerated learning [J]. INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34(6): 4112-4131.
MLA | Wang, Ding, et al. "Improved value iteration for nonlinear tracking control with accelerated learning." INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL 34.6 (2024): 4112-4131.
APA | Wang, Ding, Wang, Yuan, Ha, Mingming, Ren, Jin, & Qiao, Junfei. Improved value iteration for nonlinear tracking control with accelerated learning. INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34(6), 4112-4131.