IEEE/CAA Journal of Automatica Sinica, 2014, Vol. 1, Issue 3: 282-293
Event-Triggered Optimal Adaptive Control Algorithm for Continuous-Time Nonlinear Systems
Kyriakos G. Vamvoudakis     
Center for Control, Dynamical Systems and Computation (CCDC), University of California, Santa Barbara, CA 93106-9560, USA
Abstract: This paper proposes a novel optimal adaptive event-triggered control algorithm for nonlinear continuous-time systems. The goal is to reduce the controller updates by sampling the state only when an event is triggered, while maintaining stability and optimality. The online algorithm is implemented based on an actor/critic neural network structure. A critic neural network is used to approximate the cost and an actor neural network is used to approximate the optimal event-triggered controller. Since the proposed algorithm contains dynamics that exhibit continuous evolutions described by ordinary differential equations as well as instantaneous jumps or impulses, we will use an impulsive system approach. A Lyapunov stability proof ensures that the closed-loop system is asymptotically stable. Finally, we illustrate the effectiveness of the proposed solution compared to a time-triggered controller.
Key words: Event-triggered     optimal control     adaptive control     reinforcement learning    
Ⅰ. INTRODUCTION

OPTIMAL feedback control design has been responsible for much of the successful performance of engineered systems in aerospace, industrial processes, vehicles, ships, and robotics. But the vast majority of feedback controllers use digital computers, and rely on periodic sampling, computation, and actuation. Network congestion and energy-saving objectives demand that the transmission of every piece of information through a network be rigorously decided. For that reason one needs to design bandwidth-efficient controllers that can function in event-driven environments and update their values only when needed. Event-triggered control design is a newly developed framework with potential for the many applications that have limited resources and controller bandwidth, and it offers a new point of view, with respect to conventional time-driven strategies, on how information should be sampled for control purposes.

The event-triggered control algorithms (see [1, 2, 3]) are composed of a feedback controller updated based on the sampled state, and an event-triggering mechanism that determines when the control signal has to be transmitted from a stability and performance point of view. This can reduce the computation and communication resources significantly. For linear systems, sampled-data control theory provides powerful tools for direct digital design, while implementations of nonlinear control designs tend to rely on discretization combined with fast periodic sampling. The basic idea is to communicate, compute, or control only when something significant has occurred in the system. The motivation for abandoning the time-triggered paradigm is to better cope with various constraints or bottlenecks in the system, such as sensors with limited resolution, limited communication or computation bandwidth, energy constraints, or constraints on the number of actuations. In order for the controller to keep the previously used state sample, a sampled-data component and a zero-order-hold (ZOH) actuator are commonly used.

All the event-triggered control algorithms available in the literature rely on a combination of offline computations, in the sense of solving the Riccati or Hamilton-Jacobi-Bellman equations, and online computations in the sense of updating the controller. Computing and updating controller parameters using online solutions may allow for changing dynamics, e.g., to handle the reduced weight of an aircraft as the fuel burns. All this work has been mostly done for linear systems. For nonlinear systems things are more complicated because of the intractability of the Hamilton-Jacobi-Bellman equation. For that reason, one needs to combine event-triggered controllers with computational intelligence ideas to solve the complicated Hamilton-Jacobi-Bellman equation online by updating the controller only when it is needed, while still guaranteeing optimal performance of the original system and not a linearized version of it. To overcome all those limitations we will use a reinforcement learning technique, specifically an actor/critic neural network framework[4]. The actor neural network will eventually approximate the event-triggered optimal controller and the critic neural network will approximate the optimal cost. But since the dynamics evolve in both discrete time and continuous time we will model them as an impulsive system[5, 6, 7]. The discrete-time dynamics will take care of the ``jumps'' of the controller when an event is triggered and the continuous-time dynamics will take care of the ``inter-event'' evolution of the state and the actor/critic neural networks.

There are several triggering conditions proposed in the literature, mostly for static state-feedback controllers (e.g., [1, 2, 3], and the references therein) and output-based controllers as in [8]. In most of them, the event is triggered when the error between the last event occurrence and the current observation of the plant exceeds a bound. The authors in [9] proposed a networked event-based control scheme for linear and nonlinear systems under non-ideal communication conditions, namely with packet dropouts and transmission delays. A framework for event-triggered controller communications in a network of multi-agents is proposed in [10] to solve a synchronization problem of linear agents. A problem with network delays and packet losses is considered in [11], where the authors extended the regular event-triggered controller to cope with delays and losses in the feedback signal. In [12], the authors propose two types of optimal event-based control design under lossy communication. It turns out that both algorithms approach the optimal solution very closely at a reduced computational cost, but they rely heavily on offline optimization schemes. By following the work of [2, 3], [13] combined model-based network control and event-triggered control to design a framework for stabilization of uncertain dynamical systems subject to quantization and time-varying network delays. In [1], three different approaches are considered for periodic event-triggered control of linear systems: an impulsive system approach, a discrete-time piecewise linear system approach, and a perturbed linear system approach. But all this is done for linear systems by solving differential Riccati equations offline.

The contributions of the present paper are fourfold. First, it is the first time that an event-triggered controller for a nonlinear system is computed online with guaranteed performance and without any linearization. Second, an actor/critic algorithm is used to approximate the cost and the event-triggered controller by using neural networks with continuous and jump dynamics. Those dynamics are modeled as impulsive systems. Third, we avoid using a zero-order-hold component explicitly in the implementation of our algorithm, but rather implement it inside the actor neural network. Finally, the paper provides stability and optimal performance guarantees.

The remainder of the paper is structured as follows. Section II formulates the problem. Section III presents the event-triggered regulator and the existence of solution. In Section IV, we propose the online event-triggered optimal adaptive control algorithm by using an actor/critic reinforcement learning framework. Simulation results and comparisons to a time-triggered controller for a linear and a nonlinear system are presented in Section V. Finally, Section VI concludes and discusses future work.

Notation. ${\bf R}$ denotes the set of real numbers, $R^+$ denotes the set $\{x\in R:x>0\}$, $R^{n_1 \times n_2}$ is the set of $n_1 \times n_2$ real matrices, $N$ is the set of natural numbers excluding zero, $\left\|\cdot\right\|$ denotes the Euclidean norm, $\left\|\cdot\right\|_\mathcal{P}$ denotes the weighted Euclidean norm with $\mathcal{P}$ a matrix of appropriate dimensions, and $(\cdot)^{\rm T}$ denotes the transpose. Moreover, we write $\underline{\lambda}\big(M\big)$ for the minimum eigenvalue of matrix $M$ and $\bar{\lambda}\big(M\big)$ for the maximum. Finally, a continuous function $\alpha:[0,a)\rightarrow[0,\infty)$ is said to belong to class $\mathcal{K}$ if it is strictly increasing and $\alpha(0)=0$.

Ⅱ. PROBLEM FORMULATION

Consider the following nonlinear continuous-time system:

$ \frac{{\rm d}}{{\rm d}t}{x}=f(x(t))+g(x(t))u(t),\ x(0)=x_0,\ t\geq 0, $ (1)

where $x\in R^n$ is a measurable state vector, $f(x)\in{\bf R}^n$ is the drift dynamics, $g(x)\in R^{n\times m}$ is the input dynamics, and $u\in R^m$ is the control input. It is assumed that $f(0)=0$, that $f(x)+g(x)u$ is locally Lipschitz, and that the system is stabilizable.

In order to save resources the controller will work with a sampled version of the state. For that reason one needs to introduce a sampled-data component that is characterized by a monotone increasing sequence of sampling instants (broadcast release times) $\{r_j\}_{j=1}^\infty$, where $r_j$ is the $j$-th consecutive sampling instant. The output of the sampled-data component is a sequence of sampled states $\hat{x}_j$, where $\hat{x}_j=x(r_j)$ for all $t\in[r_j,r_{j+1})$ and $j\in N$. The controller maps the sampled state onto a control vector $\hat{u}_j$, which after using a ZOH becomes a continuous-time input signal. For simplicity, we will assume that the sampled-data system has zero task delays. An event-triggered control system architecture is shown in Fig. 1.

Fig. 1 Event-triggered control schematic.

In order to decide when an event is triggered we will define the gap or difference between the current state $x(t)$ and the sampled state $\hat{x}_j(t)$ as

$ e_j(t):=\hat{x}_j(t)-x(t),\ \forall t\in(r_{j-1},r_{j}], $ (2)

and the dynamics of the gap are evolving according to

$ \dot{e}_j(t)=-\dot{x}(t),\ \forall t\in(r_{j-1},r_{j}],\ e_j(0)=0. $ (3)

Remark 1. Note that when an event is triggered at $t=r_j$,a new state measurement is rendered that resets the gap $e_j$ to zero.
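To make the sampling mechanism concrete, the following minimal sketch simulates the loop of Fig. 1 with forward-Euler integration. It is our illustration, not part of the paper's algorithm: `f`, `g` and `k` are the plant and controller maps from (1) and the text above, and `trigger` stands in for a condition such as (14) derived later.

```python
import numpy as np

def simulate_event_loop(f, g, k, trigger, x0, dt=1e-3, T=5.0):
    """Sampled-data loop: the controller only sees x_hat, which is
    refreshed (resetting the gap e_j to zero, cf. Remark 1) whenever
    the trigger condition fires."""
    x = np.array(x0, dtype=float)
    x_hat = x.copy()
    events = []
    for step in range(int(T / dt)):
        e = x_hat - x                              # gap (2)
        u = np.atleast_1d(k(x_hat))                # held control value
        if trigger(e, x, u):                       # event: broadcast the state
            x_hat = x.copy()                       # gap resets to zero
            u = np.atleast_1d(k(x_hat))
            events.append(step * dt)
        x = x + dt * (f(x) + g(x) @ u)             # flow of (1)
    return x, events
```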

We want to find a controller $u$ of the form $u=k(\hat{x}_j(t))\equiv k(x(t)+e_j(t))$ that minimizes a cost functional similar to the one used with the time-triggered controller,

$ J(x(0);u)=\int_0^\infty \big (u^{\rm T} u+ Q(x) \big){\rm d}\tau, $

where $Q(x)\geq 0$ is a positive semidefinite function on $R^n$, while using limited controller updates.

By using the error defined in (2),the closed loop dynamics with $u(\hat{x}_j)=k(\hat{x}_j(t))$ during the interval $(r_{j-1},r_{j}]$ can be written as

$ \dot{x}=f(x)+g(x)k(x+e_j),\ t\geq0, $ (4)

The objective is for (4) to achieve input-to-state stability (ISS) with respect to the measurement error $e_j$ (see (2)). The following definition, adopted from [3], is needed.

Definition 1. A smooth positive definite function $V:{\bf R}^n\rightarrow R^+$ is said to be an ISS Lyapunov function for the closed-loop dynamics (4) if there exist class $\mathcal{K}_\infty$ functions $\underline{\alpha},\ \bar{\alpha},\ \alpha$ and $\gamma$ satisfying

$ \begin{align*} &\underline{\alpha}\big(\left|x\right|\big)\leq V(x)\leq \bar{\alpha}\big(\left|x\right|\big),\\ &\frac{\partial V}{\partial x}^{\rm T}\big(f(x)+g(x)k(x+e_j)\big)\leq-\alpha \big(\left|x\right|\big)+\gamma \big(\left|e_j\right|\big). \end{align*} $

From this point of view, the closed-loop dynamics (4) are ISS with respect to the measurement error $e_j$ if there exists an ISS Lyapunov function for (4).

We shall see that the restrictive assumption on ISS is not needed in our proposed algorithm.

The ultimate goal is to find the optimal cost function $V^*$ defined by

$ \begin{align}\label{eq:optimalcost} V^*(x(t)):=\min_{u} \int_t^\infty \big (u^{\rm T} u+Q(x) \big){\rm d}\tau,\forall t\geq0, \end{align} $ (5)

subject to the constraint (1) given an aperiodic event-triggered controller as will be defined in the subsequent analysis.

Ⅲ. EVENT-TRIGGERED REGULATOR AND EXISTENCE OF SOLUTION

One can define the Hamiltonian associated with (1) and (5) for the time-triggered case as

$ \begin{align}\label{eq:hamiltonian} &H(x,u(x),\frac{\partial V^*(x)}{\partial x})=\frac{\partial V^*(x)}{\partial x}^{\rm T} \big(f(x)+g(x)u(x)\big)+\\& \qquad u(x)^{\rm T} u(x)+Q(x),\forall x,u. \end{align} $ (6)

Now assume that the controller has unlimited bandwidth. Then one needs to find the control input $u(t)$ that minimizes the cost functional $J$. Hence, we will apply the stationarity condition[14] to the Hamiltonian (6) to obtain

$ \begin{align*} u^*(x)=\arg\min_{u} H\left(x,u,\frac{\partial V^*(x)}{\partial x}\right), \end{align*} $

or

$ \begin{align}\label{eq:optimalcontrol} \frac{\partial H(x,u,\frac{\partial V^*(x)}{\partial x})}{\partial u}=0\Rightarrow u^*(x)=-\frac{1}{2} g(x)^{\rm T} \frac{\partial V^*(x)}{\partial x}, \end{align} $ (7)

for the time-triggered case.

The optimal cost and the optimal control satisfy the following Hamilton-Jacobi-Bellman (HJB) equation,

$ \begin{align}\label{eq:hjb} &H(x,u^*(x),\frac{\partial V^*(x)}{\partial x})\equiv \\ &\qquad\frac{\partial V^*(x)}{\partial x}^{\rm T}\Big(f(x)-\frac{1}{2}g(x) g(x)^{\rm T}\frac{\partial V^*(x)}{\partial x}\Big)+\\& \qquad\frac{1}{4}\frac{\partial V^*(x)}{\partial x}^{\rm T} g (x) g(x)^{\rm T}\frac{\partial V^*(x)}{\partial x}+Q(x)=0,\forall x. \end{align} $ (8)

From now on, we will refer to (8) as the time-triggered HJB equation.

In order to reduce the communication between the controller and the plant, one needs an event-triggered version of the HJB equation (8). This is obtained by introducing a sampled-data component with aperiodic controller updates that enforce a certain condition on the state of the plant to guarantee stability and performance, as we will see in the subsequent analysis. For that reason the control input uses the sampled-state information instead of the true one, and hence (7) becomes

$ \begin{align}\label{eq:troptimalcontrol} &u^*(\hat{x}_j)=-\frac{1}{2} g(\hat{x}_j)^{\rm T}\frac{\partial V^*(\hat{x}_j)}{\partial x},\notag\\ &\qquad \text{ for } t\in(r_{j-1},r_{j}] \text{ and } j\in N \end{align} $ (9)

By using the event-triggered controller given by (9),the HJB equation (8) becomes $\forall x,\hat{x}_j\in R^n$,

$ \begin{align}\label{eq:eventhjb} &H(x,u^*(\hat{x}_j),\frac{\partial V^*(x)}{\partial x})=\\ &\quad \frac{\partial V^*(x)}{\partial x}^{\rm T}\Big(f(x)-\frac{1}{2}g(x) g(\hat{x}_j)^{\rm T}\frac{\partial V^*(\hat{x}_j)}{\partial x}\Big)+\\& \quad\frac{1}{4}\frac{\partial V^*(\hat{x}_j)}{\partial x}^{\rm T} g (\hat{x}_j) g(\hat{x}_j)^{\rm T}\frac{\partial V^*(\hat{x}_j)}{\partial x}+Q(x), \end{align} $ (10)

which is the equation we would like to quantify and compare to (8).

The following assumptions[2] are needed before we state the existence-of-solution and stability theorem for the event-triggered control system.

Assumption 1. Lipschitz continuity of the controller with respect to the gap $e_j$,

$ \begin{align*} \left\|u(x)-u(\hat{x}_j)\right\|\leq L \left\|e_j\right\|. \end{align*} $

Assumption 2. Lipschitz continuity of the closed-loop system with respect to the state and to the gap $e_j$,

$ \begin{align*} \left\|f(x)+g(x) k(x+e_j)\right\|\leq L_{fg} \left\|e_j\right\|+L_{fg} \left\|x\right\|. \end{align*} $

Remark 2. These assumptions are satisfied in many applications where the controller is affine with respect to $e_j$.

Note that the control signal $u:R^+\rightarrow R^m$ is a piecewise constant function (produced by a ZOH without any delay). The control signal can be written as

$ \begin{align*} u(t)=\sum_{j\in N} \bar{u}(t), \end{align*} $

with,

$ \begin{align*} \bar{u}(t)=\begin{cases}u(\hat{x}_j),\ t\in(r_{j-1},r_{j}]\\ 0,\text{ otherwise.} \end{cases} \end{align*} $
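As a small illustration (ours), a ZOH reconstruction of this piecewise-constant signal simply holds the most recent sampled control between release times:

```python
import numpy as np

def zoh(release_times, u_samples):
    """Hold u_samples[j] on [r_j, r_{j+1}); the last sample is held
    after the final release time."""
    r = np.asarray(release_times)
    def u(t):
        j = np.searchsorted(r, t, side='right') - 1
        return u_samples[max(j, 0)]
    return u

u = zoh([0.0, 0.4, 1.1], [1.0, -0.2, 0.05])
print(u(0.2), u(0.9), u(2.0))   # -> 1.0 -0.2 0.05
```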

Lemma 1. Suppose that Assumption 1 holds. Then the event-triggered HJB given by (10) can be related to the time-triggered HJB given by (8) as

$ \begin{align}\label{eq:newdif} &H(x,u^*(\hat{x}_j),\frac{\partial V^*(x)}{\partial x})=\\ &\quad \big(u^*(x)-u^*(\hat{x}_j)\big)^{\rm T}\big(u^*(x)-u^*(\hat{x}_j)\big). \end{align} $ (11)

Proof. In order to prove the equivalence,we will take the difference between (8) and (10) as follows:

$ \begin{array}{l} H(x,{u^*}(x),\frac{{\partial {V^*}(x)}}{{\partial x}}) - H(x,{u^*}({{\hat x}_j}),\frac{{\partial {V^*}(x)}}{{\partial x}}) \equiv \\ {\frac{{\partial {V^*}(x)}}{{\partial x}}^{\rm{T}}}(f(x) - \frac{1}{2}g(x)g{(x)^{\rm{T}}}\frac{{\partial {V^*}(x)}}{{\partial x}}) + \\ \frac{1}{4}{\frac{{\partial {V^*}(x)}}{{\partial x}}^{\rm{T}}}g(x)g{(x)^{\rm{T}}}\frac{{\partial {V^*}(x)}}{{\partial x}} + Q(x) - \\ {\frac{{\partial {V^*}(x)}}{{\partial x}}^{\rm{T}}}(f(x) - \frac{1}{2}g(x)g{({{\hat x}_j})^{\rm{T}}}\frac{{\partial {V^*}({{\hat x}_j})}}{{\partial x}}) - \\ \frac{1}{4}{\frac{{\partial {V^*}({{\hat x}_j})}}{{\partial x}}^{\rm{T}}}g({{\hat x}_j})g{({{\hat x}_j})^{\rm{T}}}\frac{{\partial {V^*}({{\hat x}_j})}}{{\partial x}} - Q(x) = \\ - \frac{1}{4}{\frac{{\partial {V^*}(x)}}{{\partial x}}^{\rm{T}}}g(x)g{(x)^{\rm{T}}}\frac{{\partial {V^*}(x)}}{{\partial x}} + \\ \frac{1}{2}{\frac{{\partial {V^*}(x)}}{{\partial x}}^{\rm{T}}}g(x)g{({{\hat x}_j})^{\rm{T}}}\frac{{\partial {V^*}({{\hat x}_j})}}{{\partial x}} - \\ \frac{1}{4}{\frac{{\partial {V^*}({{\hat x}_j})}}{{\partial x}}^{\rm{T}}}g({{\hat x}_j})g{({{\hat x}_j})^{\rm{T}}}\frac{{\partial {V^*}({{\hat x}_j})}}{{\partial x}}, \end{array} $ (12)

where from (8), $H(x,u^*(x),\frac{\partial V^*(x)}{\partial x})=0$. Now, using (7) and (9), we can write (12) as

$ \begin{align*} &H\left(x,u^*(x),\frac{\partial V^*(x)}{\partial x}\right)-H\left(x,u^*(\hat{x}_j),\frac{\partial V^*(x)}{\partial x}\right)=\\ &\quad-\Big(u^*(x)-u^*(\hat{x}_j)\Big)^{\rm T}\big(u^*(x)-u^*(\hat{x}_j)\big), \end{align*} $

from which the result follows.

Remark 3. Taking norms and using Assumption 1 yields

$ \begin{align*} &\left\|H(x,u^*(x),\frac{\partial V^*(x)}{\partial x})\!-\!H(x,u^*(\hat{x}_j),\frac{\partial V^*(x)}{\partial x})\right\|\!\leq L^2\left\|e_j\right\|^2. \end{align*} $

Now since the closed loop is ISS, the state-dependent threshold $\alpha \big(\left|x\right|\big)\geq\gamma \big(\left|e_j\right|\big)$ means that at the broadcast time $e_j$ is forced to be zero. As the state asymptotically approaches the origin, the threshold gets smaller. Furthermore, $V^*$ from Definition 1 is an ISS Lyapunov function for the continuously sampled (time-triggered) system. It is straightforward that $u^*(\hat{x}_j)$ is a discretized version of $u^*(x)$.

Remark 4. We shall see in Theorem 1 that we will pick a threshold on $e_j$ to guarantee optimality and stability.

The following lemma, adopted from [3], proves that the inter-execution times are bounded.

Lemma 2. For any compact set $\mathcal{S}\subseteq {\bf R}^n$ containing the origin, there exists a time $\tau\in{\bf R}^+$ such that for any initial condition originating from $\mathcal{S}$ the inter-execution release times $\{r_{j}-r_{j-1}\}_{j\in N}$ defined implicitly by (19) are lower bounded as $\tau\leq r_{j}-r_{j-1}$ for all $j\in N$. See [3] for the proof.

Theorem 1. Suppose there exists a positive definite function $V\in C^1$ that satisfies the time-triggered HJB equation (8) with $V(0)=0$. Then the closed-loop system for $t\in(r_{j-1},r_{j}]$, and all $j\in N$, with control policy given by

$ \begin{align}\label{eq:ncontrol} u(\hat{x}_j)=-\frac{1}{2} g(\hat{x}_j)^{\rm T}\frac{\partial V(\hat{x}_j)}{\partial x}, \end{align} $ (13)

and triggering condition

$ \begin{align}\label{eq:bound2} \left\|e_j\right\|^2\leq\frac{(1-\beta^2)}{L^2} \underline{\lambda}\big(Q\big)\left\|x\right\|^2+\frac{1}{L^2}\left\|u(\hat{x}_j)\right\|^2, \end{align} $ (14)

for some user-defined parameter $\beta\in(0,1)$, is asymptotically stable. Moreover, the control policy (13) is optimal and the optimal value is given by

$ \begin{align}\label{eq:opval} J^*(\cdot;u^*)&=\notag\\ &\int_{0}^{\infty} \big ((u^*({x})-u^*(\hat{x}_j))^{\rm T}(u^*({x})-u^*(\hat{x}_j))\big){\rm d}t+\\& V^*(x (0)). \end{align} $ (15)

Proof. The orbital derivative along the solutions of (1) with the event-triggered controller, $\forall t\in (r_{j-1},r_{j}]$, is

$ \begin{align}\label{eq:lyap11} \dot{V}&=\frac{\partial V}{\partial x}\dot{x}=\frac{\partial V(x)}{\partial x}^{\rm T} \big(f(x)+g(x)u(\hat{x}_j)\big). \end{align} $ (16)

Now, write the time-triggered HJB equation (8) as

$ \begin{align*} \frac{\partial V(x)}{\partial x}^{\rm T} f(x)=\frac{1}{4}\frac{\partial V(x)}{\partial x}^{\rm T} g (x) g(x)^{\rm T}\frac{\partial V(x)}{\partial x}-Q(x). \end{align*} $

Then we can rewrite the orbital derivative (16) with the event-triggered controller as

$ \begin{align} \dot{V}&=\frac{1}{4}\frac{\partial V(x)}{\partial x}^{\rm T} g (x) g({x})^{\rm T}\frac{\partial V(x)}{\partial x}-Q(x)+\\ & \frac{\partial V(x)}{\partial x}^{\rm T}g(x)u(\hat{x}_j)\equiv\\& u(x)^{\rm T} u(x)-Q(x)-2u(x)^{\rm T} u(\hat{x}_j), \end{align} $ (17)

since $\frac{\partial V}{\partial x}^{\rm T}g(x)=-2u^{\rm T}$.

By using the Lipschitz condition from Assumption 1,we can write

$ \begin{align}\label{eq:lipin} &-2 u(x)^{\rm T}u(\hat{x}_j)+u(x)^{\rm T} u(x)=\left\|u(x)-u(\hat{x}_j)\right\|^2-\\&\quad u(\hat{x}_j)^{\rm T} u(\hat{x}_j) \leq L^2 \left\|e_j\right\|^2-\left\|u(\hat{x}_j)\right\|^2. \end{align} $ (18)

Finally, by substituting (18) into (17), and assuming that $Q(x):=x^{\rm T} Q x$ with $Q\in R^{n\times n}\geq 0$, one has the following bound:

$ \begin{align*} \dot{V}&\leq-Q(x)+L^2 \left\|e_j\right\|^2-\left\|u(\hat{x}_j)\right\|^2\equiv\\ & -\beta^2 \underline{\lambda}\big(Q\big)\left\|x\right\|^2-(1-\beta^2) \underline{\lambda}\big(Q\big)\left\|x\right\|^2+\\ &L^2 \left\|e_j\right\|^2-\left\|u(\hat{x}_j)\right\|^2. \end{align*} $

Hence the closed-loop system is asymptotically stable given that the following inequality is satisfied for all $t\in(r_{j-1},r_{j}]$:

$ \begin{align}\label{eq:bound} \left\|e_j\right\|^2\leq\frac{(1-\beta^2)}{L^2} \underline{\lambda}\big(Q\big) \left\|x\right\|^2+\frac{1}{L^2}\left\|u(\hat{x}_j)\right\|^2. \end{align} $ (19)

Now we need to show that the inter-transmission time is nontrivial. For $t\in(r_{j-1},r_{j}]$, by using the Lipschitz continuity of the closed-loop dynamics according to Assumption 2, it has been shown in [2] that

$ \begin{align*} T_j\equiv r_{j}-r_{j-1}\geq \frac{1}{L_{fg}+L_{fg} W},\ j\in{\bf N}, \end{align*} $

where we used the fact that the ratio of the gap and the system state must be greater than a positive constant $\frac{1}{W}$.

Since the function $V$ is smooth, zero at zero, and converges to zero as $t\rightarrow \infty$, and by denoting as $V^*$ the optimal cost, we can write the cost functional $J$ as

$ \begin{align*} &J(\cdot;u)=\int_{0}^{\infty} \big (u^{\rm T} u+Q(x) \big){\rm d}t +V^*({x}(0))+\\&\quad \int_{0}^{\infty}\frac{\partial V^*(x)}{\partial x}^{\rm T}\bigg({f}(x)+{g}(x)u(\hat{x}_j)\bigg){\rm d}t. \end{align*} $

Now, adding the identically zero term from (11), $H(x,u^*(\hat{x}_j),\frac{\partial V^*(x)}{\partial x})-\big(u^*(x)-u^*(\hat{x}_j)\big)^{\rm T}\big(u^*(x)-u^*(\hat{x}_j)\big)=0$, and completing the squares yields

$ \begin{align*} &J(\cdot;u)=\int_{0}^{\infty} \big ((u(\hat{x}_j)-u^*(\hat{x}_j))^{\rm T}(u(\hat{x}_j)-\notag\\ &\quad u^*(\hat{x}_j))\big){\rm d}t+V^*({x} (0))+\int_{0}^{\infty} ((u^*({x})-\\ &\quad u^*(\hat{x}_j))^{\rm T}(u^*({x})-u^*(\hat{x}_j))){\rm d}t. \end{align*} $

Now by setting $u(\hat{x}_j)=u^*(\hat{x}_j)$,it is easy to prove that

$ \begin{align}\label{eq:newper} J^*(\cdot;{u}^*)&\equiv \notag\\ &\int_{0}^{\infty} \big ((u^*({x})-u^*(\hat{x}_j))^{\rm T}(u^*({x})-u^*(\hat{x}_j))\big){\rm d}t+\\&V^*({x} (0))\leq J(\cdot;u), \end{align} $ (20)

where for brevity we have omitted the dependence on the initial conditions.

Remark 5. It is worth noting that if one wants to approach the performance of the time-triggered controller, one needs to make the sampling error term $\int_{0}^{\infty} \big ((u^*({x})-u^*(\hat{x}_j))^{\rm T}(u^*({x})-u^*(\hat{x}_j))\big){\rm d}t$ as close to zero as possible by adjusting the parameter $\beta$ of the triggering condition given in (14). This means that when $\beta$ is close to $1$ one samples more frequently, whereas when $\beta$ is close to zero the intersampling periods become longer and the performance will be far from that of the time-triggered optimal controller.

Remark 6. The control law (13) is well-defined and satisfies the usual optimality condition $ J^*(x(0);u^*)\leq J(x(0);u)$ as discussed in [15].

Remark 7. The state sampler will continuously monitor the condition (14), and when a violation is about to occur, the sampler will trigger the sampling of the system state.
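For illustration, the monitoring described in Remark 7 can be written as the following check (our sketch; the Lipschitz constant $L$, $\underline{\lambda}(Q)$ and $\beta$ are problem data):

```python
import numpy as np

def should_sample(e, x, u_hat, L, lam_min_Q, beta):
    """True when the gap is about to violate the triggering condition
    (14), i.e. a new state sample must be broadcast."""
    threshold = ((1.0 - beta**2) / L**2) * lam_min_Q * (x @ x) \
                + (u_hat @ u_hat) / L**2
    return (e @ e) >= threshold
```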

In linear systems of the form $\dot{x}=Ax+Bu$ with $A,\ B$ matrices of appropriate dimensions the previous theorem takes the following form.

Corollary 1. Suppose there exists a positive definite function of the form $V=\frac{1}{2}x^{\rm T} P x$ where $P\in{\bf R}^{n\times n}$ is a symmetric positive definite matrix that satisfies the following Riccati equation:

$ \begin{align}\label{eq:riccati} \frac{1}{2} \big( A^{\rm T} P+P A\big)-\frac{1}{4} P B B^{\rm T} P+Q=0, \end{align} $ (21)

where $Q\in R^{n\times n}\geq 0$. Then the closed-loop system for $t\in(r_{j-1},r_{j}]$, and all $j\in N$, with control policy given by

$ \begin{align}\label{eq:lcontrol} u(\hat{x}_j)=-\frac{1}{2} B^{\rm T} P \hat{x}_j \end{align} $ (22)

and triggering condition given by (14) is asymptotically stable. Moreover the control policy (22) is optimal,and the value is

$ \begin{align*} &J^*(\cdot;u^*)=\frac{1}{2}x(0)^{\rm T} P x(0)+\\ &\qquad\int_{0}^{\infty} \big(u^*(x)-u^*(\hat{x}_j)\big)^{\rm T}\big(u^*(x)-u^*(\hat{x}_j)\big){\rm d}t. \end{align*} $

Proof. The proof follows from Theorem 1 by direct substitutions.
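Numerically, (21) reduces to a standard algebraic Riccati equation: substituting $P=2X$ into (21) and multiplying by 2 gives $A^{\rm T}X+XA-XBB^{\rm T}X+Q=0$, so off-the-shelf solvers apply. A minimal sketch (ours):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def solve_riccati_21(A, B, Q):
    """Solve 0.5*(A'P + P A) - 0.25*P B B' P + Q = 0 of (21) through the
    standard ARE A'X + X A - X B B' X + Q = 0 (with R = I) and P = 2X."""
    X = solve_continuous_are(A, B, Q, np.eye(B.shape[1]))
    return 2.0 * X

# The event-triggered gain of (22) is then u(x_hat) = -0.5 * B.T @ P @ x_hat.
```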

Remark 8. Note that according to [2] one can use a simpler condition than (14),namely $e_j^{\rm T}\bigg((1-\beta^2)Q+\frac{1}{4}PB B^{\rm T} P\bigg) e_j\leq \delta x_j(r_j)^{\rm T}\bigg(\frac{1}{2} (1-\beta^2)Q+\frac{1}{4} P B B^{\rm T} P\bigg) x_j(r_j)$ for some user defined parameters $\beta,\delta\in(0,1)$.

A visualization of how the value (15) changes with the available bandwidth is presented in Fig. 2, where one can see that as the bandwidth increases, by tweaking $\beta$ to be close to $1$, one approaches asymptotically the performance of the infinite-bandwidth, or time-triggered, controller.

Fig. 2 Visualization of the performance of the event-triggered control with respect to the time-triggered control and increased bandwidth.

Solving the event-triggered HJB equation (10) for the optimal cost of nonlinear systems, and the Riccati equation (21) for the matrix $P$ of linear systems, is in most cases infeasible and has to be done in an offline manner that does not allow the system to change its objective while operating. For that reason the following section will provide an actor/critic neural network framework to approximate the solution of the event-triggered HJB and Riccati equations online.

Ⅳ. ACTOR/CRITIC ALGORITHM

The first step to solve the event-triggered HJB equation (10) is to approximate the value function $V^*(x)$ from (5). The value function can be represented in a compact set $\Omega\subseteq{\bf R}^n$ by a critic neural network of the form

$ \begin{align}\label{eq:critic} V^*(x)={W^*}^{\rm T}\phi(x)+\epsilon_{c}(x),\quad\forall x\in R^{n}, \end{align} $ (23)

where $W^*\in R^h$ denotes the ideal weights, bounded as $\left\|W^*\right\|\leq W_{\rm max}$, $\phi := [\phi_1\ \phi_2\ \cdots\ \phi_h]^{\rm T}:R^n \to R^h$ is a vector of $h$ bounded, continuously differentiable activation functions ($\left\|\phi\right\|\leq \phi_{\rm max}$ and $\left\|\frac{\partial \phi}{\partial x}\right\|\leq\phi_{d\rm max}$), and $\epsilon_{c}(x)$ is the corresponding residual error such that $\sup_{x\in\Omega} \left\|\epsilon_c\right\|\leq \epsilon_{c\rm max}$ and $\sup_{x\in\Omega} \left\|\frac{\partial \epsilon_c}{\partial x}\right\|\leq \epsilon_{dc\rm max}$. The activation functions $\phi$ are selected such that, as $h\rightarrow \infty$, they provide a complete independent basis for $V^*$.
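As a concrete instance of such a basis (our sketch, matching the quadratic basis used in the simulation section), the monomials $x_i x_j$, $i\leq j$, and their Jacobian are:

```python
import numpy as np

def phi(x):
    """Quadratic monomials x_i x_j for i <= j (h = n(n+1)/2 neurons)."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def dphi_dx(x):
    """Jacobian of phi: row k holds the gradient of the k-th monomial."""
    n = len(x)
    rows = []
    for i in range(n):
        for j in range(i, n):
            grad = np.zeros(n)
            grad[i] += x[j]      # d(x_i x_j)/dx_i
            grad[j] += x[i]      # d(x_i x_j)/dx_j (sums to 2x_i when i == j)
            rows.append(grad)
    return np.array(rows)
```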

For causality reasons and due to the online nature of our algorithm, we will define the triggering inter-execution release time to be in $t\in(r_{j-1},r_j]$ with $j\in N$ (in the subsequent analysis, $j$ will be in this set).

Based on this, the optimal event-triggered controller in (13) can be rewritten as

$ \begin{align}\label{eq:starvalue1} u^*(\hat{x}_j)&=-\frac{1}{2} g(\hat{x}_j)^{\rm T}(\frac{\partial \phi(\hat{x}_j)}{\partial x}^{\rm T} W^* +\frac{\partial \epsilon_c(\hat{x}_j)}{\partial x}),\ t\in (r_{j-1},r_j]. \end{align} $ (24)

Remark 9. The control input jumps at the triggering instants and remains constant $\forall t\in (r_{j-1},r_j]$. This is generally achieved with a zero-order hold, but we shall see that in our algorithm it is not necessary.

A. Learning Algorithm

The optimal event-triggered controller (24) can be approximated by another neural network, which we call the actor. It has the following form for all $t\in(r_{j-1},r_j]$:

$ \begin{align}\label{eq:actor} u^*(\hat{x}_j)={W_u^*}^{\rm T} \phi_u(\hat{x}_j)+\epsilon_u(\hat{x}_j),\ \forall \hat{x}_j,\ j\in N, \end{align} $ (25)

where $W_u^*\in R^{h_2\times m}$ are the optimal weights, $\phi_u(\hat{x}_j)$ are the NN activation functions defined similarly to those of the critic NN, $h_2$ is the number of neurons in the hidden layer, and $\epsilon_u$ is the actor approximation error. Note that in order for $u^*$ to be uniformly approximated, the activation functions must define a complete independent basis set. The residual error $\epsilon_u$ and the activation functions are assumed to be upper bounded by positive constants as $\sup_{\hat{x}_j\in\Omega}\left\|\epsilon_u\right\|\leq \epsilon_{u\rm max}$ and $\left\|\phi_u\right\|\leq \phi_{u\rm max}$, respectively.

The value function (23) and the optimal policy (25), using current estimates $\hat{W}_c$ and $\hat{W}_u$ of the ideal weights $W^*$ and $W_u^*$ respectively, are approximated by the following critic and actor neural networks:

$ \begin{align}\label{eq:apcritic} \hat{V}(x(t))=\hat{W}_c^{\rm T}\phi(x(t)),\forall x, \end{align} $ (26)
$ \begin{align}\label{eq:apactor} \hat{u}(\hat{x}_j)=\hat{W}_u^{\rm T}\phi_u(\hat{x}_j),\forall \hat{x}_j. \end{align} $ (27)

Our goal now is to find the tuning laws for the weights $\hat{W}_c$ and $\hat{W}_u$. In order to do that we will use adaptive control techniques[16]. For that reason we will define the error $e_c\in R$ as

$ \begin{align*} e_c&:=H\left(x,\hat{u}(\hat{x}_j),\frac{\partial \hat{V}(x)}{\partial x}\right)-H\left(x,u^*(\hat{x}_j),\frac{\partial V^*(x)}{\partial x}\right)=\\& \hat{W}_c^{\rm T} \frac{\partial \phi}{\partial x}\big(f(x)+g(x)\hat{u}(\hat{x}_j)\big)+\hat{r} =\hat{W}_c^{\rm T} \omega+\hat{r}, \end{align*} $

with $\omega:=\frac{\partial \phi}{\partial x}\big(f(x)+g(x)\hat{u}(\hat{x}_j)\big)$, $\hat{r}:=\hat{u}(\hat{x}_j)^{\rm T}\hat{u}(\hat{x}_j)+Q(x)$, and $H(x,u^*(\hat{x}_j),\frac{\partial V^*(x)}{\partial x})=0$ from (10). In order to drive the error $e_c$ to zero, one has to pick the critic neural network weights appropriately.

Hence one picks the weights $\hat{W}_c$ to minimize a squared error of the form $K=\frac{1}{2} \left\|e_c\right\|^2$ as

$ \begin{align}\label{eq:rls} \dot{\hat{W}}_c=-\alpha \frac{\partial K}{\partial \hat{W}_c}= -\alpha\frac{\ \omega}{(\omega^{\rm T}\omega +1)^2}\big(\omega^{\rm T} \hat{W}_c+\hat{r}\big), \end{align} $ (28)

where $\alpha$ determines the speed of convergence.
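In code, one Euler step of the critic law (28) can be sketched as follows (our illustration; `f`, `g`, `dphi_dx` and the state penalty `Q_of_x` are as introduced earlier, and `u_hat` is a 1-D array):

```python
import numpy as np

def critic_step(W_c, x, u_hat, f, g, dphi_dx, Q_of_x, alpha, dt):
    """Normalized gradient descent (28) on the Bellman residual
    e_c = W_c' omega + r_hat, discretized with step dt."""
    xdot = f(x) + g(x) @ np.atleast_1d(u_hat)
    omega = dphi_dx(x) @ xdot                  # regressor omega
    r_hat = u_hat @ u_hat + Q_of_x(x)          # instantaneous cost
    e_c = omega @ W_c + r_hat                  # Bellman residual
    W_c_dot = -alpha * omega * e_c / (omega @ omega + 1.0) ** 2
    return W_c + dt * W_c_dot
```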

By defining the critic weight estimation error as $\tilde{W}_c:=W^*-\hat{W}_c$ and taking the time derivative, one has

$ \begin{align}\label{eq:criticerror} \dot{\tilde{W}}_c=-\alpha \frac{\omega\ \omega^{\rm T}}{(\omega^{\rm T}\omega +1)^2}\tilde{W}_c+\alpha\frac{\omega}{(\omega^{\rm T}\omega +1)^2}\epsilon_{\rm Hc}, \end{align} $ (29)

where $\epsilon_{\rm Hc}:=-\frac{\partial \epsilon_c}{\partial x}(f+g \hat{u})$, $\forall x,\hat{u}$, is upper bounded by $\epsilon_{\rm Hcmax}\in R^+$ as $\left\|\epsilon_{{\rm H}c}\right\|\leq\epsilon_{\rm Hcmax}$.

It is convenient to write the critic error dynamics as a sum of a nominal and a perturbed system $\dot{\tilde{W}}_c=S_{\rm nom}+S_{\rm pert}$ where $S_{\rm nom}:=-\alpha \frac{\omega\ \omega^{\rm T}}{(\omega^{\rm T}\omega +1)^2}\tilde{W}_c$ and $S_{\rm pert}:=\alpha\frac{\omega}{(\omega^{\rm T}\omega +1)^2}\epsilon_{\rm Hc}$.

Theorem 2. The nominal system $S_{\rm nom}$ is exponentially stable and its trajectories satisfy $\|\tilde{W}_c(t)\|\leq \|\tilde{W}_c(t_0)\|\kappa_1 {\rm e}^{-\kappa_2(t-t_0)}$, for some $\kappa_1,\ \kappa_2\in R^+$ and $t>t_0\ge0$, provided that the signal $M:=\frac{\omega}{(\omega^{\rm T}\omega +1)}$ is persistently exciting (PE) over the interval $[t,t+T]$, i.e., $\int_t^{t+T}M M^{\rm T} {\rm d}\tau\geq \gamma I$ with $\gamma\in R^+$ and $I$ an identity matrix of appropriate dimensions, and also that there exists $M_B\in R^+$ such that $\max\{\left|M\right|,\left|\dot{M}\right|\}\leq M_B$ for all $t\geq t_0$.

Proof. Consider the Lyapunov function $\mathcal{L}:{\bf R}^h\rightarrow R$,for all $t\geq0$,

$ \begin{align}\label{eq:lyap1} \mathcal{L}=\frac{1}{2\alpha}\tilde{W}_c^{\rm T} \tilde{W}_c. \end{align} $ (30)

By differentiating (30) along the trajectories of $S_{\rm nom}$ one has

$ \begin{align*} \dot{\mathcal{L}}&=-\tilde{W}_c^{\rm T}\frac{\omega\ \omega^{\rm T}}{(\omega^{\rm T}\omega +1)^2} \tilde{W}_c\leq0. \end{align*} $

Viewing the nominal system $S_{\rm nom}$ as a linear time-varying system,the solution $\tilde{W}_c$ is given as (the reader is directed to [8] for the details)

$ \begin{align}\label{eq:ww} \tilde{W}_c(t)=\Phi(t,t_0)\tilde{W}_c(t_0), \end{align} $ (31)

where the state transition matrix is defined as $\frac{\partial \Phi(t,t_0)}{\partial t}:=-\alpha M M^{\rm T}\Phi(t,t_0)$. Therefore we can prove that for the nominal system,the equilibrium point is exponentially stable provided that $M$ is PE and therefore for some $\kappa_{1},\ \kappa_{2}\in R^+$ we can write $\forall t\geq t_0$,

$ \begin{align}\label{eq:ww2} \left\|\Phi(t,t_0)\right\|\leq \kappa_{1} {\rm e}^{-\kappa_{2}(t-t_0)}. \end{align} $ (32)

Finally by combining (31) and (32),we have

$ \begin{align*} \left\|\tilde{W}_c(t)\right\|\leq\left\|\tilde{W}_c(t_0)\right\|\kappa_{1}{\rm e}^{-\kappa_{2}(t-t_0)}, \end{align*} $

from which the result follows.

Remark 10. The above theorem establishes exponential convergence of the critic weight estimation error. It is shown in [17, 18, 19] that $\kappa_{1},\ \kappa_{2}\in R^+$ can be expressed as functions of $T,\ \gamma,\ M_B$. Based on the aforementioned papers one can also employ the relaxed persistence of excitation condition (e.g., $u\bar{\delta}$-PE). The interested reader is directed there for more details.
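The PE condition of Theorem 2 can also be checked numerically along a recorded trajectory (a sketch of ours): integrate $MM^{\rm T}$ over a window and verify that the smallest eigenvalue of the resulting Gramian stays above some $\gamma>0$.

```python
import numpy as np

def pe_level(M_samples, dt):
    """Smallest eigenvalue of the excitation Gramian int M M' dtau over
    a window, given M_samples of shape (steps, h)."""
    G = dt * sum(np.outer(m, m) for m in M_samples)
    return np.linalg.eigvalsh(G)[0]   # should exceed gamma > 0
```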

Now, in order to find the tuning law for the actor neural network, we define the error $e_u\in R^m$ in the following form:

$ \begin{align*} e_u:=\hat{W}_u^{\rm T} \phi_u(\hat{x}_j)+\frac{1}{2} g(\hat{x}_j)^{\rm T}\frac{\partial \phi(\hat{x}_j)}{\partial x}^{\rm T} \hat{W}_c,\ \forall \hat{x}_j. \end{align*} $

The objective is to select $\hat{W}_u$ such that the error $e_u$ goes to zero. For that reason we will select $\hat{W}_u$ to minimize the following squared error performance:

$ \begin{align*} E_u=\frac{1}{2}{\rm tr}\big\{e_u^{\rm T} e_u\big\}. \end{align*} $

The update law for the actor is aperiodic in nature, and hence the actor weights are updated only at the trigger instants and held constant otherwise. This has the form of an impulsive system as described in [5, 7].

We can then define the following laws:

$ \begin{align}\label{eq:cactor} \dot{\hat{W}}_u(t)=0,\ \text{for } r_{j-1}< t\leq r_j, \end{align} $ (33)

and the jump equation to compute $\hat{W}_u(r_j^+)$ given by

$ \begin{align}\label{eq:dactor} &{\hat{W}}_u^+=\hat{W}_u(t)-\alpha_u \phi_u({x}(t))\bigg(\hat{W}_u^{\rm T} \phi_u(x(t))+\\ &\quad\frac{1}{2}g(x(t))^{\rm T}\frac{\partial \phi(x(t))}{\partial x}^{\rm T}\hat{W}_c\bigg)^{\rm T},\ \text{for } t=r_j. \end{align} $ (34)

By defining the actor weight estimation error as $\tilde{W}_u:=W_u^*-\hat{W}_u$, taking the time derivative using the continuous update (33), and using the jump system (34) updated at the trigger instants, one has

$ \begin{align}\label{eq:ecactor} \dot{\tilde{W}}_u(t)=0,\ \text{for } r_{j-1}< t\leq r_j, \end{align} $ (35)

and

$ \begin{align}\label{eq:edactor} &{\tilde{W}}_u^+=\tilde{W}_u(t)-\alpha_u \phi_u(x(t)) \phi_u(x(t))^{\rm T} \tilde{W}_u(t)-\\ & \quad\alpha_u \phi_u(x(t)) \phi_u(x(t))^{\rm T} \epsilon_u-\\ &\quad\frac{1}{2}\alpha_u \phi_u(x(t))\tilde{W}_c^{\rm T} \frac{\partial \phi(x(t))}{\partial x}g(x(t))-\\ &\quad\frac{1}{2}\alpha_u\phi_u(x(t)) \frac{\partial \epsilon_c}{\partial x}g(x(t)),\ \text{for } t=r_j, \end{align} $ (36)

respectively.
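A sketch of the actor jump (34) at a trigger instant follows (ours; between events $\hat{W}_u$ is simply left untouched, per (33)):

```python
import numpy as np

def actor_jump(W_u, W_c, x, g, phi_u, dphi_dx, alpha_u):
    """Jump update (34), applied only at t = r_j. W_u is h2 x m and
    W_c is an h-vector; phi_u/dphi_dx are the bases defined earlier."""
    p = phi_u(x)                                           # h2-vector
    e_u = W_u.T @ p + 0.5 * g(x).T @ (dphi_dx(x).T @ W_c)  # m-vector
    return W_u - alpha_u * np.outer(p, e_u)
```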

B. Impulsive System Approach

Before we proceed to the design of the impulsive system, the following assumption is needed.

Assumption 3. The function $g(\cdot)$ is uniformly bounded on $\Omega$,i.e.,$\sup_{x\in\Omega}\left\|g(x)\right\|\leq g_{\rm max}$.

Since the dynamics are continuous but the controller jumps to a new value when an event is triggered, we need to formulate the closed-loop system as an impulsive system[5]. But first, in order to deal with the presence of the neural network approximation errors with known bounds[20] and obtain an asymptotically stable closed-loop system, one needs to add a robustifying term to the closed-loop system of the form

$ \begin{align*} \eta(t)=-B\frac{\hat{x}_j^{\rm T} \hat{x}_j \mathbf{1}_m}{A+\hat{x}_j^{\rm T} \hat{x}_j},\ j\in N, \end{align*} $

where $A,B\in R^+$ satisfy

$ \begin{align}\label{eq:Bbound} &B>\frac{A+\left\|\hat{x}_j\right\|^2}{\big(W_{\rm max}\phi_{d\rm max}+\epsilon_{dc\rm max}\big)g_{\rm max}\left\|\hat{x}_j\right\|^2}\times\notag\\ &\quad \bigg(\frac{1}{4\alpha} \omega_{\rm max} \epsilon_{\rm Hcmax}+\mu\bigg), \end{align} $ (37)

where

$ \begin{align}\label{eq:mu} &\mu:=\frac{1}{2}\big(\phi_{u\rm max}^2 \epsilon_{u\rm max}\big)^2 +\frac{1}{2}\big(\left\|\tilde{W}_c\right\|\phi_{u\rm max} \phi_{d\rm max}g_{\rm max}\big)^2+\\& \quad \frac{1}{2}\big(\phi_{u\rm max}\epsilon_{dc\rm max}g_{\rm max}\big)^2 +\frac{\alpha_u }{2}\phi_{u\rm max}^4\epsilon_{u\rm max}^2+\\& \quad\frac{\alpha_u }{8}\phi_{u\rm max}^2\left\|\tilde{W}_c\right\|^2\phi_{d\rm max}^2 g_{\rm max}^2 +\frac{\alpha_u }{8}\phi_{u\rm max}^2\epsilon_{dc\rm max}^2 g_{\rm max}^2+\\& \quad\frac{1}{2}\big(\alpha_u\phi_{u\rm max}^3\epsilon_{u\rm max}^2\big)^2 +\frac{1}{2}\alpha_u\phi_{u\rm max}\phi_{d\rm max}g_{\rm max}\left\|\tilde{W}_c\right\|+\\& \quad\frac{1}{2}\alpha_u\phi_{u\rm max}\epsilon_{dc\rm max}g_{\rm max}+\\& \quad\frac{1}{2}\alpha_u\phi_{u\rm max}\epsilon_{u\rm max}\phi_{d\rm max}g_{\rm max}\left\|\tilde{W}_c\right\|+\\& \quad\frac{1}{4}\alpha_u\epsilon_{dc\rm max}\phi_{d\rm max}g_{\rm max}\left\|\tilde{W}_c\right\|. \end{align} $ (38)

The closed-loop system dynamics (4) can now be written as

$ \begin{align}\label{eq:newclosed} \dot{x}&=f(x)+g(x)\times\\&\qquad\big((W_u^*-\tilde{W}_u)^{\rm T} \phi_u(\hat{x}_j)+\eta\big),\ t\geq0,\ j\in N. \end{align} $ (39)

Finally, we can combine the continuous and discrete-time dynamics (29), (35), (36), (39) by defining the augmented state $\psi:=[x^{\rm T} ~~ \hat{x}_j^{\rm T} ~~ \tilde{W}_c^{\rm T} ~~ \tilde{W}_u^{\rm T}]^{\rm T}$; taking the time derivative for $t\in(r_{j-1},r_j],\ j\in N$, yields

$ \begin{align}\label{eq:cimpulsive} \dot{\psi}=\left[\begin{array}{*{20}c}f(x)+g(x)\big((W_u^*-\tilde{W}_u)^{\rm T} \phi_u(\hat{x}_j)+\eta\big)\\0\\-\alpha \frac{\omega\ \omega^{\rm T}}{(\omega^{\rm T}\omega +1)^2}\tilde{W}_c+\alpha\frac{\omega}{(\omega^{\rm T}\omega +1)^2}\epsilon_{\rm Hc}\\0\end{array}\right], \end{align} $ (40)

which are the dynamics while the controller is kept constant; the jump dynamics for $t=r_j$ are given by

$ \begin{align}\label{eq:dimpulsive} \psi^+&=\psi(t)+\left[\begin{array}{*{20}c}0\\e_j(t)\\0\\\Lambda_{t}\end{array}\right], \end{align} $ (41)

where $\Lambda_{t}:=-\alpha_u \phi_u(x(t))\bigg( \phi_u(x(t))^{\rm T} \tilde{W}_u(t) + \phi_u(x(t))^{\rm T} \epsilon_u+\frac{1}{2}\tilde{W}_c^{\rm T} \frac{\partial \phi(x(t))}{\partial x}g(x(t)) +\frac{1}{2}\frac{\partial \epsilon_c}{\partial x}g(x(t))\bigg)$ whenever an event is triggered to update the controller.
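Pulling the pieces together, the impulsive system (40)-(41) can be simulated by flowing between events and jumping at the trigger instants. The sketch below (ours) reuses `critic_step`, `actor_jump` and the basis functions from the earlier snippets, takes `trigger` as a callable on `(e, x, u_hat)` (e.g., a lambda wrapping `should_sample`), and omits the robustifying term for brevity:

```python
import numpy as np

def run_impulsive(f, g, phi_u, dphi_dx, Q_of_x, trigger, x0,
                  W_c, W_u, alpha, alpha_u, dt=1e-3, T=10.0):
    """Flow (40): state and critic evolve with the held control.
    Jump (41): resample x_hat and update the actor weights."""
    x = np.array(x0, dtype=float)
    x_hat = x.copy()
    for _ in range(int(T / dt)):
        u_hat = W_u.T @ phi_u(x_hat)            # held between events
        if trigger(x_hat - x, x, u_hat):        # triggering condition fires
            x_hat = x.copy()                    # state jump
            W_u = actor_jump(W_u, W_c, x, g, phi_u, dphi_dx, alpha_u)
            u_hat = W_u.T @ phi_u(x_hat)
        W_c = critic_step(W_c, x, u_hat, f, g, dphi_dx, Q_of_x, alpha, dt)
        x = x + dt * (f(x) + g(x) @ u_hat)
    return x, W_c, W_u
```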

C. Convergence and Stability Analysis

Fact 1. The following normalized signal satisfies:

$ \begin{align*} \left\|\frac{\omega}{(\omega^{\rm T}\omega +1)}\right\|\leq \omega_{\rm max}:=\frac{1}{2}. \end{align*} $

The following theorem proves asymptotic stability of the impulsive closed-loop system described by (40) and (41),and convergence to the optimal solution.

Theorem 3. Consider the nonlinear continuous-time system given by (4) with the event-triggered control input given by (27) and the critic neural network given by (26). Let the tuning laws for the continuous-time critic and the impulsive actor neural networks be given by (28), and (33) and (34), respectively. Then there exists a quadruple of sets $(\Omega_x \times \Omega_{\hat{x}_j}\times \Omega_{\tilde{W}_c}\times\Omega_{\tilde{W}_u})\subset \Omega$, with $\Omega$ compact, such that the solution of the impulsive system $\psi\in(\Omega_x \times \Omega_{\hat{x}_j}\times \Omega_{\tilde{W}_c}\times\Omega_{\tilde{W}_u})$ exists globally and converges asymptotically to zero for all $x(0)$ inside $\Omega_x$, $\hat{x}_j(0)$ inside $\Omega_{\hat{x}_j}$, $\tilde{W}_c(0)$ inside $\Omega_{\tilde{W}_c}$ and $\tilde{W}_u(0)$ inside $\Omega_{\tilde{W}_u}$, given the following triggering condition:

$ \begin{align}\label{eq:finaltrigger} \left\|e_j\right\|^2\leq\frac{(1-\beta^2)}{L^2} \underline{\lambda}\big(Q\big) \left\|x\right\|^2+\frac{1}{L^2}\left\|\hat{W}_u^{\rm T} \phi_u(\hat{x}_j)\right\|^2, \end{align} $ (42)

where $\beta\in(0,1)$ and the following inequalities are satisfied:

$ \begin{align}\label{eq:inequ1} \alpha> \sqrt{\frac{1}{8\underline{\lambda}\big(M\big)}}, \end{align} $ (43)

for the critic neural network and

$ \begin{align}\label{eq:inequ2} \bigg(\phi_{u\rm max}^2-\frac{3}{2}-\frac{\alpha_u }{2}\phi_{u\rm max}^4 \bigg)>0, \end{align} $ (44)

for the actor neural network.

Proof. In order to prove stability of the impulsive model, we have to consider the continuous and the jump dynamics separately. Initially we will consider the following Lyapunov function $\mathcal{V}:R^n\times R^n\times{\bf R}^{h}\times R^{h_2}\rightarrow R$ for the continuous part (40) of the impulsive model,

$ \begin{align}\label{eq:l1} \mathcal{V}(\psi)&=V^*(x)+V^*(\hat{x}_j)+V_c+\\&\qquad\frac{\alpha_u^{-1}}{2}{\rm tr}\{\tilde{W}_u^{\rm T}\tilde{W}_u\},\ t\geq0,\ j\in N, \end{align} $ (45)

where $V^*(x)$ and $V^*(\hat{x}_j)$ are the optimal value functions for the continuously sampled (time-triggered) and the event-triggered sampled system, and $V_c:=\|\tilde{W}_c\|^2$ is a Lyapunov function for the critic error dynamics given by (29). Note that $V_c$ satisfies $\frac{{\rm d}}{{\rm d}t}\|\tilde{W}_c\|^2\leq -2\alpha\underline{\lambda}\big(M\big)\|\tilde{W}_c\|^2$, which is a consequence of Theorem 2.

We take the time derivative of (45) as follows: the first term is differentiated along the closed-loop trajectories (39), the second term has a zero derivative, the third term is differentiated along the perturbed critic error estimation dynamics (29), and the last term uses the actor dynamics (33), which are zero between events (one can see the augmented continuous dynamics in (40)). For all $j\in N$ and $t\geq 0$,

$ \begin{align*} \dot{\mathcal{V}}=&\frac{\partial V^*}{\partial x}^{\rm T} (f(x)+g(x)\big((W_u^*-\tilde{W}_u)^{\rm T} \phi_u(\hat{x}_j)+\eta\big))-\\&\alpha\frac{\partial V_c}{\partial \tilde{W}_c}^{\rm T}\frac{\omega\ \omega^{\rm T}}{(\omega^{\rm T}\omega +1)^2}\tilde{W}_c +\\&\alpha\frac{\partial V_c}{\partial \tilde{W}_c}^{\rm T}\frac{\omega}{(\omega^{\rm T}\omega +1)^2}\epsilon_{\rm Hc}:=T_1+T_2, \end{align*} $

where for simplicity in the subsequent analysis we will consider the following two terms and we will neglect the robustifying term for now:

$ \begin{align}\label{eq:t1} T_1&:=\frac{\partial V^*}{\partial x}^{\rm T} \big(f(x)+g(x)(W_u^*-\tilde{W}_u)^{\rm T} \phi_u(\hat{x}_j)\big), \end{align} $ (46)

and

$ \begin{align}\label{eq:t2} T_2&:=-\alpha\frac{\partial V_c}{\partial \tilde{W}_c}^{\rm T}\frac{\omega\ \omega^{\rm T}} {(\omega^{\rm T}\omega +1)^2}\tilde{W}_c+ \\&\alpha\frac{\partial V_c}{\partial \tilde{W}_c}^{\rm T}\frac{\omega}{(\omega^{\rm T}\omega +1)^2}\epsilon_{\rm Hc}. \end{align} $ (47)

First we will simplify and upper bound the term $T_1$ from (46). In order to do that we will rewrite the time-triggered HJB equation (8) as

$ \begin{align}\label{eq:hjb22} &\frac{\partial V^*}{\partial x}^{\rm T} f(x)=\notag\\ &\quad-\frac{\partial V^*}{\partial x}^{\rm T} g(x) u^*(x)-u^*(x)^{\rm T} u^*(x)-Q(x), \end{align} $ (48)

and after substituting (48) in (46),and using $\tilde{W}_u=W_u^*-\hat{W}_u$ and $\frac{\partial V^*}{\partial x}^{\rm T} g(x)=-2 u^*(x)^{\rm T}$,we have

$ \begin{align}\label{eq:t11} T_1&=-Q(x) +u^*(x)^{\rm T} u^*(x)-2 u^*(x)^{\rm T} \hat{W}_u^{\rm T} \phi_u(\hat{x}_j). \end{align} $ (49)

By using the Lipschitz condition from Assumption 1,we have

$ \begin{align}\label{eq:lipin22} &-2 u^*(x)^{\rm T} \hat{W}_u^{\rm T} \phi_u(\hat{x}_j)+u^*(x)^{\rm T} u^*(x)=\\&\quad \left\|u^*(x)-\hat{W}_u^{\rm T} \phi_u(\hat{x}_j)\right\|^2-\\ &\quad(\hat{W}_u^{\rm T} \phi_u(\hat{x}_j))^{\rm T} (\hat{W}_u^{\rm T} \phi_u(\hat{x}_j))\leq\\ &\quad L^2 \left\|e_j\right\|^2-\left\|\hat{W}_u^{\rm T} \phi_u(\hat{x}_j)\right\|^2. \end{align} $ (50)

By substituting (50) in (49) and setting $Q(x):=x^{\rm T} Q x$ with $Q\in R^{n\times n}$, $Q\geq 0$, we have

$ \begin{align*} T_1&\leq-\underline{\lambda}\big(Q\big)\left\|x\right\|^2+ L^2 \left\|e_j\right\|^2-\left\|\hat{W}_u^{\rm T} \phi_u(\hat{x}_j)\right\|^2 \equiv\\& -\beta^2\underline{\lambda}\big(Q\big) \left\|x\right\|^2-(1-\beta^2)\underline{\lambda}\big(Q\big) \left\|x\right\|^2+ L^2 \left\|e_j\right\|^2-\\&\left\|\hat{W}_u^{\rm T} \phi_u(\hat{x}_j)\right\|^2\leq -\beta^2\underline{\lambda}\big(Q\big) \left\|x\right\|^2, \end{align*} $

after using (42).

Now for the term $T_2$ from (47) we have

$ \begin{align}\label{eq:t22} T_2&\leq-2 \alpha\underline{\lambda}\big(M\big)\left\|\tilde{W}_c\right\|^2+ \\&\frac{1}{2\alpha}\left\|\tilde{W}_c \right\|\omega_{\rm max} \epsilon_{\rm Hcmax}, \end{align} $ (51)

and after applying Young's inequality to the second term of (51), one has

$ \begin{align*} T_2&\leq-\bigg(2 \alpha\underline{\lambda}\big(M\big)-\frac{1}{4\alpha}\bigg)\left\|\tilde{W}_c\right\|^2+\frac{1}{4\alpha} \omega_{\rm max} \epsilon_{\rm Hcmax}. \end{align*} $

Note that the second term, which has known bounds, will be handled by the robustifying term $\eta$ as we shall see later (see (37)); by using the inequality (43) we can guarantee asymptotic stability of the critic neural network estimation error.

Now we have to consider the jump dynamics given by (41), which involve the sampled states and the actor neural network updates. For that reason we will consider the following Lyapunov difference:

$ \begin{align*} \Delta\mathcal{V}(\psi)&=V^*(x^+)-V^*(x(r_j))+V^*(\hat{x}_j^+)-V^*(\hat{x}_j(r_j))+\\& V_c(\tilde{W}_c^+)-V_c(\tilde{W}_c(r_j))+\frac{\alpha_u^{-1}}{2} {\rm tr}\{{\tilde{W}_u^{+T}}{\tilde{W}_u^+}\}-\\& \frac{\alpha_u^{-1}}{2}{\rm tr}\{{\tilde{W}_u(r_j)}^{\rm T}{\tilde{W}_u(r_j)}\}, \end{align*} $

and since the states and the critic neural network estimation error are asymptotically stable, we have $V^*(x^+)\leq V^*(x(r_j))$ and $V_c(\tilde{W}_c^+)\leq V_c(\tilde{W}_c(r_j))$. Also, since we have proved that the states are asymptotically stable and since during the jump one has $x^+=\hat{x}_j^+$, we have $V^*(\hat{x}_j^+)\leq V^*(\hat{x}_j(r_j))$. Then one can write $ \Delta {\mathcal{V}}(\hat{x}_j):=V^*(\hat{x}_j^+)- V^*(\hat{x}_j(r_j))\leq-k\big(\left\|\hat{x}_j\right\|\big)$, where $k$ is a class-$\mathcal{K}$ function[21].

Now we have to find a bound for the following term:

$ \begin{align*} \Delta\mathcal{V}&(\tilde{W}_u):= \frac{\alpha_u^{-1}}{2}{\tilde{W}_u^{+\rm T}} {\tilde{W}_u^+}-\frac{\alpha_u^{-1}}{2}{\tilde{W}_u(r_j)}^{\rm T}{\tilde{W}_u(r_j)}=\\ &\frac{\alpha_u^{-1}}{2}\bigg(\tilde{W}_u(r_j)-\alpha_u \phi_u(x(r_j))\bigg( \phi_u(x(r_j))^{\rm T} \tilde{W}_u(r_j)+\\ & \phi_u(x(r_j))^{\rm T} \epsilon_u+\frac{1}{2}\tilde{W}_c^{\rm T} \frac{\partial \phi(x(r_j))}{\partial x}g(x(r_j))+\\ &\frac{1}{2}\frac{\partial \epsilon_c}{\partial x}g(x(r_j))\bigg)\bigg)^{\rm T} \bigg(\tilde{W}_u(r_j)-\\ &\alpha_u \phi_u(x(r_j))\bigg( \phi_u(x(r_j))^{\rm T} \tilde{W}_u(r_j)+\\ & \phi_u(x(r_j))^{\rm T} \epsilon_u+\frac{1}{2}\tilde{W}_c^{\rm T} \frac{\partial \phi(x(r_j))}{\partial x}g(x(r_j))+\\ &\frac{1}{2}\frac{\partial \epsilon_c}{\partial x}g(x(r_j))\bigg)\bigg)-\frac{\alpha_u^{-1}}{2}\left\|\tilde{W}_u(r_j)\right\|^2=\\ &-\tilde{W}_u(r_j)^{\rm T}\phi_u(x(r_j))\bigg( \phi_u(x(r_j))^{\rm T} \tilde{W}_u(r_j)+\\ & \phi_u(x(r_j))^{\rm T} \epsilon_u+\frac{1}{2}\tilde{W}_c^{\rm T} \frac{\partial \phi(x(r_j))}{\partial x}g(x(r_j))+\\ &\frac{1}{2}\frac{\partial \epsilon_c}{\partial x}g(x(r_j))\bigg)+\\ &\frac{\alpha_u }{2}\bigg\|\phi_u(x(r_j))\bigg( \phi_u(x(r_j))^{\rm T} \tilde{W}_u(r_j)+\\ & \phi_u(x(r_j))^{\rm T} \epsilon_u+\frac{1}{2}\tilde{W}_c^{\rm T} \frac{\partial \phi(x(r_j))}{\partial x}g(x(r_j))+\\ &\frac{1}{2}\frac{\partial \epsilon_c}{\partial x}g(x(r_j))\bigg)\bigg\|^{2}=\\ &-\tilde{W}_u(r_j)^{\rm T}\phi_u(x(r_j)) \phi_u(x(r_j))^{\rm T} \tilde{W}_u(r_j)-\\ &\tilde{W}_u(r_j)^{\rm T}\phi_u(x(r_j))\phi_u(x(r_j))^{\rm T} \epsilon_u-\\ &\frac{1}{2}\tilde{W}_u(r_j)^{\rm T}\phi_u(x(r_j))\tilde{W}_c^{\rm T} \frac{\partial \phi(x(r_j))}{\partial x}g(x(r_j))-\\ &\tilde{W}_u(r_j)^{\rm T}\phi_u(x(r_j))\frac{1}{2}\frac{\partial \epsilon_c}{\partial x}g(x(r_j))+\\ &\frac{\alpha_u }{2}\bigg\|\phi_u(x(r_j))\bigg( \phi_u(x(r_j))^{\rm T} \tilde{W}_u(r_j)+\\ & \phi_u(x(r_j))^{\rm T} \epsilon_u+\frac{1}{2}\tilde{W}_c^{\rm T} \frac{\partial \phi(x(r_j))}{\partial x}g(x(r_j))+\\ &\frac{1}{2}\frac{\partial \epsilon_c}{\partial x}g(x(r_j))\bigg)\bigg\|^{2}\leq\\ &-\left\|\tilde{W}_u(r_j)\right\|^2\phi_{u{\rm max}}^2 +\left\|\tilde{W}_u(r_j)\right\|^2+\frac{1}{2}\big(\phi_{u{\rm max}}^2 \epsilon_{u{\rm max}}\big)^2+\\ &\frac{1}{2}\big(\left\|\tilde{W}_c\right\|\phi_{u{\rm max}} \phi_{d{\rm max}}g_{\rm max}\big)^2+\\ &\frac{1}{2}\big(\phi_{u{\rm max}}\epsilon_{dc{\rm max}}g_{\rm max}\big)^2 +\frac{\alpha_u }{2}\phi_{u{\rm max}}^4 \left\|\tilde{W}_u(r_j)\right\|^2+\\ &\frac{\alpha_u }{2}\phi_{u{\rm max}}^4\epsilon_{u{\rm max}}^2+\\ &\frac{\alpha_u }{8}\phi_{u{\rm max}}^2\left\|\tilde{W}_c\right\|^2\phi_{d{\rm max}}^2 g_{\rm max}^2+\\ &\frac{\alpha_u }{8}\phi_{u{\rm max}}^2\epsilon_{dc{\rm max}}^2 g_{\rm max}^2+\\ &\frac{1}{2}\big(\alpha_u\phi_{u{\rm max}}^3\epsilon_{u{\rm max}}^2\big)^2+\frac{1}{2} \left\|\tilde{W}_u(r_j)\right\|^2+\\ &\frac{1}{2}\alpha_u\phi_{u{\rm max}}\phi_{d{\rm max}}g_{\rm max}\left\|\tilde{W}_c\right\|+\\ &\frac{1}{2}\alpha_u\phi_{u{\rm max}}\epsilon_{dc{\rm max}}g_{\rm max}+\\ &\frac{1}{2}\alpha_u\phi_{u{\rm max}}\epsilon_{u{\rm max}}\phi_{d{\rm max}}g_{\rm max}\left\|\tilde{W}_c\right\|+\\ &\frac{1}{4}\alpha_u\epsilon_{dc{\rm max}}\phi_{d{\rm max}}g_{\rm max}\left\|\tilde{W}_c\right\|, \end{align*} $

where we have used the Frobenius norm, Young's inequality, and (34). By grouping the terms with $\left\|\tilde{W}_u(r_j)\right\|^2$ together we have

$ \begin{align*} &\Delta\mathcal{V}(\tilde{W}_u)\leq-\bigg(\phi_{u{\rm max}}^2-\frac{3}{2}-\frac{\alpha_u }{2}\phi_{u{\rm max}}^4 \bigg)\left\|\tilde{W}_u(r_j)\right\|^2+\mu, \end{align*} $

where $\mu$ is the known bound defined in (38), since we have proved that the critic neural network estimation error is asymptotically stable. Now we can either prove uniform ultimate boundedness (UUB)[21] as long as the sampled state satisfies $k\big(\left\|\hat{x}_j\right\|\big)>\mu$ or the actor estimation error satisfies $\left\|\tilde{W}_u\right\|>\sqrt{\frac{\mu}{\phi_{u{\rm max}}^2-\frac{3}{2}-\frac{\alpha_u }{2}\phi_{u{\rm max}}^4}}$, or we can put the known bounds into the robustifying term[20] and prove asymptotic stability as long as (44) is true. The result holds as long as we can show that the state $x(t)$ remains in the set $\Omega\subseteq R^n$ for all times. To this effect, define the following compact set:

$ \begin{align*} M:= \big\{x\in R^n\,|\,\mathcal{V}(t)\leq m\big\}\subset{\bf R}^n, \end{align*} $

where $m$ is chosen as the largest constant so that $M\subseteq \Omega$. Since by assumption $x_0\in\Omega_x$ and $\Omega_x\subset\Omega$, we can conclude that $x_0\in\Omega$. While $x(t)$ remains inside $\Omega$, we have seen that $\dot{\mathcal{V}}\le 0$, and therefore $x(t)$ must remain inside $M\subset\Omega$. The fact that $x(t)$ remains inside a compact set also excludes the possibility of finite escape time, and therefore one has global existence of the solution.

Ⅴ. SIMULATIONS

To support our theoretical developments and to show the advantages of an event-triggered optimal adaptive controller with respect to a time-triggered optimal adaptive controller as proposed in [22, 23], two simulation examples are presented: one for an F16 aircraft plant and one for a nonlinear system, namely a Van der Pol oscillator. In both examples, the tuning gains are picked as $\alpha=20$, $\alpha_u=0.2$, and $\beta=0.7$.

A. Aircraft Plant

Consider the F16 aircraft plant used in [24],

$ \begin{align*} \dot{x}=\left[\begin{array}{*{20}c}-1.01887& 0.90506 & -0.00215\\0.8225&-1.07741 &-0.17555\\0 & 0& -1\end{array}\right]x+\left[\begin{array}{*{20}c}0\\0\\1\end{array}\right] u, \end{align*} $

where $Q,\ R$ are identity matrices of appropriate dimensions. Solving the Riccati equation offline yields $P=\left[\begin{array}{*{20}c}1.4245 & 1.1682&-0.1352\\ 1.1682& 1.4349&-0.1501\\-0.1352&-0.1501& 0.4329\end{array}\right]$. The critic NN activation basis functions $\phi(x)$ are picked as quadratic in the state and the actor NN activation functions are picked as $\phi_u(x)\equiv\frac{\partial \phi(x)}{\partial x}$. The critic neural network of the optimal adaptive event-triggered algorithm presented in Theorem 3 converges to $\hat{W}=[1.4259 ~~ 1.1713 ~~-0.1391 ~~ 1.4412 ~~-0.1498 ~~0.4346]^{\rm T}$. The evolution of the time-triggered system state versus the event-sampled state is presented in Fig. 3. The event-triggered control input versus the time-triggered control input is presented in Fig. 4, and the evolution of the sampled states that are used by the event-triggered controller is shown in Fig. 5. In Fig. 6, one can see that the event-trigger threshold converges to zero as the states converge to zero. A comparison between the event-triggered controller state measurements and the time-triggered controller is shown in Fig. 7, where the event-triggered controller uses fewer than 80 samples of the state as opposed to the time-triggered controller that uses more than 700 samples.
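As a quick sanity check (ours, not from the paper), the converged critic weights can be compared entry-by-entry with the upper-triangular part of the reported Riccati solution, since the quadratic monomial ordering $x_1^2, x_1x_2, x_1x_3, x_2^2, x_2x_3, x_3^2$ lists the weights the same way:

```python
import numpy as np

P = np.array([[ 1.4245,  1.1682, -0.1352],
              [ 1.1682,  1.4349, -0.1501],
              [-0.1352, -0.1501,  0.4329]])
W_hat = np.array([1.4259, 1.1713, -0.1391, 1.4412, -0.1498, 0.4346])

iu = np.triu_indices(3)                 # same ordering as the monomials
print(np.max(np.abs(P[iu] - W_hat)))    # about 6e-3 here
```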

Fig. 3 Evolution of the system states.

Fig. 4 Event-triggered control input vs. time-triggered control input.

Fig. 5 Sampled states used by the event-triggered controller.

Fig. 6 Event-triggered threshold and triggering instants.

Fig. 7 Number of state samples used.
B. Van der Pol Oscillator

Consider the following Van der Pol oscillator, affine in the control input, with a quadratic cost where $Q$ is the identity matrix of appropriate dimensions and $R=0.1$:

$ \begin{align*} \dot{x}=\left[\begin{array}{*{20}c}x_2\\-x_1+\frac{1}{2} x_2(1-x_1^2)-x_1^2 x_2\end{array}\right]+\left[\begin{array}{*{20}c}0\\x_1\end{array}\right]u, \end{align*} $

where $x:=\left[\begin{array}{*{20}c}x_1\\x_2\end{array}\right]$. The optimal cost for that system is given in [25] as $V^*=x_1^2+x_2^2$ and the optimal controller as $u^*=-0.5 x_1x_2$. By employing the algorithm presented in Theorem 3 with quadratic basis functions for both the critic and the actor, the critic neural network converged to $\hat{W}=\left[\begin{array}{*{20}c}1.0001~~ 0.0002~~ 1.0000\end{array}\right]$ and the actor neural network to $\hat{W}_u=\left[\begin{array}{*{20}c}0.0001& 0.9998& 0.0000\end{array}\right]$, which are the optimal values given above. The evolution of the system states is presented in Fig. 8. The event-triggered control input versus the time-triggered control input is presented in Fig. 9, and the evolution of the sampled states that the controller uses is shown in Fig. 10.
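A quick numerical check (ours) that the reported critic weights reproduce the known optimal cost over the quadratic basis $x_1^2,\ x_1x_2,\ x_2^2$:

```python
import numpy as np

W_hat = np.array([1.0001, 0.0002, 1.0000])        # converged critic weights
phi   = lambda x: np.array([x[0]**2, x[0]*x[1], x[1]**2])
V_hat = lambda x: W_hat @ phi(x)
V_opt = lambda x: x[0]**2 + x[1]**2               # optimal cost from [25]

rng = np.random.default_rng(0)
pts = rng.uniform(-2.0, 2.0, size=(100, 2))
print(max(abs(V_hat(p) - V_opt(p)) for p in pts))  # on the order of 1e-3
```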

Fig. 8 Evolution of the system states.

Fig. 9 Event-triggered control input vs. time-triggered control input.

Fig. 10 Sampled states used by the event-triggered controller.

Fig. 11 Event-triggered threshold and triggering instants.

In Fig. 11, one can see that the event-trigger threshold converges to zero as the states converge to zero. A comparison between the event-triggered controller state measurements and the time-triggered controller is shown in Fig. 12, where the event-triggered controller uses 83 samples of the state as opposed to the time-triggered controller that uses almost 8 000 samples, which is a great improvement in applications with bandwidth limitations and shared resources. Finally, the inter-event times as a function of time are shown in Fig. 13, where one can see the significant improvement with respect to state transmissions.

Fig. 12 Number of state samples used.

Fig. 13 Inter-event times as a function of time.
Ⅵ. CONCLUSION AND FUTURE WORK

This paper has proposed an optimal adaptive event-triggered control algorithm for nonlinear systems. The online algorithm is implemented based on an actor/critic neural network structure. A critic neural network is used to approximate the cost and an actor neural network is used to approximate the optimal event-triggered controller. Since the proposed algorithm contains dynamics that exhibit continuous evolutions described by ordinary differential equations and instantaneous jumps or impulses, we used an impulsive system approach. Simulation results for an F16 aircraft plant and a controlled Van der Pol oscillator show the effectiveness and efficiency of the proposed approach in terms of control bandwidth and performance. Future work will concentrate on extending the results to completely unknown systems and multiple decision makers.

References
[1] Heemels W P M H, Donkers M C F, Teel A R. Periodic event-triggered control for linear systems. IEEE Transactions on Automatic Control, 2013, 58(4): 847-861
[2] Lemmon M D. Event-triggered feedback in control, estimation, and optimization. Networked Control Systems, vol. 405, Lecture Notes in Control and Information Sciences. Heidelberg: Springer-Verlag, 2010. 293-358
[3] Tabuada P. Event-triggered real-time scheduling of stabilizing control tasks. IEEE Transactions on Automatic Control, 2007, 52(9): 1680-1685
[4] Lewis F L, Vrabie D, Vamvoudakis K G. Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers. IEEE Control Systems, 2012, 32(6): 76-105
[5] Haddad W M, Chellaboina V, Nersesov S G. Impulsive and Hybrid Dynamical Systems: Stability, Dissipativity, and Control. Princeton, NJ: Princeton University Press, 2006.
[6] Hespanha J P, Liberzon D, Teel A R. Lyapunov conditions for input-to-state stability of impulsive systems. Automatica, 2008, 44(11): 2735-2744
[7] Naghshtabrizi P, Hespanha J P, Teel A R. Exponential stability of impulsive systems with application to uncertain sampled-data systems. Systems and Control Letters, 2008, 57(5): 378-385
[8] Donkers M C F, Heemels W P M H. Output-based event-triggered control with guaranteed L∞-gain and improved and decentralised event-triggering. IEEE Transactions on Automatic Control, 2012, 57(6): 1362-1376
[9] Wang X, Lemmon M D. On event design in event-triggered feedback systems. Automatica, 2011, 47(10): 2319-2322
[10] Demir O, Lunze J. Cooperative control of multi-agent systems with event-based communication. In: Proceedings of the 2012 American Control Conference. Montreal, QC: IEEE, 2012. 4504-4509
[11] Lehmann D, Lunze J. Event-based control with communication delays. In: Proceedings of the 2011 IFAC World Congress. Milano, Italy: IFAC, 2011. 3262-3267
[12] Molin A, Hirche S. Suboptimal event-based control of linear systems over lossy channels. In: Proceedings of the 2nd IFAC Workshop on Distributed Estimation and Control in Networked Systems. Annecy, France: IFAC, 2010. 55-60
[13] Garcia E, Antsaklis P J. Model-based event-triggered control with timevarying network delays. In: Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference. Orlando, FL: IEEE, 2011. 1650-1655
[14] Lewis F L, Syrmos V L. Optimal Control. New York: John Wiley, 1995.
[15] Barradas Berglind J D J, Gommans T M P, Heemels W P M H. Self-triggered MPC for constrained linear systems and quadratic costs. In: Proceedings of the 2012 IFAC Conference on Nonlinear Model Predictive Control. Leeuwenhorst, Netherlands: IFAC, 2012. 342-348
[16] Ioannou P A, Fidan B. Adaptive Control Tutorial. Advances in Design and Control. Philadelphia, PA: SIAM, 2006.
[17] Loría A. Explicit convergence rates for MRAC-type systems. Automatica, 2004, 40(8): 1465-1468
[18] Loría A, Panteley E. Uniform exponential stability of linear time-varying systems: revisited. Systems and Control Letters, 2003, 47(1): 13-24
[19] Panteley E, Loria A, Teel A. Relaxed persistency of excitation for uniform asymptotic stability. IEEE Transactions on Automatic Control, 2001, 46(12): 1874-1886
[20] Polycarpou M, Farrell J, Sharma M. On-line approximation control of uncertain nonlinear systems: issues with control input saturation. In: Proceedings of the 2003 American Control Conference. Denver, CO: IEEE, 2003. 543-548
[21] Khalil H K. Nonlinear Systems. New Jersey: Prentice Hall, 2002.
[22] Vamvoudakis K G, Lewis F L. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 2010, 46(5): 878-888
[23] Vrabie D, Vamvoudakis K G, Lewis F L. Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles. Control Engineering Series. New York: IET Press, 2012.
[24] Stevens B, Lewis F L. Aircraft Control and Simulation (Second Edition). New Jersey: John Wiley, 2003.
[25] Nevistić V, Primbs J A. Constrained Nonlinear Optimal Control: A Converse HJB Approach. Technical Memorandum CIT-CDS 96-021, California Institute of Technology, 1996.