IEEE/CAA Journal of Automatica Sinica, 2015, Vol. 2, Issue (1): 74-84
Dynamic Multi-team Antagonistic Games Model with Incomplete Information and Its Application to Multi-UAV
Wenzhong Zha, Jie Chen, Zhihong Peng
1. School of Automation, Beijing Institute of Technology, Beijing 100081, China;
2. State Key Laboratory of Intelligent Control and Decision of Complex Systems, Beijing 100081, China
Abstract: At present, studies on multi-team antagonistic games (MTAGs) are still at an early stage, because this complicated problem involves not only incomplete information and conflicts of interest but also the selection of antagonistic targets. Therefore, building on previous research, this paper proposes a new framework: the dynamic multi-team antagonistic games with incomplete information (DMTAGII) model. For this model, the corresponding concept of perfect Bayesian Nash equilibrium (PBNE) is established and its existence is proved. In addition, an interactive iteration algorithm based on the idea of the best response is introduced for solving the equilibrium. Then, the scenario of multiple unmanned aerial vehicles (UAVs) against multiple military targets is studied to solve tactical decision making problems based on the DMTAGII model. In the modeling process, the specific expressions of the strategy, status and payoff functions of the games are considered, and the strategy is coded to match the structure of the genetic algorithm, so that the PBNE can be solved by combining the genetic algorithm with the interactive iteration algorithm. Finally, simulation verifies the feasibility and effectiveness of the DMTAGII model. Moreover, the calculated equilibrium strategies are found to be realistic, which can provide a reference for improving the autonomy of UAV systems.
Key words: Dynamic multi-team antagonistic games (DMTAGs); incomplete information; perfect Bayesian Nash equilibrium (PBNE); multi-UAV cooperation; tactical decision making
Ⅰ. INTRODUCTION

There are many scenarios of cooperation in reality. For example, in a football match 11 players cooperate with each other, striving to get the ball into the opponent's goal; in a military confrontation, combat units operate jointly to fight against enemy targets. A common characteristic of these scenarios is that people cooperate or compete with others in teams. A team is a loose collection of multiple members or agents that share a certain common objective. Generally, the presence of a common objective forges a team and induces cooperative behavior. However, team members might not be entirely altruistic; on the contrary, they might be selfish and have individual objectives. These additional individual objectives may lead them toward a weak degree of non-cooperation, mild competition, or adversarial action [1]. When the interactive behaviors between teams are considered, conflicts of interest cause outright competition between members of different teams. So how should team members make the best decisions to maximize both their common objective and their individual objectives under internal cooperation and external competition? This is a game process, which we call multi-team antagonistic games (MTAGs).

At present, game theory related to teams is mainly represented by cooperative games [2, 3] and evolutionary games [4]. In a cooperative game, a coalition can be regarded as a small team that players decide whether or not to join, and there must be a binding agreement to distribute the cooperation benefits. In an evolutionary game, a population can be regarded as a large team containing a large number of small agents capable of making independent decisions; these agents interact strategically and repeatedly in the process of evolution (learning, imitating and mutating) to reach an equilibrium in the population. However, both theories differ greatly from MTAGs, which combine cooperation with non-cooperation and involve a different number of team members.

In 1997, Stengel and Koller first investigated a zero-sum game in which a team of several players confronted a single adversary [5], which may be regarded as the embryo of MTAGs. Then, Liu and Simaan [6] introduced convex static multi-team games and proposed the important concept of the noninferior Nash strategy (NNS), which is Pareto optimal for players in the same team and Nash optimal for players in different teams. Since then, multi-team games have attracted increasing attention from researchers such as Ahmed [7], Elettreby [8] and Asker [9], who extended multi-team games to the Cournot game to study the dynamics and asymptotic stability of its equilibrium solutions. On the whole, these results mainly concern games with complete information, and their applications involve only enterprise competition in the economic field, since such problems have convex payoffs that conform to the assumption of NNS. However, realistic situations are not so ideal, owing to pervasive incomplete information [10] and nonconvex payoff functions, which limits the applications of multi-team games. Taking these factors into account, we establish a new game model that involves incomplete information in a dynamic and antagonistic environment, which we call dynamic multi-team antagonistic games with incomplete information (DMTAGII).

As the scenarios described in the first paragraph suggest, MTAGs can be applied to many realistic problems, especially in multi-agent systems. A typical example is tactical decision making for multiple unmanned aerial vehicles (UAVs) against multiple military targets. In the last decade, the use of UAVs for various military missions has received increasing attention. Compared with a single UAV, an air formation composed of multiple UAVs has several advantages: for example, it can accomplish a variety of military missions, attack targets continuously, and gather more information about threats and the battlefield situation [11, 12]. An important question in such a case is how the UAVs can make the best decisions autonomously and cooperate with one another to confront the enemy collectively, especially when the enemy also possesses some intelligence. This is a complex problem because of the coupling of missions and the uncertainty of battlefield information. The decisions of multiple UAVs mainly include tactical decisions [13, 14] and maneuver decisions [15]. Tactical decisions concern the offensive or defensive behavior of UAVs and are made in discrete time, while maneuver decisions form a continuous process, as they concern mobile behavior such as pursuit or evasion. For tactical decisions, instead of the general dynamic game model with incomplete information or the static game model with complete information used in [13, 14], this paper uses the newly established DMTAGII model.

In DMTAGII, the main feature is the interactive behavior within a team, which comes from information sharing and the coordination of team members' strategies. Generally, the relationship between members within a team is cooperative. However, conflicts of interest arise when members simultaneously pursue the maximization of different payoffs. This is clearly a non-cooperative or competitive scenario, and the classical Nash non-cooperative game (NNG) can be considered to resolve the conflict. An important concept of NNG is the Nash equilibrium [16] (a profile of strategies in which each player's strategy is an optimal response to the other players' strategies). In an NNG model, a strategy is either pure (one definite strategy adopted by a player from its strategy space) or mixed (a probability distribution over pure strategies). It is well known that not all NNGs have pure-strategy Nash equilibria; since the strategies in DMTAGII are pure, we cannot directly use the concept of Nash equilibrium to handle the aforementioned non-cooperative scenario. Instead, we build an integrated model by weighting the objective functions of team members so as to safeguard the interests of important members. We then introduce the perfect Bayesian Nash equilibrium (PBNE) for DMTAGII and prove its existence.

A common method for solving game models is to convert the game into a linear programming problem [17, 18]. However, this is not applicable to DMTAGII: the payoffs of team members are uncertain (members can only observe the actions of the opposing members and choose optimal strategies to antagonize them), so the payoff matrix cannot be built. We therefore propose an interactive iteration algorithm based on the idea of the best response. These concepts are presented in Section II. In Section III, we build the tactical decision making model for multiple UAVs against multiple military targets based on DMTAGII, focusing on the specific expressions of the strategy, status and payoff functions of the team members. Finally, Section IV verifies the feasibility and effectiveness of the proposed models through a simulation example, and Section V concludes the paper.

Ⅱ. DMTAGII

A. Concepts of the Game

Without loss of generality, this paper discusses DMTAGII with only two teams. The key concepts of DMTAGII are as follows.

1) Player. The players of the game are the members of teams $X$ and $Y$, denoted by $X=\{ x_1,x_2,\cdots,x_n \}$ and $Y=\{ y_1,y_2,\cdots,y_m \}$.

2) Type. The type sets of $x_i$ and $y_j$ are $\Theta _{xi} \subseteq \Theta_X$, $i=1,2,\cdots,n$, and $\Theta _{yj}\subseteq \Theta_Y$, $j=1,2,\cdots,m$. In what follows, the indices $i,n$ and $j,m$ always refer to teams $X$ and $Y$, respectively. $\Theta _{xi}$ is known to every member, but at the $k$-th stage the specific type $\theta _{xi}^k$ of $x_i$ is private information that members of the other team cannot know.

3) Action. The action sets of $x_i$ and $y_j$ are $A_{xi} \subseteq A_X$ and $A_{yj} \subseteq A_Y$. $A_{xi}$ is also common knowledge, while the specific action $a_{xi}^k$ of $x_i$ at the $k$-th stage can be observed by the members $y_j$ of the opposing team. The combination of the specific actions of all members $y_j$ observed by member $x_i$, ${\pmb a}_{-X}^k=[a_{y1}^k,a_{y2}^k,\cdots,a_{ym}^k]$, is shared information within team $X$.

4) Strategy. Unlike in the general dynamic game model with incomplete information, the strategies of members (players) in DMTAGII depend on their types, their actions and their antagonistic targets (which may be the members, or the types of members, on the opposing side). This is the main reason for using "antagonistic" rather than "non-cooperative" to describe the games. Suppose the strategy sets of $x_i$ and $y_j$ are $S_{xi}\subseteq S_X$ and $S_{yj}\subseteq S_Y$; then the specific strategies of $x_i$ and $y_j$ at the $k$-th stage are ${\pmb s}_{xi}^k=[\theta_{xi}^k,a_{xi}^k,T_Y]$ and ${\pmb s}_{yj}^k=[\theta_{yj}^k,a_{yj}^k,T_X]$, where $T_Y$ and $T_X$ are the antagonistic targets of teams $X$ and $Y$, respectively.

5) Status. At the $k$-th stage, every member can infer the current status of the game (such as the inventories and limits of manpower, resources and energy) from the strategies of all team members from stage $1$ to stage $k-1$. We use $E_{xi}^k=E({\pmb s}_{xi}^1,{\pmb s}_{xi}^2,\cdots,{\pmb s}_{xi}^{k-1})$ and $E_{yj}^k=E({\pmb s}_{yj}^1,{\pmb s}_{yj}^2,\cdots,{\pmb s}_{yj}^{k-1})$ to denote the statuses of $x_i$ and $y_j$. Accordingly, $E_X^k$ and $E_Y^k$ are the statuses of teams $X$ and $Y$, $E^k=\bigl[E_X^k,E_Y^k\bigr]$ is the status of the game, and $E=\{E^1,E^2,\cdots,E^K\}$ is the status of the whole game process, where $K$ is the terminal node of the game.

6) Belief. Belief is the knowledge that one team has about the other teams, and it is revised by Bayes rule as the game progresses. Without loss of generality, we consider the belief of team $X$ about team $Y$. Suppose the members of team $X$ believe that the prior probability of the members of team $Y$ belonging to the type combination ${\pmb \theta}_{-X}=[\theta_{y1},\theta_{y2},\cdots,\theta_{ym}]$ is

$ P({\pmb \theta}_{-X})=\bigl[p(\theta_{y1}),p(\theta_{y2}), \cdots,p(\theta_{ym})\bigr]. $ (1)

Furthermore, from the knowledge of the members of team $X$, given the type combination ${\pmb \theta}_{-X}$ of the members of team $Y$, the conditional probability combination of their choosing the action combination ${\pmb a}_{-X}$ is

$ P({\pmb a}_{-X}\mid{\pmb \theta}_{-X})=\bigl[ p(a_{y1}\mid\theta_{y1}),\cdots,p(a_{ym}\mid\theta_{ym})\bigr]. $ (2)

Then, according to Bayes rule, when the members of team $X$ observe at the $k$-th stage that the action combination of the members of team $Y$ is ${\pmb a}_{-X}^k$, they believe that the posterior probability combination of the members of team $Y$ belonging to ${\pmb \theta}_{-X}$ is

$ P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k)=\bigl[p(\theta_{y1}\mid a_{y1}^k),\cdots,p(\theta_{ym}\mid a_{ym}^k)\bigr],$ (3)

where

$ p(\theta_{yj}\mid a_{yj}^k)=\frac{p(a_{yj}^k\mid\theta_{yj}) p(\theta_{yj})}{\sum_{r}{p(a_{yj}^k\mid\theta_{yj,r}) p(\theta_{yj,r})}},\quad \theta_{yj,r}\in\Theta_{yj}. $ (4)

The subscript "$-X$" is used for team $Y$ in the above descriptions because the belief is team $X$'s subjective understanding of team $Y$.
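As an illustration of (3) and (4), the following Python sketch updates team $X$'s belief about each member of team $Y$ separately, as the member-by-member form of (1) to (3) suggests. It assumes finite type sets and a known likelihood table $p(a_{yj}\mid\theta_{yj})$; the type labels "offensive"/"defensive" and the action labels "fire"/"hold" are hypothetical.

```python
from typing import Dict, Hashable, List

Dist = Dict[Hashable, float]


def member_posterior(prior: Dist, likelihood: Dict[Hashable, Dist],
                     observed_action: Hashable) -> Dist:
    """Eq. (4): p(theta | a) = p(a | theta) p(theta) / sum_r p(a | theta_r) p(theta_r)."""
    # Numerator of (4) for every candidate type theta_{yj,r} in Theta_{yj}.
    joint = {theta: likelihood[theta].get(observed_action, 0.0) * p
             for theta, p in prior.items()}
    evidence = sum(joint.values())  # denominator of (4)
    if evidence == 0.0:
        raise ValueError("observed action has zero probability under every type")
    return {theta: value / evidence for theta, value in joint.items()}


def team_posterior(priors: List[Dist], likelihoods: List[Dict[Hashable, Dist]],
                   observed_actions: List[Hashable]) -> List[Dist]:
    """Eq. (3): one posterior per opposing member y_j, using only its own action a_yj^k."""
    return [member_posterior(p, l, a)
            for p, l, a in zip(priors, likelihoods, observed_actions)]


if __name__ == "__main__":
    prior = {"offensive": 0.5, "defensive": 0.5}
    likelihood = {"offensive": {"fire": 0.8, "hold": 0.2},
                  "defensive": {"fire": 0.3, "hold": 0.7}}
    print(team_posterior([prior], [likelihood], ["fire"]))
```

With these hypothetical numbers, observing "fire" raises the belief that the member is "offensive" from 0.5 to $0.4/0.55\approx 0.727$.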

7) Payoff. In DMTAGII, because team members need to cooperate with others in the same team and confront the members of other teams, their payoffs are necessarily affected by the members of all teams. These influences are mainly reflected in the strategies and statuses of the members, so we define the payoff functions of $x_i$ and $y_j$ as $u_{xi}({\pmb s}_X,{\pmb s}_Y,E)$ and $u_{yj}({\pmb s}_X,{\pmb s}_Y,E)$, where ${\pmb s}_X \in S_X$ and ${\pmb s}_Y\in S_Y$ are the strategy combinations of all members $x_i$ and $y_j$, respectively. Accordingly, the payoff functions of teams $X$ and $Y$ are defined as $u_X({\pmb s}_X,{\pmb s}_Y,E)$ and $u_Y({\pmb s}_X,{\pmb s}_Y,E)$, respectively.

8) Equilibrium. Following the form of the general dynamic game with incomplete information [10], we give a provisional definition of the equilibrium of DMTAGII (the formal definition is given in Section II-C).

Definition 1. A PBNE is a strategy combination ${\pmb s}^\ast=[{\pmb s}_X^\ast,{\pmb s}_Y^\ast]=[{\pmb s}_{x1}^\ast,\cdots,{\pmb s}_{xn}^\ast,{\pmb s}_{y1}^\ast,\cdots,{\pmb s}_{ym}^\ast]$ together with a posterior probability combination $[P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k),P({\pmb \theta}_{-Y}\mid {\pmb a}_{-Y}^k)]$ that satisfy the following conditions.

1) Perfectness condition. For every team member and on every information set at the $k$-th stage, we have

$ {\pmb s}_{xi}^\ast\in \arg \max_{{\pmb s}_{xi}}P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k)u_{xi}({\pmb s}_X,{\pmb s}_Y,E),$ (5)
$ {\pmb s}_{yj}^\ast\in \arg \max_{{\pmb s}_{yj}}P({\pmb \theta}_{-Y}\mid {\pmb a}_{-Y}^k)u_{yj}({\pmb s}_X,{\pmb s}_Y, E). $ (6)

2) Bayes rule. $[P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k),P({\pmb \theta}_{-Y}\mid {\pmb a}_{-Y}^k)]$ is inferred by Bayes rule from the prior probability, the observed actions $[{\pmb a}_{-X}^k,{\pmb a}_{-Y}^k]$ and the optimal strategies $[{\pmb s}_Y^\ast,{\pmb s}_X^\ast]$.

Remark 1. The information set consists of the information about all team members at the $k$-th stage, including definite information (such as actions) and the belief obtained by Bayes rule. At every information set where team $X$ moves, member $x_i$ must hold a belief about the probability that the game has reached each node.

Remark 2. By the perfectness condition, given the belief, the strategies of team members must satisfy the "sequentially rational" requirement; i.e., given the strategies ${\pmb s}_Y=[{\pmb s}_{y1},{\pmb s}_{y2},\cdots,{\pmb s}_{ym}]$ of the members of team $Y$ and the posterior probability combination $P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k)$ believed by the members of team $X$, the strategy of team member $x_i$ is optimal on the continuation game from information set $k$. A strategy on the "continuation game" is a complete plan for coping with all possible cases after the $k$-th stage.

DMTAGII shares its main features, incomplete information and dynamics, with the general model of dynamic games with incomplete information. Hence, when DMTAGII is a finite game with a finite number of selectable strategies, it has at least one PBNE; the proof can be found in Harsanyi's paper [10].

On the other hand, DMTAGII differs from the general model in two respects. First, the specific strategies of the members depend not only on their types and actions but also on the antagonistic targets on the opposing side. Second, because of the interactive relationship within a team, members need to consider both individual interests and the team interest.

These interactive behaviors compel us to rethink the definition of PBNE. How should team members choose optimal strategies when there are multiple equilibrium solutions? One definite viewpoint in DMTAGII is the existence of a team interest (or common objective); without it, forming a team is meaningless. We take the quantities maximized in (5) and (6) as the objective functions of the team members:

$ J_{xi}({\pmb s}_X,{\pmb s}_Y,E)=P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k)u_{xi}({\pmb s}_X,{\pmb s}_Y,E),$ (7)
$ J_{yj}({\pmb s}_X,{\pmb s}_Y,E)=P({\pmb \theta}_{-Y}\mid {\pmb a}_{-Y}^k)u_{yj}({\pmb s}_X,{\pmb s}_Y,E). $ (8)

Accordingly, the objective functions of teams $X$ and $Y$ have similar forms:

$ J_{X}({\pmb s}_X,{\pmb s}_Y,E)=P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k)u_{X}({\pmb s}_X,{\pmb s}_Y,E),$ (9)
$ J_{Y}({\pmb s}_X,{\pmb s}_Y,E)=P({\pmb \theta}_{-Y}\mid {\pmb a}_{-Y}^k)u_{Y}({\pmb s}_X,{\pmb s}_Y,E). $ (10)

Thus, team interaction deserves careful discussion, because it concerns the coordination between individual interests and the team interest.

B. Interactive Behaviors in the Team

1) Understanding of interactive behaviors: In DMTAGII, interactive behaviors mainly come from two sources.

The first source is information sharing, where the shared information comprises the observed action combination and the belief. This is communication and fusion of information, which we call "soft-interaction", because it affects the objective functions of team members only through Bayes rule rather than directly through the payoff functions.

The second source is the coordination of team members' strategies to achieve the minimum overall cost and the maximum income of the whole team, which we correspondingly call "hard-interaction". In general, there are three degrees of hard-interaction [1].

1) Team coordination. The main distinction of team coordination is that team members do not have individual objective functions. There is only one common objective, which all team members strive to optimize. Thus, team members may overutilize or underutilize the team resources.

2) Team cooperation. In team cooperation, each team member has a private objective in addition to the common objective. A general method is to weight the private and common objective functions with a coefficient $w$ ($0\le w\le 1$). Decreasing the weight $w$ on the private objective functions increases the level of cooperation among the team members. Conflicts of interest may arise, but owing to the structure of the objective functions used, they are generally not dominant. Even when conflicts occur, the team objective can take precedence if its weight is higher than that of the private objective functions.

3) Team collaboration. Team collaboration is a loose form of team interaction in which the team focuses on task completion and feasibility. On this premise, each team member tries to maximize its local objective function while avoiding conflicts of interest and task redundancy. This requires a negotiation protocol to arbitrate conflicts.

In DMTAGII, the objective functions of members of the same team differ. Thus, a conflict of interests arises when no structured joint action can simultaneously optimize these different objective functions.

2) Integrated model in cooperation: The question now is how to define the objective function of each team member so that extremely selfish behavior is avoided even in a non-cooperative scenario. Comparing the three interaction modes above, "team cooperation" is the easiest to implement and the most realistic in DMTAGII. Thus, we redefine the objective functions of team members as follows:

$ H_{xi}({\pmb s}_X,{\pmb s}_Y,E)=w_{xi}J_{xi}({\pmb s}_X,{\pmb s}_Y,E)+(1-w_{xi})J_X({\pmb s}_X,{\pmb s}_Y,E),$ (11)
$ H_{yj}({\pmb s}_X,{\pmb s}_Y,E)=w_{yj}J_{yj}({\pmb s}_X,{\pmb s}_Y,E)+(1-w_{yj})J_Y({\pmb s}_X,{\pmb s}_Y,E),$ (12)

where $0 < w_{xi},w_{yj} < 1$.
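To make the role of $w_{xi}$ in (11) concrete, here is a small Python sketch. It assumes that $J_{xi}$ and $J_X$ are already available as numbers for each candidate strategy with ${\pmb s}_Y$ and $E$ fixed; the strategy labels "A" and "B" and their values are hypothetical. It illustrates the effect described under team cooperation: lowering the private weight shifts the member's preferred strategy toward the one that serves the team objective.

```python
from typing import Dict, Tuple


def member_objective(w: float, j_private: float, j_team: float) -> float:
    """Eq. (11)/(12): H = w * J_private + (1 - w) * J_team, with 0 < w < 1."""
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie strictly between 0 and 1")
    return w * j_private + (1.0 - w) * j_team


def preferred_strategy(w: float, candidates: Dict[str, Tuple[float, float]]) -> str:
    """Return the candidate that maximizes H; values are (J_private, J_team) pairs."""
    return max(candidates, key=lambda s: member_objective(w, *candidates[s]))


if __name__ == "__main__":
    # Hypothetical values: "A" mainly favors the member, "B" mainly favors the team.
    candidates = {"A": (10.0, 2.0), "B": (4.0, 8.0)}
    for w in (0.8, 0.2):
        print(w, preferred_strategy(w, candidates))
```

With $w=0.8$ the member prefers "A" ($8.4$ versus $4.8$); with $w=0.2$ it prefers "B" ($7.2$ versus $3.6$).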

Next, we discuss cooperation within the team. With ${\pmb s}_Y$ and $E$ fixed, a straightforward idea is to find a strategy combination ${\pmb s}_X^\ast=[{\pmb s}_{x1}^\ast,{\pmb s}_{x2}^\ast,\cdots,{\pmb s}_{xn}^\ast]$ such that $H_{xi}({\pmb s}_X^\ast,{\pmb s}_Y,E)$ is optimal for every member $x_i$. This is the concept of Nash equilibrium, and the question arises again: does such a Nash equilibrium exist? If each member has a finite set of strategies and mixed strategies are allowed, then at least one Nash equilibrium exists. Unfortunately, the strategies in DMTAGII are pure, and in that case a Nash equilibrium may not exist. Note that this differs from the preceding PBNE: there, the existence of equilibrium largely depends on Bayes rule, whereas within a team Bayes rule has no effect, because the posterior probability (belief) is the same for all members.

It is therefore necessary to change our original goal. If no set of pure strategies optimizes the objective functions of all team members simultaneously, it is still always feasible to optimize a weighted sum of these objective functions. This has a practical basis: even when the interests of all members cannot be served simultaneously, weighting keeps the interests of important members as a priority. This leads to the following definition.

Definition 2. Suppose $\rho_{xi}$ is the weight of the objective function of member $x_i$ in team $X$, where $0 < \rho_{xi} < 1$ and $\sum_{i=1}^{n}{\rho_{xi}}=1$ (this assumption ensures that every member has the right to participate in the distribution of the team interest). Then the "integrated objective function" of team $X$ is defined as

$ H_{X}({\pmb s}_X,{\pmb s}_Y,E)=\sum_{i=1}^{n}\rho_{xi}H_{xi}({\pmb s}_X,{\pmb s}_Y,E)=\sum_{i=1}^{n}\rho_{xi}w_{xi}J_{xi}({\pmb s}_X,{\pmb s}_Y,E)+\sum_{i=1}^{n}\rho_{xi}(1-w_{xi})J_{X}({\pmb s}_X,{\pmb s}_Y,E). $ (13)

Let $\rho_{xi}w_{xi}=\xi_{xi}$ and $\sum_{i=1}^{n}\rho_{xi}(1-w_{xi})=\eta_X$. Clearly $0<\xi_{xi}<1$, and we let $\sum_{i=1}^{n}\xi_{xi}=1$. The $\xi_{xi}$ can be given in advance. However, the range of $\eta_X$ is $(0,\infty)$. To control it, we set $\eta_X=\frac{1}{n}$ in this paper, so the final form of the integrated objective function of team $X$ is

$ H_X({\pmb s}_X,{\pmb s}_Y,E)=\sum_{i=1}^{n}\xi_{xi}J_{xi}({\pmb s}_X,{\pmb s}_Y,E)+\frac{1}{n}J_X({\pmb s}_X,{\pmb s}_Y,E). $ (14)

Similarly, the integrated objective function of team $Y$ is defined as

$ H_Y({\pmb s}_X,{\pmb s}_Y,E)=\sum_{j=1}^{m}\xi_{yj}J_{yj}({\pmb s}_X,{\pmb s}_Y,E)+\frac{1}{m}J_Y({\pmb s}_X,{\pmb s}_Y,E). $ (15)
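The integrated objective (14), and by symmetry (15), can be evaluated as in the Python sketch below. It assumes the belief-weighted objectives $J_{xi}$ and $J_X$ from (7) and (9) are already available as numbers for one strategy profile; the weights and values in the example are hypothetical.

```python
from typing import Sequence


def integrated_objective(xi: Sequence[float], j_members: Sequence[float],
                         j_team: float, tol: float = 1e-9) -> float:
    """Eq. (14)/(15): H = sum_i xi_i * J_i + (1/n) * J_team."""
    n = len(j_members)
    if n == 0 or len(xi) != n:
        raise ValueError("need one weight per team member")
    # Constraints stated in the text: each xi in (0, 1) and the xi sum to 1.
    if any(not 0.0 < v < 1.0 for v in xi) or abs(sum(xi) - 1.0) > tol:
        raise ValueError("weights must lie in (0, 1) and sum to 1")
    return sum(w * j for w, j in zip(xi, j_members)) + j_team / n


if __name__ == "__main__":
    # Hypothetical four-member team.
    print(integrated_objective([0.4, 0.3, 0.2, 0.1], [5.0, 3.0, 2.0, 1.0], 8.0))
```

With $\xi=[0.4,0.3,0.2,0.1]$, member objectives $[5,3,2,1]$ and $J_X=8$, the result is $3.4+8/4=5.4$, up to floating-point rounding.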
C. PBNE

Returning to the equilibrium problem: the perfectness condition in the provisional PBNE has now changed, and we have the following theorem.

Theorem 1. If every team member has a finite number of strategies in DMTAGII, then the game has at least one PBNE.

Proof. DMTAGII differs from the general model of dynamic games with incomplete information only in the descriptions of strategies and payoff functions; the structure of the game is unchanged. So each team can be treated as a virtual player within the general structure, with payoff functions

$ U_X({\pmb s}_X,{\pmb s}_Y,E)=\sum_{i=1}^{n}\xi_{xi}u_{xi}({\pmb s}_X,{\pmb s}_Y,E)+\frac{1}{n}u_X({\pmb s}_X,{\pmb s}_Y,E),$ (16)
$ U_Y({\pmb s}_X,{\pmb s}_Y,E)=\sum_{j=1}^{m}\xi_{yj}u_{yj}({\pmb s}_X,{\pmb s}_Y,E)+\frac{1}{m}u_Y({\pmb s}_X,{\pmb s}_Y,E). $ (17)

Then, the formal definition of PBNE of DMTAGII is as follows.

A PBNE of DMTAGII is a strategy combination ${\pmb s}^\ast=[{\pmb s}_X^\ast,{\pmb s}_Y^\ast]$ together with a posterior probability combination $[P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k),P({\pmb \theta}_{-Y}\mid {\pmb a}_{-Y}^k)]$ that satisfy the following conditions.

1) Perfectness condition. For every team and on every information set at the $k$-th stage, we have

$ {\pmb s}_X^\ast\in \arg \max_{{\pmb s}_X}P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k)U_X({\pmb s}_X,{\pmb s}_Y,E),$ (18)
$ {\pmb s}_Y^\ast\in \arg \max_{{\pmb s}_Y}P({\pmb \theta}_{-Y}\mid {\pmb a}_{-Y}^k)U_Y({\pmb s}_X,{\pmb s}_Y,E). $ (19)

2) Bayes rule. This condition is the same as in Definition 1. The above is exactly a PBNE of a two-player general dynamic game with incomplete information, which by [10] must exist. Hence, DMTAGII has at least one PBNE.

Moreover, the integrated objective functions of teams $X$ and $Y$ can also be expressed as

$ H_X({\pmb s}_X,{\pmb s}_Y,E)=P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k)U_X({\pmb s}_X,{\pmb s}_Y,E),$ (20)
$ H_Y({\pmb s}_X,{\pmb s}_Y,E)=P({\pmb \theta}_{-Y}\mid {\pmb a}_{-Y}^k)U_Y({\pmb s}_X,{\pmb s}_Y,E). $ (21)

Theorem 1, however, is not fully satisfying, because it does not provide a simple method for computing the equilibrium. We therefore introduce the concept of the best response to simplify the solution process.

Definition 3. In DMTAGII, for any strategy ${\pmb s}_Y\in S_Y$ put forward by team $Y$, team $X$ can always choose a corresponding strategy ${\pmb s}_X\in S_X$ to antagonize team $Y$. Thus, there exists a mapping (usually multi-valued) $\varphi : S_Y \to S_X$ such that for all ${\pmb s}_X\in S_X$,

$ H_X\bigl(\varphi({\pmb s}_Y),{\pmb s}_Y,E\bigr)\ge H_X\bigl({\pmb s}_X,{\pmb s}_Y,E\bigr). $ (22)

We call the set

$ Z_X=\bigl\{({\pmb s}_X,{\pmb s}_Y)\mid {\pmb s}_X=\varphi({\pmb s}_Y),~~({\pmb s}_X,{\pmb s}_Y)\in S_X\times S_Y \bigr\} $ (23)

the best response of team $X$ to the strategies chosen by team $Y$. Similarly, there exists a mapping $\phi : S_X \to S_Y$ such that the best response of team $Y$ to the strategies chosen by team $X$ is

$ Z_Y=\bigl\{({\pmb s}_X,{\pmb s}_Y)\mid {\pmb s}_Y=\phi({\pmb s}_X),~~({\pmb s}_X,{\pmb s}_Y)\in S_X\times S_Y \bigr\}. $ (24)

The best response means that, when a team's action is known or can be predicted, the other team adopts the strategy that optimizes its payoff given that action; this strategy is its best response to the former team. The concept of the best response also makes it convenient to find the PBNE of DMTAGII, as the following theorem shows.

Theorem 2. Let $Z$ be the set of PBNEs of DMTAGII. Then $Z=Z_X\cap Z_Y$.

Proof. Suppose $({\pmb s}_X^\ast,{\pmb s}_Y^\ast)\in Z$. Then the inequalities

$ H_X({\pmb s}_X^\ast,{\pmb s}_Y^\ast,E)\ge H_X({\pmb s}_X,{\pmb s}_Y^\ast,E),$ (25)
$ H_Y({\pmb s}_X^\ast,{\pmb s}_Y^\ast,E)\ge H_Y({\pmb s}_X^\ast,{\pmb s}_Y,E) $ (26)

are valid for each ${\pmb s}_X\in S_X$ and ${\pmb s}_Y\in S_Y$.

By Definition 3, $({\pmb s}_X^\ast,{\pmb s}_Y^\ast)\in Z_X$ and $({\pmb s}_X^\ast,{\pmb s}_Y^\ast)\in Z_Y$, i.e., $({\pmb s}_X^\ast,{\pmb s}_Y^\ast)\in Z_X\cap Z_Y$; thus $Z\subseteq Z_X\cap Z_Y$.

Conversely, every step of the above inference is an equivalence, so $Z_X\cap Z_Y\subseteq Z$ also holds. Hence $Z=Z_X\cap Z_Y$.

By Theorem 2, the PBNE can be found directly by computing the intersection of the two best response sets.
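For a single stage with $E$ and the posterior beliefs held fixed, and with finite strategy sets whose integrated objectives $H_X$ and $H_Y$ have been tabulated in advance (both assumptions made for illustration), Theorem 2 reduces the search to two best-response passes and a set intersection. The Python sketch below computes $Z_X$ from (23), $Z_Y$ from (24) and their intersection; the strategy labels and payoff values are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Set, Tuple

Pair = Tuple[Hashable, Hashable]
Table = Dict[Pair, float]


def best_response_set_x(SX: Sequence[Hashable], SY: Sequence[Hashable],
                        HX: Table) -> Set[Pair]:
    """Z_X in (23): every (s_X, s_Y) with s_X a maximizer of H_X(., s_Y, E)."""
    z: Set[Pair] = set()
    for sy in SY:
        best = max(HX[(sx, sy)] for sx in SX)
        # phi may be multi-valued, so keep every maximizer (exact ties on table values).
        z.update((sx, sy) for sx in SX if HX[(sx, sy)] == best)
    return z


def best_response_set_y(SX: Sequence[Hashable], SY: Sequence[Hashable],
                        HY: Table) -> Set[Pair]:
    """Z_Y in (24): every (s_X, s_Y) with s_Y a maximizer of H_Y(s_X, ., E)."""
    z: Set[Pair] = set()
    for sx in SX:
        best = max(HY[(sx, sy)] for sy in SY)
        z.update((sx, sy) for sy in SY if HY[(sx, sy)] == best)
    return z


def equilibria(SX: Sequence[Hashable], SY: Sequence[Hashable],
               HX: Table, HY: Table) -> Set[Pair]:
    """Theorem 2: Z = Z_X & Z_Y (empty if no pair in the table is mutually best)."""
    return best_response_set_x(SX, SY, HX) & best_response_set_y(SX, SY, HY)


if __name__ == "__main__":
    SX, SY = ["sX1", "sX2"], ["sY1", "sY2"]
    HX = {("sX1", "sY1"): 3, ("sX1", "sY2"): 0, ("sX2", "sY1"): 1, ("sX2", "sY2"): 2}
    HY = {("sX1", "sY1"): 2, ("sX1", "sY2"): 1, ("sX2", "sY1"): 1, ("sX2", "sY2"): 0}
    print(equilibria(SX, SY, HX, HY))  # {('sX1', 'sY1')}
```

Here $Z_X=\{(s_{X1},s_{Y1}),(s_{X2},s_{Y2})\}$ and $Z_Y=\{(s_{X1},s_{Y1}),(s_{X2},s_{Y1})\}$, so their intersection is the single pair $(s_{X1},s_{Y1})$.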

The game tree of DMTAGII is shown in Fig. 1.

Fig. 1. The game tree of DMTAGII.

At the beginning of the game, Nature chooses the types of all members of one team. For example, in the left half of Fig. 1, the members of team $X$ choose strategies first. Then, the members of team $Y$ infer the types of team $X$ from the observed action combination and choose optimal strategies to antagonize them. After that, the members of team $X$ perform the same process to choose their optimal strategies. This repeats until the end of the game.

D. Solution Algorithm

Given the above dynamic process of DMTAGII, the payoffs of team members are uncertain at each stage, and building a payoff matrix for all members is clearly unrealistic; hence the general method of converting the game into a linear programming problem cannot be used. Considering the characteristics of the PBNE of DMTAGII, we use an interactive iteration algorithm based on the idea of the best response to obtain the optimal solution of the whole problem. The algorithm flow is as follows (assuming the game starts from the $k_0$-th stage).

Step 1. Nature first chooses the types of the members of one team (assumed to be team $X$). Specify the terminal node $K$ and a numerical precision $\epsilon \ge 0$.

Step 2. Let $k=k_0$ and $t=1$. The members of team $X$ choose an arbitrary initial strategy ${\pmb s}_X^{k,t}$ corresponding to their specified types.

Step 3. First, for the given strategy ${\pmb s}_X^{k,t}$, use an optimization algorithm to solve problem (19) and obtain the optimal solution ${\pmb s}_Y^{k,t}$. Then, for this given ${\pmb s}_Y^{k,t}$, use the optimization algorithm to solve problem (18) and obtain the optimal solution ${\pmb s}_X^{k+1,t}$, and let ${\pmb s}_X^{k,t}={\pmb s}_X^{k+1,t}$.

Step 4. If $k < K$, let $k=k+1$ and go to Step 3; otherwise, a set of strategy pairs $[({\pmb s}_X^{k_0,1},{\pmb s}_Y^{k_0,1}),\cdots,({\pmb s}_X^{K,1},{\pmb s}_Y^{K,1})]$ is obtained; go to Step 5.

Step 5. For the given ${\pmb s}_{Y}^{k,t}$, fix $k$ and solve problem (18) to obtain the optimal solution of team $X$ corresponding to the specified types. Then repeat Steps 3 and 4 to obtain another set of strategy pairs $[({\pmb s}_X^{k_0,2},{\pmb s}_Y^{k_0,2}),\cdots,({\pmb s}_X^{K,2},{\pmb s}_Y^{K,2})]$.

Step 6. If the infinity norm satisfies

$ {\left\Vert({\pmb s}_X^{k,t},{\pmb s}_Y^{k,t})-({\pmb s}_X^{k,t+1},{\pmb s}_Y^{k,t+1})\right\Vert}_\infty\le{\epsilon},$ (27)

for every $k=k_0,k_0+1,\cdots,K$, then $({\pmb s}_X^{k_0,t+1},{\pmb s}_Y^{k_0,t+1})$ is the PBNE of DMTAGII on the $k_0$-th information set; otherwise, let $t=t+1$ and go to Step 5.
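The steps above can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: strategies are assumed to be numeric tuples; the callables `br_x` and `br_y` (hypothetical names) stand in for whatever optimizer solves (18) and (19), for example a genetic algorithm; Nature's type choice in Step 1 is assumed to be folded into those callables; and Step 5 is interpreted as re-optimizing team $X$ against team $Y$'s stage-$k_0$ strategy. The toy best responses are likewise hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Strategy = Tuple[float, ...]
Pair = Tuple[Strategy, Strategy]


def inf_norm_diff(p: Pair, q: Pair) -> float:
    """Infinity norm of the difference of two stacked (s_X, s_Y) vectors, as in (27)."""
    return max(abs(a - b) for a, b in zip(p[0] + p[1], q[0] + q[1]))


def interactive_iteration(sx_init: Strategy, k0: int, K: int,
                          br_x: Callable[[Strategy, int], Strategy],
                          br_y: Callable[[Strategy, int], Strategy],
                          eps: float = 1e-6, max_passes: int = 100) -> List[Pair]:
    """Alternate best responses over stages k0..K until two passes differ by at most eps."""
    sx_start: Strategy = sx_init              # Step 2: arbitrary initial strategy of X
    previous: Optional[List[Pair]] = None
    for _ in range(max_passes):               # pass index t
        pairs: List[Pair] = []
        sx = sx_start
        for k in range(k0, K + 1):            # Steps 3-4: sweep stages k0..K
            sy = br_y(sx, k)                  # solve (19): Y's reply to s_X
            pairs.append((sx, sy))
            sx = br_x(sy, k)                  # solve (18): X's reply, used at stage k+1
        if previous is not None and all(
                inf_norm_diff(p, q) <= eps for p, q in zip(pairs, previous)):
            return pairs                      # Step 6: pairs[0] is the stage-k0 result
        previous = pairs
        sx_start = br_x(pairs[0][1], k0)      # Step 5 (our reading): re-optimize X at k0
    raise RuntimeError("no convergence within max_passes")


def toy_br_y(sx: Strategy, k: int) -> Strategy:
    # Hypothetical linear reply of team Y; the stage index k is unused here.
    return (0.5 * sx[0] + 1.0,)


def toy_br_x(sy: Strategy, k: int) -> Strategy:
    # Hypothetical linear reply of team X; the stage index k is unused here.
    return (0.5 * sy[0],)


if __name__ == "__main__":
    result = interactive_iteration((0.0,), k0=1, K=3, br_x=toy_br_x,
                                   br_y=toy_br_y, eps=1e-9)
    print(result[0])  # approximately ((2/3,), (4/3,))
```

With the toy linear replies, each pass moves team $X$'s starting strategy a quarter of the remaining distance from the fixed point $s_X=2/3$, $s_Y=4/3$, so the returned stage-$k_0$ pair approaches that point. With a stochastic optimizer such as a genetic algorithm, the tolerance $\epsilon$ would need to reflect the optimizer's own noise.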

Fig. 2 illustrates the algorithm flow.

Fig. 2. The solution algorithm flow of DMTAGII.
Ⅲ. THE TACTICAL DECISION MAKING MODEL BASED ON DMTAGII FOR MULTI-UAV AGAINST MULTI-TARGET

The tactical decision making for multi-UAV against multi-target is a typical MTAG process. As described in the introduction, realistic problems in a battlefield environment are very complicated. Since our current focus is to explore the feasibility of using DMTAGII to solve these problems, a simple antagonistic scenario is considered in this paper.

A. Analyzing and Modeling

We take an air formation (Blue team) of multiple heterogeneous UAVs against a small ground military base (Red team) as the practical background. The scenario is described as follows.

1) The Blue team consists of three unmanned combat aerial vehicles (UCAVs) and one unmanned reconnaissance aerial vehicle (URAV). Each UCAV carries several missiles to attack the Red military targets or to intercept incoming missiles, especially those attacking the URAV. The URAV is responsible for battlefield surveillance, target tracking and reporting real-time battlefield information to the UCAVs. The goal of the Blue team is to ensure the safety of the URAV and maximize the damage to the Red team.

2) The Red team likewise consists of three ground missile-launching positions (MLPs) and one operational command vehicle (OCV). Each MLP has several missiles that can attack the UCAVs or the URAV of the Blue team as well as intercept incoming missiles. The OCV is responsible for providing battlefield information and issuing operational commands. The goal of the Red team is to ensure the safety of the OCV and maximize the damage to the Blue team.

In this paper, we assume that communication between members of the same team is perfect (if one member is killed, the others can still communicate) and that the fusion of multi-source information is effective. The members, types and executable actions of the two teams are shown in Table I.

Table Ⅰ
FEATURES OF Blue AND Red TEAMS

Assume that in each game stage each member can choose only one type and one action against one target. When a UCAV attacks an MLP, the MLP can either intercept the missile or attack the platform of the UCAV. Corresponding to the DMTAGII model, some key concepts for multi-UAV against multi-target are described mathematically as follows.

1) Type set. A type set must be defined for each team. Suppose the type set of Blue team $X$ is ${\pmb \theta}_X=[\theta_{x,1},\cdots,\theta_{x,b},\cdots, \theta_{x,7}]^{\rm T}$, where $\theta_{x,b}$ is a combination of a team member and a type. For example, $\theta_{x,1}=(x_1, Missile1)$, $\theta_{x,2}=(x_1,Platform1)$, $\cdots$, $\theta_{x,7}=(x_4,Platform2)$. Similarly, the type set of Red team $Y$ is ${\pmb \theta}_Y=[\theta_{y,1},\cdots,\theta_{y,r}, \cdots,\theta_{y,7}]^{\rm T}$.

2) Strategy set. First, let the type-action vector of the Blue team be

$ {\pmb {TA}}_X=\left[\begin{array}{c} {(Missile1,Attack)}\\ {(Missile1,Keep)}\\ {(Platform1,Defense)}\\ {(Platform1,Keep)}\\ {(Platform2,Defense)}\\ {(Platform2,Keep)} \end{array}\right]. $

Then, at the $k$-th stage, the specific strategy of member $x_i$ can be defined as a matrix, i.e.,

$ F_{xi}^k=(f_{lr}^i)_{6\times 7},$ (28)

where

$ f_{lr}^i=\left\{\begin{array}{ll}{1,} &x_i\;\; {\rm uses}\;\; {\pmb {TA}}_X(l)\;\; {\rm to}\;\; \theta_{y,r},\\ \\ 0,& {\rm otherwise}, \end{array}\right. $ (29)

and $\sum_{l=1}^6{\sum_{r=1}^7}f_{lr}^i=1$.

${\pmb {TA}}_Y$ and $F_{yj}^k=(f_{lb}^j)_{6\times 7}$ are defined similarly for the Red team.
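The type set and the one-hot strategy matrix (28)-(29) can be represented as in the following Python sketch. The ordering of the middle entries of ${\pmb \theta}_X$ ($\theta_{x,3}$ to $\theta_{x,6}$) is an assumption, since only $\theta_{x,1}$, $\theta_{x,2}$ and $\theta_{x,7}$ are given explicitly; the helper names are hypothetical.

```python
import numpy as np

# Rows of TA_X, in the order of the type-action vector above
TA_X = [("Missile1", "Attack"), ("Missile1", "Keep"),
        ("Platform1", "Defense"), ("Platform1", "Keep"),
        ("Platform2", "Defense"), ("Platform2", "Keep")]

# Blue type set theta_X; the order of entries 3 to 6 is assumed
THETA_X = [(f"x{i}", t) for i in (1, 2, 3) for t in ("Missile1", "Platform1")]
THETA_X.append(("x4", "Platform2"))


def strategy_matrix(l, r, n_rows=6, n_cols=7):
    """One-hot F_xi^k per (28)-(29): the member uses TA_X(l) against theta_{y,r} (1-based)."""
    F = np.zeros((n_rows, n_cols), dtype=int)
    F[l - 1, r - 1] = 1
    return F


def chosen_action(F):
    """Recover (type-action label, opponent type index r) from a one-hot F."""
    if F.sum() != 1:
        raise ValueError("F must contain exactly one nonzero element")
    l, r = np.argwhere(F == 1)[0]
    return TA_X[l], int(r) + 1


F = strategy_matrix(1, 4)   # (Missile1, Attack) against theta_{y,4}
print(chosen_action(F))
```

Storing only the index pair $(l,r)$ is enough because the constraint $\sum_{l}\sum_{r}f_{lr}^i=1$ leaves exactly one nonzero entry.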

3) Status. The game status is mainly reflected in the missile inventories and the damage degrees of the platforms. We again take the Blue team as an example; the Red team is analogous. Suppose that at the $k$-th stage, the missile inventory of member $x_i$ is $M_{xi}^k$ and the probability of its missile being hit by a single missile once fired is $Pm_{xi}$. Meanwhile, suppose the damage degree of each member's platform is $Dp_{xi}^k$. Without defense, the probability of the platform being hit by a single missile is $Pp_{xi}$, and the damage degree caused by a hit is $Dd_{xi}$. With defense, the damage effect on the platform is reduced to $\tau_{xi}$ times that without defense.

Next, we consider the case in which the types of team member $x_i$ are attacked by multiple missiles in one game stage. If the number of these missiles is $N_{\theta xi,y}^k$, the probabilities of the types of member $x_i$ being hit by them are

$ Pm_{xi}(N_{\theta xi,y}^k)=1-(1-Pm_{xi})^{N_{\theta xi,y}^k},$ (30)
$ Pp_{xi}(N_{\theta xi,y}^k)=1-(1-Pp_{xi})^{N_{\theta xi,y}^k}. $ (31)

Similarly, $Dd_{xi}$ and $\tau_{xi}$ also change with the number of missiles, i.e.,

$ Dd_{xi}(N_{\theta xi,y}^k)=1-(1-Dd_{xi})^{N_{\theta xi,y}^k},$ (32)
$ \tau_{xi}(N_{\theta xi,y}^k)=1-(1-\tau_{xi})^{N_{\theta xi,y}^k}. $ (33)

Thus, at the $(k+1)$-th stage, the status of member $x_i$ is updated as

$ M_{xi}^{k+1}=M_{xi}^k-N_{xi}^k,$ (34)
$ Dp_{xi}^{k+1}=Dp_{xi}^k+dp_{xi}^k, $ (35)

where $N_{xi}^k=\sum_{r=1}^7F_{xi}^k(1,r)$ is the number of missiles that member $x_i$ chooses to launch; in this paper, the maximum of $N_{xi}^k$ is assumed to be $1$. In addition,

$ \begin{aligned} dp_{xi}^k&=(1-Dp_{xi}^k)\cdot Pp_{xi}(N_{\theta xi,y}^k)\cdot Dd_{xi}(N_{\theta xi,y}^k)\cdot\\ &\quad \biggl[(\tau_{xi}(N_{\theta xi,y}^k)-1)\cdot {\sum_{r=1}^7F_{xi}^k(Platform,r)}+1\biggr] \cdot\\ &\quad \prod_{j=1}^4\biggl[(1-Pm_{yj}(N_{\theta yj,x}^k))\cdot F_{yj}^k(1,b)\biggr], \end{aligned} $ (36)

where $\sum_{r=1}^7F_{xi}^k(Platform,r)$ reflects the action of the platform of member $x_i$ (1 for defense and 0 for keep), $\prod_{j=1}^4[(1-Pm_{yj}(N_{\theta yj,x}^k))\cdot F_{yj}^k(1,b)]$ accounts for the missiles from team $Y$ aimed at member $x_i$ that are not intercepted, and the subscript $b$ is the $b$-th type in matrix $F_{yj}^k$, corresponding to the platform of $x_i$.
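The status model (30)-(36) can be transcribed directly. The Python sketch below assumes scalar inputs for one member at one stage and follows (36) literally, including its product over $j$; the caller supplies the $(Pm_{yj}(N), F_{yj}^k(1,b))$ pairs, and the function names and all numbers in the example are hypothetical.

```python
def combined(p_single, n_missiles):
    """(30)-(33) share the form 1 - (1 - p)^N; for (30)-(31) this is the
    chance that at least one of N independent missiles succeeds."""
    return 1.0 - (1.0 - p_single) ** n_missiles


def damage_increment(Dp, Pp_N, Dd_N, tau_N, defends, incoming):
    """dp_xi^k per (36).

    Dp:       current damage degree Dp_xi^k
    Pp_N, Dd_N, tau_N: values of (31)-(33) for the N incoming missiles
    defends:  sum_r F_xi^k(Platform, r), i.e. 1 for defense and 0 for keep
    incoming: list of (Pm_yj(N), F_yj^k(1, b)) pairs entering the product
    """
    defense_factor = (tau_N - 1.0) * defends + 1.0  # tau_N when defending, 1 when keeping
    not_intercepted = 1.0
    for pm_y, f_y in incoming:
        not_intercepted *= (1.0 - pm_y) * f_y
    return (1.0 - Dp) * Pp_N * Dd_N * defense_factor * not_intercepted


def transition(M, Dp, n_launched, dp):
    """(34)-(35): launched missiles leave the inventory, damage accumulates."""
    return M - n_launched, Dp + dp


# Hypothetical single-stage example: one incoming missile, platform defending
N = 1
dp = damage_increment(Dp=0.0,
                      Pp_N=combined(0.5, N), Dd_N=combined(0.5, N), tau_N=combined(0.5, N),
                      defends=1, incoming=[(combined(0.5, N), 1)])
M_next, Dp_next = transition(M=4, Dp=0.0, n_launched=1, dp=dp)
```

With every probability set to 0.5, each of the four factors in (36) after $(1-Dp_{xi}^k)$ equals 0.5, so the damage increment is $0.5^4=0.0625$.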

4) Payoff. Let the self payoff values of the missile and platform of team member $x_i$ be $Vm_{xi}$ and $Vp_{xi}$; then, at the $k$-th stage, the self payoff value of team member $x_i$ is

$ V_{xi}^k=M_{xi}^k\cdot Vm_{xi}+(1-Dp_{xi}^k)\cdot Vp_{xi}. $ (37)

For any strategies adopted by $x_i$ and $y_j$, the income of $x_i$ can be expressed as

$ \begin{aligned} R_{xi}^k=&\sum_{r=1}^7\biggl[\frac{dp_{yj}^kVp_{yj}^{k-1}}{N_{\theta yj,x}^k}\biggr]\cdot [1-Pm_{xi}(N_{\theta xi,y}^k)]F_{xi}^k(1,r)+\\ &\sum_{j=1}^4\Bigl[\sum_{b}Vm_{yj}\cdot F_{yj}^k(1,b)\Bigr]. \end{aligned} $ (38)

When $N_{\theta yj,x}^k=0$,let

$ \frac{dp_{yj}^kVp_{yj}^{k-1}}{N_{\theta yj,x}^k}=0. $ (39)

The first term of (38) is the income of $x_i$ from the opponent's platforms, where $y_j$ corresponds to the subscript $r$ in matrix $F_{xi}^k$; the second term is the income of $x_i$ from the opponent's missiles, where the subscript $b$ in matrix $F_{yj}^k$ corresponds to $x_i$.

Meanwhile, the cost of $x_i$ is

$ Q_{xi}^k=N_{xi}^k\cdot Vm_{xi}+dp_{xi}^k\cdot Vp_{xi}. $ (40)

Thus,the total payoff of $x_i$ at the $k$-th stage can be expressed as

$ u_{xi}^k=V_{xi}^{k-1}+R_{xi}^k-Q_{xi}^k. $ (41)

Since the Blue team as a whole focuses on the operational effect, its team payoff is defined as

$ u_{X}^k=\sum_{i=1}^4V_{xi}^k-\sum_{j=1}^4V_{yj}^k. $ (42)

The corresponding definitions for the Red team are analogous and are not repeated here.
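A direct Python transcription of the payoff definitions (37)-(42) for one Blue member follows. It is a sketch: the argument layout (lists indexed by the columns $r$ of $F_{xi}^k$) and the function names are assumptions, and (39) is applied as a zero guard when no Blue missile targets $y_j$.

```python
def self_value(M, Vm, Dp, Vp):
    """(37): V = M * Vm + (1 - Dp) * Vp."""
    return M * Vm + (1.0 - Dp) * Vp


def platform_share(dp_y, Vp_y_prev, n_attackers):
    """Bracketed term of (38); by (39) it is 0 when no Blue missile targets y_j."""
    return 0.0 if n_attackers == 0 else dp_y * Vp_y_prev / n_attackers


def income(F_row1, shares, Pm_x_N, opp_missiles):
    """(38). F_row1[r] = F_xi^k(1, r); shares[r] = platform_share(...) for the target
    in column r; opp_missiles = list of (Vm_yj, F_yj^k(1, b)) with b corresponding to x_i."""
    from_platforms = sum(s * (1.0 - Pm_x_N) * f for s, f in zip(shares, F_row1))
    from_missiles = sum(vm * f for vm, f in opp_missiles)
    return from_platforms + from_missiles


def cost(n_launched, Vm, dp, Vp):
    """(40): Q = N * Vm + dp * Vp."""
    return n_launched * Vm + dp * Vp


def stage_payoff(V_prev, R, Q):
    """(41): u^k = V^{k-1} + R^k - Q^k."""
    return V_prev + R - Q


def team_payoff(V_own, V_opp):
    """(42): total self value of one team minus that of the other."""
    return sum(V_own) - sum(V_opp)
```

For example, with the hypothetical values $M=2$, $Vm=10$, $Dp=0.25$ and $Vp=100$, `self_value` returns 95.0.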

5) Belief. The expression of belief for multi-UAV against multi-target is very similar to that in the preceding DMTAGII model, so it is not repeated here either; the specific numerical distributions are given in the next section.

6) Constraint. A realistic background involves many constraints, but here we discuss only some simple ones. First, team member $x_i$ is subject to the following explicit constraints:

$ \left\{\begin{array}{l}\sum\limits_{l=1}^6\sum\limits_{r=1}^7f_{lr}^i=1,\\ \\ M_{xi}^k\ge 0.\end{array}\right. $ (43)

Because the missiles of member $x_i$ are carried on its platform, the member can be considered to have lost combat ability if the damage degree of its platform is greater than or equal to a threshold $\sigma_{xi}$ at the $k$-th stage. In this case, whether or not the platform of $x_i$ has remaining missiles, $x_i$ must exit the game, and its strategy at the next stage becomes $F_{xi}^{k+1}=\mathbf{0}$.

In addition, because of the importance of the URAV, a mandatory constraint $Dp_{x4}^k\le \sigma_{x4}$ is added for the Blue team: at every stage of the game, Blue team members must ensure the safety of the URAV. Similarly, the Red team has the mandatory constraint $Dp_{y4}^k\le \sigma_{y4}$. The mandatory constraints enter the objective functions in the form of penalty functions.
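The paper does not state the exact penalty form, so the Python sketch below assumes a linear exterior penalty $\rho\max(0,Dp-\sigma)$ with a hypothetical weight $\rho$, together with the exit rule that zeroes a disabled member's next-stage strategy and the inventory check from (43).

```python
import numpy as np


def penalized(objective, Dp_key, sigma_key, rho=1e3):
    """Subtract rho * max(0, Dp - sigma) for the mandatory URAV/OCV constraint (penalty form assumed)."""
    return objective - rho * max(0.0, Dp_key - sigma_key)


def apply_exit_rule(F_next, Dp, sigma):
    """A member whose platform damage reaches sigma exits: F^{k+1} becomes the zero matrix."""
    return np.zeros_like(F_next) if Dp >= sigma else F_next


def inventory_ok(M):
    """Explicit constraint M^k >= 0 from (43)."""
    return M >= 0
```

A large $\rho$ makes any violation of the URAV (or OCV) safety constraint dominate the objective, which is the usual intent of an exterior penalty.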

7) Weight. In general, the weight of each team member changes with the antagonistic situation. For simplicity, we assume that the weight of each member is given by its original self payoff value,

$ \xi_{xi}=\frac{V_{xi}^0}{\sum\limits_{i=1}^4V_{xi}^0}. $ (44)

8) Objective function. To sum up, the control variables of the game model are $F_{xi}^k$ and $F_{yj}^k$. Solving the game model then reduces to solving the following optimization problems simultaneously from the $k$-th stage to the terminal node $K$ of the game.

Problem 1.
$ \arg\max_{F_{xi}}P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k)\biggl[\sum_{i=1}^4\xi_{xi}u_{xi}^k+\frac{1}{4}u_X^k\biggr],$ (45)
$ \mbox{s.t.} \left\{\begin{array}{l}\sum\limits_{l=1}^6\sum\limits_{r=1}^7f_{lr}^i=1,\\ M_{xi}^k\ge 0,\\ Dp_{x4}^k\le \sigma_{x4},\\ {\rm when}\; Dp_{xi}^k\ge \sigma_{xi},\; {\rm let}\; F_{xi}^{k+1}=\mathbf{0}. \end{array}\right. $
Problem 2.
$ \arg\max_{F_{yj}}P({\pmb \theta}_{-Y}\mid {\pmb a}_{-Y}^k)\biggl[\sum_{j=1}^4\xi_{yj}u_{yj}^k+\frac{1}{4}u_Y^k\biggr],$ (46)
$ \mbox{s.t.} \left\{\begin{array}{l}\sum\limits_{l=1}^6\sum\limits_{b=1}^7f_{lb}^j=1,\\ M_{yj}^k\ge 0,\\ Dp_{y4}^k\le \sigma_{y4},\\ {\rm when}\; Dp_{yj}^k\ge \sigma_{yj},\; {\rm let}\; F_{yj}^{k+1}=\mathbf{0}. \end{array}\right. $
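For a fixed candidate strategy profile, the weight (44) and the value inside the $\arg\max$ of Problem 1 can be evaluated as below. This Python sketch assumes the belief $P({\pmb \theta}_{-X}\mid {\pmb a}_{-X}^k)$ and the payoffs have already been computed; the function names and numbers are hypothetical, and the maximization over $F_{xi}$ itself is left to a search method.

```python
def weights(V0):
    """(44): xi_i = V_i^0 / sum_i V_i^0."""
    total = sum(V0)
    return [v / total for v in V0]


def team_objective(belief, xi, u_members, u_team):
    """Value inside the arg max of (45): belief * (sum_i xi_i * u_i + u_team / 4)."""
    return belief * (sum(w * u for w, u in zip(xi, u_members)) + 0.25 * u_team)


xi = weights([100.0, 100.0, 100.0, 200.0])            # hypothetical initial self values
value = team_objective(0.5, xi, [10.0, 10.0, 10.0, 20.0], 40.0)
```

Here the weights are 0.2, 0.2, 0.2 and 0.4, so the bracket equals $2+2+2+8+10=24$ and the belief-scaled value is 12.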

9) Algorithm. At each stage, the above optimization problems are nonlinear integer programming problems, which can be solved with a genetic algorithm. The most critical part of applying the genetic algorithm is the chromosome coding. Since matrix $F_{xi}^k$ has exactly one element equal to $1$ and all others equal to $0$, we can set up a one-to-one mapping $\pi$ from the subscripts $(l,r)$ of the unique nonzero element to a natural number, namely $\pi(l,r)=7(l-1)+r$. This coding automatically satisfies the constraint $\sum_{l=1}^6\sum_{r=1}^7f_{lr}^i=1$ and reduces the dimension of the strategy search space. In $F_{xi}^k$, members $x_1$, $x_2$ and $x_3$ correspond to the first four rows, while member $x_4$ corresponds to the last two rows (for $x_4$, $\pi(l,r)=7(l-5)+r$). Hence the strategies of $x_1$, $x_2$ and $x_3$ are mapped to the natural numbers from $1$ to $28$, and those of $x_4$ to the natural numbers from $1$ to $14$. The genetic algorithm is thus easy to apply and can be combined with the interactive iteration algorithm.
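The coding step and a small genetic search over it can be sketched in Python as follows. This is an illustration, not the authors' implementation: it assumes the row-major index $\pi(l,r)=7(l-1)+r$ (rows of $x_4$ counted from the fifth row), which maps the 28 entries of the first four rows one-to-one onto 1 to 28 and the two rows of $x_4$ onto 1 to 14. The elitist GA with one-point crossover and reset mutation, its parameters and the toy fitness are all placeholders.

```python
import random

N_COLS = 7                    # opponent types theta_{y,1..7}
GENE_MAX = [28, 28, 28, 14]   # genes of x1, x2, x3 (rows 1-4) and x4 (rows 5-6)


def encode(l, r, first_row=1):
    """Row-major index of the unique nonzero entry (l, r); rows counted from first_row."""
    return N_COLS * (l - first_row) + r


def decode(gene, first_row=1):
    """Inverse of encode: recover (l, r) from a gene."""
    q, rem = divmod(gene - 1, N_COLS)
    return q + first_row, rem + 1


def genetic_search(fitness, pop_size=30, generations=60, p_mut=0.1, seed=0):
    """Small elitist GA over integer chromosomes (one gene per team member)."""
    rng = random.Random(seed)
    pop = [[rng.randint(1, g) for g in GENE_MAX] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, len(GENE_MAX) - 1)
            child = a[:cut] + b[cut:]                       # one-point crossover
            child = [rng.randint(1, g) if rng.random() < p_mut else c
                     for c, g in zip(child, GENE_MAX)]      # reset mutation within range
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


# Hypothetical fitness: prefer genes close to an arbitrary target chromosome
target = [5, 12, 20, 3]
best = genetic_search(lambda ch: -sum((c - t) ** 2 for c, t in zip(ch, target)))
plan = [decode(g) for g in best[:3]] + [decode(best[3], first_row=5)]
```

In the paper's procedure, the fitness would be the (penalized) objective of Problem 1 or 2 at the current stage; keeping every gene inside its range means each decoded chromosome is always a feasible one-hot strategy.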

Ⅳ. SIMULATION AND ANALYSIS

A. Setting of Parameters

The initial military force distributions of the Blue and Red teams are set as in Table Ⅱ.

Table Ⅱ
INITIAL MILITARY FORCE DISTRIBUTION

For the Blue team, we set the prior probability distribution of member $x_i$ ($i=1,2,3$) choosing missile or platform to $\{0.6,0.4\}$. Since member $x_4$ has only one type (platform), its prior probability distribution is $\{1\}$. Similarly, for the Red team, we set the prior probability distributions to $\{0.7,0.3\}$ for $y_1$ to $y_3$ and $\{1\}$ for $y_4$.

Meanwhile,the conditional probabilities of team members choosing actions in the given types are provided in Table Ⅲ.

Table Ⅲ
CONDITIONAL PROBABILITIES OF CHOOSING ACTIONS

In addition, we assume that Nature chooses the Blue team as the first actor and specifies the types of the Blue team members as $x_1$: Missile1, $x_2$: Platform1, $x_3$: Missile1 and $x_4$: Platform2. Following the proposed algorithm, we use MATLAB to solve the constrained optimization Problems 1 and 2.
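Beliefs in a PBNE are updated by Bayes' rule wherever possible. The Python sketch below applies it to one Blue member with the prior $\{0.6,0.4\}$ set above. "Keep" is available to both types, so observing it shifts the belief without revealing the type. The likelihood values are placeholders, not those of Table Ⅲ, and keeping the prior after a zero-probability action is one admissible off-path convention assumed here.

```python
def posterior(prior, likelihood):
    """Bayes' rule over one member's types: P(theta | a) is proportional to P(a | theta) P(theta)."""
    joint = [p * q for p, q in zip(prior, likelihood)]
    total = sum(joint)
    if total == 0:
        # Zero-probability (off-path) action: keep the prior (an assumed convention)
        return list(prior)
    return [j / total for j in joint]


prior_x = [0.6, 0.4]       # Blue x1..x3 over (missile, platform), as set above
lik_keep = [0.5, 0.25]     # placeholder P("Keep" | type); not taken from Table III
belief = posterior(prior_x, lik_keep)
```

With these placeholder likelihoods, the belief that the member is of the missile type rises from 0.6 to 0.75.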

B. Results and Analysis

Simulating from $k_0=1$ to $K=13$, we obtain a set of optimal strategy pairs, shown in Table IV, where "MA" stands for missile attack, "MK" for missile keep, "PD" for platform defense, "PK" for platform keep, "M" for missile and "P" for platform.

Note that the above interactive iteration algorithm is based only on the $k_0$-th information set, where the objective function value at each step reflects the expected payoff of a team member. Only the first strategy pair $(F_x^{1,t+1},F_y^{1,t+1})$ actually determines the behaviors of the Blue and Red teams. When the game actually proceeds to the $(k_0+1)$-th information set, the teams need to recalculate the payoffs from the current status to choose a new equilibrium strategy pair. Therefore, to verify the feasibility and effectiveness of the proposed model, we set specific statuses for each stage of the game, as shown in Table IV, which also indicates whether each team member performed its strategy successfully.
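The procedure just described, re-solving at every information set and executing only the first strategy pair, can be pictured as a receding-horizon loop. In the Python sketch below, `solve_from` and `apply_pair` are hypothetical stand-ins for the equilibrium computation and the status update, and the toy versions merely fire one missile per stage while any remain.

```python
def rolling_play(solve_from, apply_pair, state, k0, K):
    """Re-solve from each information set and execute only the first pair of the plan."""
    history = []
    for k in range(k0, K + 1):
        plan = solve_from(state, k, K)      # equilibrium pairs for stages k..K from the current status
        first = plan[0]                     # only the stage-k pair is actually carried out
        state = apply_pair(state, first, k)
        history.append(first)
    return history, state


# Toy stand-ins: the state is a missile count; Blue fires while it has missiles
def toy_solve(missiles, k, K):
    fire = 1 if missiles > 0 else 0
    return [(fire, 0)] * (K - k + 1)


def toy_apply(missiles, pair, k):
    return missiles - pair[0]


history, remaining = rolling_play(toy_solve, toy_apply, state=3, k0=1, K=5)
```

Starting with three missiles over five stages, the toy run fires in the first three stages and keeps in the last two.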

Under the optimal strategies, the payoffs of all members are shown in Figs. 3 and 4.

Fig. 3. The payoff of Blue members.

Fig. 4. The payoff of Red members.

From Table IV and Figs. 3 and 4, it is clear that the Blue team maintained its superiority: multiple UCAVs continuously attacked the OCV and the MLPs while the URAV defended in time. In contrast, the Red team was always in a passive situation, and its operational strategies were mainly defending or intercepting incoming missiles. The main reason is that the Blue team, as the first actor, held the initiative in combat while the Red team could only defend passively. Interestingly, before the $5$th stage the three UCAVs launched missiles to attack the OCV or an MLP each time, but from the $6$th stage they began to take turns choosing the defense strategy while the other UCAVs attacked only the OCV of the Red team to contain the MLPs; in this way the Blue team preserved its strength. When the game reached the $11$th stage, the Red team had no missiles left while the Blue team still had 6. At that point, the Blue team began to attack the OCV with full force, so the damage degree of the OCV rose sharply, as can be seen in Fig. 4.

Table Ⅳ
THE OPTIMAL STRATEGY PAIRS FOR Blue AND Red TEAMS

In addition, the optimal strategies exhibit strong cooperation among team members. For example, in the strategy series of the Blue team, the three UCAVs first concentrated their fire so that the Red members (especially the aggressive ones) struggled to cope, and then took turns to rest, preserving their strength for the final assault. Clearly, such a strategy could not be performed without cooperation. Cooperation in the Red team was also obvious. When the Blue team attacked the OCV, the MLPs did not attack the UCAVs or the URAV but chose to intercept the missiles, as the OCV was very important to the Red team. Of course, they also balanced individual objectives against the team objective. For example, at the $9$th stage, they turned to attack the platforms of the Blue team members, but it was too late, because they could attack only twice owing to their limited number of missiles. Cooperation is also reflected in Figs. 3 and 4, where the payoffs of team members with the same characteristics change very consistently, especially for the Red team members.

Furthermore, the payoff of each team as a whole is shown in Fig. 5. The team payoff is the difference in the total value of all members between the two teams. From Fig. 5, we can see that the payoff of the Blue team increased as the game progressed, while that of the Red team decreased. Relating this to the strategies of the Red team members, we find that once the payoff of the Blue team exceeded that of the Red team (after the $8$th stage), the Red team members began to counterattack rather than keep defending all the time. This agrees well with actual combat scenarios.

Fig. 5. The payoff of Blue and Red teams.

Finally, the statistics of the antagonistic results of the Blue and Red teams are shown in Table V.

From Table V, the dominance of the Blue team is obvious. The greatest platform cost of the Blue team is only $100$ (10%), while that of the Red team is 3477 (nearly 70%). Moreover, the equilibrium strategy means that no matter what strategy the Red team adopts, the Blue team can always generate a corresponding strategy that yields net earnings of no less than 4177. The main reason, again, is that the Blue team is the first actor; if the Red team acted first, this equilibrium would change.

Table Ⅴ
STATISTICS OF ANTAGONISTIC RESULTS

In summary, the DMTAGII model proposed in this paper is effective for solving tactical decision-making problems for multi-UAV against multi-target. Meanwhile, the optimal equilibrium strategy pair of the Blue and Red teams is consistent with realistic battlefield scenarios. Moreover, if the UAVs learn the equilibrium, they can also take deceptive actions to entice the opposing team into an inferior equilibrium strategy, thereby improving their own income.

In reality, antagonism between multiple teams is a very common scenario, a basic characteristic of which is that the members of a team cooperate with each other to antagonize other teams jointly; it is therefore also a game process. At present, studies on MTAGs are still at an early stage, because this complicated problem involves not only incomplete information and conflicts of interest (including the internal and external interests of a team), but also the selection of antagonistic targets (a multi-objective assignment problem). Therefore, based on previous research, a new framework, the DMTAGII model, is proposed in this paper. For this model, the corresponding concept of PBNE is established and the existence of a PBNE is proved. In addition, an interactive iteration algorithm based on the idea of the best response is introduced for solving this equilibrium.

At the same time, MTAGs have very extensive applications, especially in multi-agent systems. Therefore, a simulation of multi-UAV against multi-target is studied to verify the feasibility and effectiveness of using the DMTAGII model to solve tactical decision-making problems. In the application, the specific expressions of the strategy, status and payoff of the game are considered, and the strategy is coded to match the structure of the genetic algorithm so that the PBNE can be solved by combining the genetic algorithm with the interactive iteration algorithm. The simulation shows that the DMTAGII model is well suited to tactical decision-making problems for multi-UAV against multi-target, and that the calculated equilibrium strategies are feasible and realistic, which can provide a reference for improving the autonomous ability of UAV systems. The current work is only a beginning. Given the complexity of dynamic multi-team games, much work remains for the future, such as multi-team cooperative games and multi-team differential games.