IEEE/CAA Journal of Automatica Sinica  2015, Vol.2 Number (4): 431-439   PDF    
Security Risk Assessment of Cyber Physical Power System Based on Rough Set and Gene Expression Programming
Song Deng1, Dong Yue1 , Xiong Fu2, Aihua Zhou3    
1. Research Institute of Advanced Technology, Nanjing University of Post and Telecommunications, Nanjing 210003, China;
2. Computer Science and Technology, Nanjing University of Post and Telecommunications, Nanjing 210003, China;
3. State Grid Smart Grid Research Institute, Beijing 102209, China
Abstract: Risk assessment is essential for the safe and reliable operation of cyber physical power system. Traditional security risk assessment methods do not take integration of cyber system and physical system of power grid into account. In order to solve this problem, security risk assessment algorithm of cyber physical power system based on rough set and gene expression programming is proposed. Firstly, fast attribution reduction based on binary search algorithm is presented. Secondly, security risk assessment function for cyber physical power system is mined based on gene expression programming. Lastly, security risk levels of cyber physical power system are predicted and analyzed by the above function model. Experimental results show that security risk assessment function model based on the proposed algorithm has high efficiency of function mining, accuracy of security risk level prediction and strong practicality.
Key words: Gene expression programming     function mining     security risk assessment     cyber physical power system    

I. INTRODUCTION

With the increasingly widespread application of information and communication technology, smart grid has gradually evolved into cyber physical system of deep integration between information space and physical space[1, 2, 3].

Because information system affects operation and control of physical system, all kinds of existing information systems with security risks will naturally be introduced into the physical systems. These security risks and vulnerabilities greatly increase the probability of intrusions and attacks on physical power systems from Internet[4, 5, 6, 7, 8]. Reference [4] presents a security-oriented stochastic risk management technique, which can calculate cyber-physical security indices to measure the security level of the underlying cyber-physical setting. Reference [5$-$8] analyzed all kinds of attacks on power physical systems, designed countermeasures for simultaneous intrusions, model-based attack and other network attacks (e.g., Dos, DDos, replay attack etc). Traditional power outage is caused when safety and reliability of physical systems are destroyed. However, with the increase of complexity in smart grid, all kinds of techniques which attack physical systems through security vulnerabilities of information systems have also increased. Due to dependence of physical power system on information system and control system, vulnerability of control network and complexity of power grid, local and small disturbances of control system by network intrusion attacks can cause collapse of physical systems. Finally, a large area of blackout will be produced. The significant potential security hazard of cyber physical power system (CPPS) is that security of information space will inevitably affect security of physical space so as to form a chain reaction, and eventually lead to the collapse of the physical space[9]. Iran's "Stuxnet" virus crossed the border of information and physical space and led to serious consequences. So, the rapid development of information and communication technology not only improves control capability of power grid, but also gives physical systems to lay security hazards. Timely detection and assessment of various security risks for CPPS is essential for the effective control and solution of security risks in CPPS.

Risk assessment[10] is probability estimation of risks that threats and vulnerabilities cause. In recent years, risk assessment has been widely applied to many aspects of the power system[11, 12, 13, 14]. But security risk assessment for cyber physical power system is also unusual.

However, complexity and dynamic change of CPPS cause the uncertainty of risk assessment. The unsuitable risk assessment methods will greatly affect accuracy of risk assessment. Traditional assessment methods include qualitative assessment, quantitative assessment and the integration of qualitative and quantitative assessment[10]. Due to the complexity, nonlinearity and uncertainty of security risk assessment, these assessment methods have some limitations, subjective arbitrariness and fuzziness. The operation of these methods is more, complex, and, lack of ability, of self-learning. So, many methods based on artificial intelligence and machine learning are introduced into security risk assessment. Reference [15] proposed intelligent information security risk assessment based on a decision tree algorithm. Reference [16] presents risk assessment of information security based on improved wavelet neural network. Information security risk assessment based on grey-analytic network process (G-ANP) was proposed in [17]. Reference [18] applied fuzzy expert system to information security risk assessment for an attendance system. Because of more security risk elements in CPPS, these methods are easy to lead to high computational load, low accuracy and complex operation of security risk assessment. The essential work in the process of security risk assessment is the construction of the risk level classification model. However, these methods can only be used to divide the risk level, which leads to the difficulty in solving the classification function model linkingrisk level and security risk factors.

In order to better solve these problems, combining with attribution reduction and function mining, security risk assessment of cyber physical power system based on rough set and gene expression programming (SRACPPS-RGEP) is proposed in this paper.

This paper makes the following contributions: 1) it presents fast attribution reduction based on binary search algorithm; 2) it proposes SRACPPS-RGEP and 3) it describes simulated experiments that have been done and provides performance analysis results.

The content of this paper is organized as follows. Section II provides fast attribution reduction based on binary search algorithm. Section III introduces SRACPPS-RGEP. Section IV represents experiments and analysis. Finally, conclusions are given in Section V.

II. FAST ATTRIBUTION REDUCTION BASED ON BINARY SEARCH ALGORITHM

A. Security Risk Element of CPPS

For security risk assessment of CPPS, the first thing to do is to analyze security risks of information and physical systems. For the physical systems, Reference [1, 19] noted that security risks include matching degree of local power supply, natural disasters, load changes of zone, transmission line fault, supply reliability of zone, small signal stability, transient stability, voltage stability, protection equipment failures, power communication equipment failures etc. For the information systems, Reference [19] noted that security risks of cyber systems include hardware and software failures, physical environment impact, malicious code, cyber-attack, physical attack, leakage, misrepresentation, repudiation, operational error, poor management, unauthorized use and abuse etc.

In order to better describe security risk element for CPPS, security risk elements' set is defined as follows.

$\textbf{Definition 1. Security risk elements' set.}$ Let the sets be defined as $P=\{\langle p_1 , v_1 \rangle, \ldots, \langle p_k , v_k \rangle , \ldots, \langle p_n , v_n \rangle\}$, $k\in [1, $ $n]$, $I=\{\langle i_1 , v_1 \rangle, \ldots, \langle i_j , v_j \rangle, \ldots, \langle i_m , v_m \rangle\}$, $j\in [1, m]$, $S$ $=$ $\{I, P, D\}$, where $I$ represents collection of security risk $i$ of information systems in CPPS and the corresponding risk values $v$, $P$ represents collection of security risk $p$ of physical systems in CPPS and $v$, and $D$ represents risk level of CPPS. $S$ is called security risk elements' set (SRES) of CPPS.

$\textbf{Definition 2. Security risk decision table.}$ Let decision table be $T=\langle U, C\cup D, V, f\rangle$, where $U$, $C\cup D=R$, $C$ $=\langle I, P\rangle$, $C=\{c_1 , c_2 , \ldots, c_n \}$, $D=\{d_1 , d_2 , \ldots, d_m \}$, and $V=\cup v_r $, $r\in{\bf R}$ respectively represent data collection, all cyber and physical security risk collection, risk level collection, risk value and risk level value collection of SRES. $f:$ $U\times{\bf R}\to V$ represents information function. $\forall, r\in {\bf R}$, $x$ $\in$ $U$, $f(x, r)\in v_r $. $T$ is called security risk decision table.

An example is shown as Table I.

Table I
EXAMPLE OF SECURITY RISK DECISION TABLE

B. FAR-BSA

For complex and high-dimensional data sets, direct analysis is very difficult. So, the dimension reduction is necessary. Existing methods on dimension reduction include principal component analysis (PCA), singular value decomposition (SVD) and rough set (RS) etc., where PCA and SVD inevitably result in the loss of part of the decision information. Dimension reduction based on RS does not change the decision rules of original data[20, 21]. Due to unique and equivalence of the optimal reduction, we just need to solve any optimal reduction. So, fast attribution reduction based on binary search algorithm (FAR-BSA) is proposed in this paper.

Firstly, related definitions of FAR-BSA are given as follows[22].

$\textbf{Definition 3.}$ Let security risk decision table be $T= \langle U$, $C\cup D, V, f\rangle$, where $\forall, P\subseteq {\bf R}$, and $x, y\in U$. $\forall, r\in P$, if and only if $f(x, r)=f(y, r)$, then $x$ and $y$ are indiscernible, and denoted by $U/R$ or $IND(P)=\{(x, y)\in U|\forall, r\in P$, $f(x, r)$ $=$ $f(y, r)\}$.

$\textbf{Definition 4.}$ Let security risk decision table be $T=\langle U, , C\cup D, V, f\rangle$, the positive region $POS_C (D) =\cup \{Y_i \subset U/C: Y_i \subset D\}$.

$\textbf{Definition 5.}$ Let security risk decision table be $T= \langle U, $ $C\cup D, V, f\rangle$. Dependence degree between condition attribution $C$ and decision attribution $D$ is denoted by $r_C (D)$, if $r_C (D)={card(POS_C (D))}/{card(U)}=1$, then $T$ is consistent, where $card\mbox{(}U\mbox{)}$ represents the number of elements in collection $U$.

$\textbf{Definition 6.}$ Let security risk decision table $T=\langle U, $ $C\cup D, V, f\rangle$ be consistent. For $c\subset C$, if $POS_C (D)$ $=$ $POS_{C-\{c\}} (D)$, then condition attribution $c$ is reducible.

$\textbf{Property 1.}$ Let security risk decision table $T=\langle U, $ $C\cup D, V, f\rangle$ be consistent. For $c\subset C$, if $POS_C (D)$ $=$ $POS_{C-\{c\}} (D)$, then $T'=\langle U, C-\{c\}\cup D, V, f\rangle$, which reduces condition attribution $c$ to be consistent.

$\textbf{Proof.}$ From Definition 5, for security risk decision table $T$ $=$ $\langle U, C\cup D, V, f\rangle$,

\begin{equation} \label{eq1} r_C (D)=\frac{card(POS_C (D))}{card(U)}=1 \end{equation} (1)
as $POS_C (D)=POS_{C-\{c\}} (D)$, (2) holds:
\begin{equation} \label{eq2} r_{C{-\{c\}}} (D)=\frac{card(POS_{C{-\{c\}}} (D))}{card(U)}=\frac{card(POS_C (D))}{card(U)}. \end{equation} (2)
From (1) and (2), $r_{C{-\{c\}}} (D)$ equals to 1. It means that $T'=$ $\langle U, C-\{c\}\cup D, V, f\rangle$ is consistent.

$\square$

$\textbf{Definition 7.}$ Let condition attribution set $C'$ be a reduction of security risk decision table $T$. Reduction which contains the least condition attribution in $C'$ is called the optimal reduction of $T$.

$\textbf{Lemma 1.}$ Let security risk decision table $T=\langle U, C \cup D, $ $V, f\rangle $ be consistent. $S\subset C$, if $S$ is not reduction of $T$, then $\forall, S'$ $\subset$ $S$ is not reduction too.

$\textbf{Proof.}$ Supposing that $\forall, S'\subset S\subset C$ is a reduction, then from Property 1, $T$ is still consistent after condition attribution set $C$ removes $S'$. And because of $S'\subset S$, then $(C-S)\subset$ $(C-S')$. From Definition 6 and known conditions, we know that security risk decision table $T$ must be consistent after condition attribution set $C$ removes $S$. Then from Property 1, for $S\subset C$, $S$ is reducible too. This conclusion is inconsistent with the known conditions. So the original proposition is right.

$\square$

The basic steps of FAR-BSA are shown as follows.

$\textbf{Algorithm 1.}$ FAR-BSA ($T$)

$\textbf{Input.} T=\langle U, C\cup D, V, f\rangle, C=\{x_1 , x_2 , \ldots, x_n \}$;

$\textbf{Output.}$ bestReduction;

%\textbf{Algorithm}$ FAR-BSA ($T$)

${\bf Step 1.}$ int $m=n/2$;

${\bf Step 2.}$ Looping through all reduction for including $m$ conditional attribution in $C$. If finding a reduction according to coordination of security risk decision table $T$, then going to Step 3 until finding the optimal reduction, or going to Step 4;

${\bf Step 3.}$ If $m$ equals to 1, then the reduction is the bestReduction. If $m$ equals to $n$, then bestReduction equals to $C$. Otherwise, let $m$ equals to $m/2$, and go to Step 2;

${\bf Step 4.}$ If conditional attribution subset is not a reduction when $m$ equals to $n/2$, then let $m$ equals to $(m+n)/2$, and go to Step 2;

${\bf Step 5.}$ return bestReduction.

In this paper, we suppose that security risk decision table $T$ is coordinated. In FAR-BSA, maximum time complexity of computing conditional attribution combination is ${\rm O}(\log _2^m )$ $(1\le m\le n)$. Time complexity of computing a reduction by coordination of $T$ is ${\rm O}(| U| )$. So, the maximum time complexity of solving the optimal reduction is ${\rm O}(| U| \times \log _2^m )$.

III. SRACPPS-RGEP

Essentially, the risk assessment is to build a nonlinear function among system assets, threats and asset vulnerabilities[16]. In terms of function mining, risk assessment can be seen as a process which mines function model among factors affecting system security and risk level, and determines risk level of the current system based on the function model. Traditional function mining algorithms include regression methods, genetic programming (GP) and gene expression programming (GEP) etc. Regression methods assume that function type is known in advance, then estimate parameters by means of the least squares method[23] or the improved methods, and finally, obtain the function model. These methods depend on a priori knowledge and a lot of subjective factors so that the complex function model is not easy to be built. Moreover, these algorithms have high time complexity and low computation efficiency for complex high-dimensional data. To solve these problems, [24] used GP for mathematical modeling, obtained good experimental results, and avoided the defect of traditional statistical methods. However, the efficiency of function model mined by GP was low. So, a new algorithm which was called GEP was proposed in [25]. Compared with GP, efficiency of complex function mined by GEP was improved by 4-6 times.

Meanwhile, due to complexity of cyber physical power system, factors that affect security of CPPS are massive, complex and diverse. Risk assessment function model directly mined by GEP will lead to higher time complexity. In this paper, security risk assessment for CPPS based on rough set and gene expression programming (SRACPPS-RGEP) is proposed. Firstly, risk factors which affect security of CPPS are reduced by calling Algorithm 1. And then, function model among security risk factors in CPPS is mined by improved GEP.

A. Code of SRACPPS-RGEP

Gene code is an important expression form of SRACPPS-GEP and described as follows.

$\textbf{Definition 8.}$ Let function set be $F=\{+, -, \ast , /\}$, terminal set be $T=\{d_1 , d_{_2 } , \ldots, d_i , \ldots, d_m \}$, where $m$represents the number of security risk elements in CPPS. Then gene which is built according to rules and symbol in [25] is called risk assessment gene (RA-Gene), where head of RA-Gene $h$ is composed of elements of $F$ and $T$, and tail of RA-Gene $t$ is composed of elements of $T$. Moreover, the lengths of $h$ and $t$ follow the equation:

\begin{equation} \label{eq3} length(t)=length(h)\times (n-1)+1, \end{equation} (3)
where $n$ represents the maximum number of arguments of operator in the gene head.

SRACPPS-RGEP adopts linear code of fixed length[26] to represent an individual which is called a chromosome. In the meantime, a chromosome is composed of one or more RA-Genes.

B. Population of SRACPPS-RGEP

In evolution computation, diversity of population directly decides whether the optimal or suboptimal solutions can be obtained. At present, random method is applied to initial population in GEP. But diversity of population is limited. Reference [27] used gene space balance strategy to initialize population and enlarged diversity of initial population. However, the gene space balance strategy cannot dynamically increase diversity of population during the local convergence of GEP. Reference [28] used diversity-guided grading evolution strategy to change diversity of GEP population. But implementation of the algorithm was very complex. In [29], authors dynamically adjusted population size by self-adaptive coefficient. However, the algorithm cannot accurately reflect diversity of the current population. To solve these problems, adjustment of dynamic population based on variable-step (ADP-VS) is proposed in this paper. ADP-VS dynamically adjusts population size of GEP by variable step so as to better increase diversity of the current population and probability of searching the optimal solution.

The basic steps of ADP-VS are shown as follows:

$textbf{Algorithm 2.}$ ADP-VS ()

$\textbf{Input.}$ maximum generation $G_{\max} $, maximum step $S_{\max} $, maximum fitness $F_{\max} $;

$\textbf{Output.}$ NewPopulation;

Begin $\{$

1. ${\rm PopSize} = 500;\quad //$size of initial population

2. $\theta = 0;\alpha = 0.01;\beta = 0.05;\quad //$setting the initial value of related parameters

3. for $(i = 1;i < = {G_{\max }};i + + )\{ $

4. $\quad if(i\% 10 = = 0)\{ $

5. $\quad\quad$ if (maximum fitness value of population does not change in 10 generations){

6. $\quad \quad \quad \Delta = ({F_{\max }} - {F_{\max }}[i])/{F_{\max }};\quad //$computing error between the largest fitness value of current population and the real maximum fitness value

7. $\quad \quad \quad if(\Delta > \beta )\{ \theta = 200;{\rm{PopSize}} + = \theta ;{\rm{\} }}$

8. $\quad \quad \quad elseif(\alpha < \Delta < \beta )$
$\quad \quad \quad \quad \quad \quad \{ \theta = 100;{\rm{PopSize}} + = \theta ;\} $

9. ${\quad \quad \quad \quad \quad else\{ \theta = 0;\} }$
}}}}
End.

In Algorithm 2, $\theta $ represents step when population size changes, $\alpha $ and $\beta $ represent upper limit and lower limit of error between the maximum fitness value of current population and the real maximum fitness value respectively.

C. Description of SRACPPS-RGEP

To better describe SRACPPS-RGEP, the related concepts are given before risk assessment function mining algorithm is introduced.

$\textbf{Definition 9.}$ Let risk elements' set of CPPS be $S=\{\langle I, $ $V\rangle, \langle P, V\rangle, D\}$, and then $D=f(C_1 , \ldots, C_k , \ldots, C_{n+m} )$, $C$ $=$ $I\cup P$ is called risk assessment function (RA-Function).

$\textbf{Definition 10.}$ Let $F_{\rm optimal} $ be the optimal fitness value of risk assessment function based on GEP, $F_{\max} $ be maximum fitness value, then ${F_{\rm optimal} }/{F_{\max} }\times 100, \% $ is called risk assessment function mining success ratio (RAFMSR).

$\textbf{Definition 11.}$ In SRACPPS-RGEP, fitness function $f(i)$ of the $ i$-th individual is expressed by (3).

\begin{equation} \label{eq4} f_i =\sum\limits_{j=1}^n {(M-| V_{ij} -T_j | )}, \end{equation} (4)
where $M$ is threshold of absolute error and is confirmed by empiric value, $V_{ij} $ represents the value predicted by the $ i$-th individual program for fitness case $j$, $T_j $ is the target value for fitness case $j$.

Meanwhile, genetic operation of function mining based on GEP includes selection based on elitism, mutation, IS/RIS/Gene transposition, one-point/two-point/gene recombination etc.

The process of SRACPPS-RGEP is shown as follows.

$\textbf{Input.}$ $T=\langle U, C\cup D, V, f\rangle$, population size $\textit{PopSize}$, maximum iterations \textit{MaxGen}, maximum fitness value $\textit{MaxFitness}$, mutation probability $P_m $, transposition probability $P_t $, recombination probability $P_r $;

$\textbf{Output.}$ the optimal risk assessment function $\textit{BestRiskAss-Fun.}$

Begin {

1. Call FAR-BSA ($T$);\ \ //reduce security risk decision table

2. Initialization of Population $S;\quad //$initial population according to reduced decision table $T$

3. $gen=0$;

4. While ($fitness

5. $\quad$ GeneticOpertion $(S);\quad //$selection, mutation, transposition, and recombination operators are done

6. $\quad$ Calculating fitness value of every individual in population;

7. $\quad$ Keeping the best individuals;

8. $\quad fitness=Bestfitness;$

9. $\quad gen + + ;{\rm{\} }}$

10. return BestRiskAss-Fun.

D. Application Analysis of SRACPPS-RGEP

From introduction, it is known that security assessment or vulnerability assessment is considered from the point of physical system or information system in CPPS. SRACPPS-RGEP takes information system and physical system as a whole to consider security risk assessment. It is vital to safe and reliable operation of CPPS. Meanwhile, massive security risk logs obtained from CPPS are stored in existing power security defense system. Security risk model is extracted from these security risk logs by SRACPPS-RGEP. Finally, the model can be a basis of decision-making in safe and reliable operation of CPPS.

In order to better apply SRACPPS-RGEP to power security defense system, firstly, interface of security risk logs in CPPS is given; secondly, SRACPPS-RGEP analyzes security risk logs in CPPS by the interface, moreover, analysis and computing will be transplanted to distributed computing platform (such as cloud computing, etc.); lastly, results will be returned to power security defense system as a basis for adjustment of operation control strategy. The whole process is shown in Fig. 1.

Fig. 1 Application process of SRACPPS-RGEP.

In Fig. 1, SRACPPS-RGEP is deployed on the private cloud computing platform in power grid. The algorithm can analyze security risk log of CPPS in parallel by interface and return results to power security defense system.

IV. EXPERIMENTS AND ANALYSES

In order to better verify the feasibility and effectiveness of the algorithms in this paper, simulation experiments are done in a laboratory environment. Two experiments are given on the following platform: Intel i5 2.3, GHz + 2, G + Win7 + Java etc.

The experimental data is mainly from China Southern Grid Provincial Power Company. Firstly, security risks of cyber and physical system in the power company are analyzed. Then the corresponding security risk elements' set is obtained. Lastly, security risk decision table is built by the security risk elements' set. In the decision table, the number of condition attribution and decision attribution is 21 and 5 respectively. Decision attribution value includes higher, high, medium, low and lower. The number of data set is 20. After quantification and normalization, training data is composed of top 15 elements, and the remaining data is test data.

$\textbf{Experiment 1.}$ For the known security risk decision table for CPPS, change of the number of condition attribution based on FAR-BSA is shown in Table II. The optimal attribution reduction based on FAR-BSA, PCA, SVD, attribution reduction algorithm based on positive region (AR-PR), attribution reduction algorithm based on discernable matrix (AR-DM), GEP-ARRS[26] and TS-ARRS [30] is shown in Table III. Fig. 2 shows comparison of time-consumption in which the optimal attribution reduction is obtained respectively based on FAR-BSA, PCA, SVD, AR-PR, AR-DM, GEP-ARRS and TS-ARRS under the condition that the algorithm runs 5 times.

Table II
CHANGED OF THE NUMBER OF CONDITION ATTRIBUTION BASED ON FAR-BSA

Table III
COMPARISON OF AND OPTIMAL REDUCTION BASED ON SEVEN ATTRIBUTION REDUCTION ALGORITHMS

It is well known that the optimal attribution reduction is not unique for invariant condition attributions. From Table II, we know that FAR-BSA is effective for solving an optimal attribution reduction. After reduction, the number of condition attribution reduces to about 76.19 %. In all condition attribution, only five condition attributions including local power matching degree, small signal stability, voltage stability, operational error, and unauthorized use and abuse are retained so that complexity of function mining by GEP will be greatly reduced. Meanwhile, according to rough set theory, attribution reduction does not change the ability to decide risk level of CPPS.

From Table III, an optimal reduction based on FAR-BSA, AR-PR, AR-DM, GEP-ARRS and TS-ARRS includes five condition attributions. An optimal reduction based on PCA and SVD includes respectively seven and eight condition attributions, where the optimal reduction based on FAR-BSA, AR-PR, AR-DM, GEP-ARRS and TS-ARRS does not result in loss of the original decision information. While attribution reduction based on PCA and SVD will inevitably result in partial loss of the original decision information.

Fig. 2 Time-consumption for an optimal reduction based on the seven algorithms.

Fig. 2 shows that time-consumption for solving the optimal reduction based on FAR-BSA, AR-PR, AR-DM, GEP-ARRS and TS-ARRS is less than PCA and SVD. However, in comparison with AR-PR, AR-DM, GEP-ARRS and TS-ARRS, time-consuming of FAR-BSA reduces to about 38.08 %, 41.35 %, 46.20 %, and 49.62 %, respectively. This is mainly because that maximum time complexity of FAR-BSA is ${\rm O}(|U| \times \log _2^m )$ $(1\le m\le \left| C \right|)$, while time complexity of AR-PR, AR-DM, PCA, SVD, GEP-ARRS and TS-ARRS is about, ${\rm O}(| C| \times | U| )$, ${\rm O}(| C| \times \frac{| U| }{2})$, ${\rm O}(| C| ^3)$, ${\rm O}(| C| ^3)$, ${\rm O}(| C|$ $\times$ $| U| \times MaxGen)$, and ${\rm O}(| C| \times | U| \times n^2)$ respectively. Here $| U| $ represents the number of elements in collection $U$, $| C| $ represents the number of elements in collection $C$, $MaxGen$ represents the maximum generations of GEP and $n$ represents a number of variables in TS-ARRS. Moreover, due to equivalence of reduction, FAR-BSA only requires the solution of an optimal reduction. So, for security risk decision table having a large number of condition attributions in CPPS, FAR-BSA can not only solve the optimal attribute reduction, but also make time-consuming to a minimum in comparison with other attribution reduction algorithms.

$\textbf{Experiment 2.}$ On the basis of Experiment 1, for security risk decision table in CPPS after reduction, performance of SRACPPS-RGEP is described in Experiment 2. Parameters of GEP are shown in Table IV. Fig. 3 shows comparison of difference between optimal and maximum fitness value by GEP before and after reduction. Under the condition that SRACPPS-RGEP runs 5 times, comparison of average time-consuming before and after reduction is shown in Fig. 4. Fig. 5 describes error between maximum fitness value of current population and real maximum fitness value with increase of population size. Figs. 6 and 7 show respectively the comparison between model value by SRACPPS-RGEP and real value for training and test data.

Fig. 3 Comparison of difference between optimal and maximum fitness value before and after reduction.

Fig. 4 Comparison of average time-consumptionof GEP before and after reduction.

Fig. 5 Comparison of error between maximum fitness value of current population and real maximum fitness value.

Fig. 6 Comparison between the values of training data and model.

Fig. 7 Comparison between the values of test data and model.

From Fig. 3, we know that in comparison with difference between optimal and maximum fitness value before the reduction, difference after the reduction has been maximally decreased about 89.81 %. Function mining success ratio has been maximally reached about 99.86 %. This means that for high-dimensional data sets, attribution reduction greatly improves success probability of function mining without changing decision capability of existing data sets. Fig. 4 shows that attribution reduction greatly reduces average time-consuming of security risk assessment function model based on GEP for the same security risk decision table. Average time-consuming has been maximally dropped about 56.52 % under the same GEP parameters. This is mainly because that complexity of GEP population has been drastically simplified by reduction.

Optimal risk assessment function model by SRACPPS-RGEP is shown as (4).

\begin{align} \label{eq5} & f(a, b, c, d, e)=-a+c+b\times \csc(\sin(\tan(b+d)))\notag \\[1mm] &\qquad + b\times \frac{\sec(c+d-b\times \frac{e}{d})}{d}+a\times \tan(\frac{c}{d}+d) \end{align} (5)

From Fig. 5, we know that with the increase of population size, error between maximum fitness value of current population and real maximum fitness value gradually declines. This is because that adjustment strategy of dynamic population based on variable-step is applied to dynamically increase population size and diversity of individual, thereby increase solution of the optimal individual. Meanwhile, degree of fitness between real value and model value of train data and test data for security risk decision table is shown respectively in Figs. 6 and 7. From Fig. 6, we can see that maximum error between model value and real value of security risk train data by (5) is 0.7441, and minimum error is 0.004. Fig. 7 indicates that maximum error between model value and real value of test data by (5) is 0.3091, and minimum error is 0.0497. It can be seen that the model has high prediction accuracy.

Table IV
PARAMETERS OF GEP

For security risk assessment of CPPS, the higher prediction accuracy of security risk assessment function model based on SRACPPS-RGEP is, the higher accuracy of assessment for security risk level of CPPS is.

V. CONCLUSION

With deep integration of information systems and physical systems in smart grid, various types of security risks of information systems will affect the normal operation of the physical system. Security risks of cyber physical power system need to be timely found and evaluated because handling of these security risks are essential for safe and reliable operation of cyber physical power system. In this paper, for better and faster solution of optimal attribute reduction on security risk decision table for CPPS, fast attribution reduction based on binary search algorithm (FAR-BSA) is firstly proposed. On the basis of FAR-BSA, security risk assessment algorithm for cyber physical power system based on rough set and gene expression programming (SRACPPS-RGEP) is described. Experimental results show that SRACPPS-RGEP can quickly discover optimal security risk function model, and the function model has high predictive accuracy. This will provide good foundation for timely and accurate assessment and prediction of security risk in CPPS in the future.

The existing power information security defense system does not take security assessment of CPPS into account. In the future, the SRACPPS-RGEP can be used as an important part of existing power information security defense system, and provide comprehensive security situation assessment for power system of China from information region to production and control region.

References
[1] Zhao Jun-Hua, Wen Fu-Shuan, Xue Yu-Sheng, Li Xue, Dong Zhao-Yang. Cyber physical power systems: architecture, implementation techniques and challenges. Automation of Electric Power Systems, 2010, 34(16): 1-7 (in Chinese)
[2] Ju Ping, Qin Chuan, Huang Hua, Wu Feng, Jin Yu-Qing. Research trends of power system modeling geared to smart grid. Automation of Electric Power Systems, 2012, 36(11): 1-6 (in Chinese)
[3] Yu Yi-Xin, Luan Wen-Peng. Basic philosophy of smart grid. Journal of Tianjin University, 2011, 44(5): 377-384 (in Chinese)
[4] Vellaithurai C, Srivastava A, Zonouz S, Berthier R. CPINDEX: cyberphysical vulnerability assessment for power-grid infrastructures. IEEE Transactions on Smart Grid, 2015, 6(2): 566-575
[5] Hong J H, Liu C C, Govindarasu M. Integrated anomaly detection for cyber security of the substations. IEEE Transactions on Smart Grid, 2014, 5(4): 1643-1653
[6] Sridhar S, Govindarasu M. Model-based attack detection and mitigation for automatic generation control. IEEE Transactions on Smart Grid, 2014, 5(2): 580-591
[7] Zonouz S, Davis C M, Davis K R, Berthier R, Bobba R B, Sanders W H. SOCCA: a security-oriented cyber-physical contingency analysis in power infrastructures. IEEE Transactions on Smart Grid, 2014, 5(1): 3-13
[8] Hu J K, Pota H R, Guo S. Taxonomy of attacks for agent-based smart grids. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(7): 1886-1895
[9] Mei Sheng-Wei, Wang Ying-Ying, Chen Lai-Jun. Overviews and prospects of the cyber security of smart grid from the view of complex network theory. High Voltage Engineering, 2011, 37(3): 672-679 (in Chinese)
[10] Feng Deng-Guo, Zhang Yang, Zhang Yu-Qing. Survey of information security risk assessment. Journal of China Institute of Communications, 2004, 25(7): 10-18 (in Chinese)
[11] Guo Chuang-Xin, Yu Bin, Guo Jia, Wen Bo-Jian, Zhang Jin-Jiang, Zhang Li, Lu Hai-Bo, Li Bo. Security risk assessment of the IEC61850-based substation automation system. Proceedings of the CSEE, 2014, 34(4): 685-694 (in Chinese)
[12] Liu N, Zhang J H, Wu X. Asset analysis of risk assessment for IEC 61850-based power control systems, Part I: methodology. IEEE Transactions on Power Delivery, 2011, 26(2): 869-875
[13] Liu N, Zhang J H, Wu X. Asset analysis of risk assessment for IEC 61850-based power control systems, Part II: application in substation. IEEE Transactions on Power Delivery, 2011, 26(2): 876-881
[14] Bompard E, Gao C W, Napoli R, Russo A, Masera M, Stefanini A. Risk assessment of malicious attacks against power systems. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2009, 39(5): 1074-1085
[15] Zhang Li, Yao Yi-Zhan, Peng Jian-Fen, Chen Hong-Bo, Du Yu-Ge. Intelligent information security risk assessment based on a decision tree algorithm. Journal of Tsinghua University (Science and Technology), 2011, 51(10): 1236-1239 (in Chinese)
[16] Zhao Dong-Mei, Liu Jin-Xing, Ma Jian-Feng. Risk assessment of information security based on improved wavelet neural network. Computer Science, 2010, 37(2): 90-93 (in Chinese)
[17] Zhao Gang, Wu Tian-Shui. Information security risk assessment based on G-ANP. Journal of Tsinghua University (Science and Technology), 2013, 53(12): 1761-1767 (in Chinese)
[18] Chang L Y, Lee Z J. Applying fuzzy expert system to information security risk assessment-a case study on an attendance system. In: Proceedings of the 2013 International Conference on Fuzzy Theory and Its Applications. Taipei, China: IEEE, 2013. 346-351
[19] Wang Bo. Research on Security Risk and Vulnerability Assessment Methods of Complicated Power System [Ph. D. dissertation], Huazhong University of Science and Technology, China, 2011. (in Chinese)
[20] Yin Lin-Zi, Yang Chun-Hua, Wang Xiao-Li, Gui Wei-Hua. An incremental algorithm for attribute reduction based on labeled discernibility matrix. Acta Automatica Sinica, 2014, 40(3): 397-404 (in Chinese)
[21] Jiang Yun-Liang, Yang Zhang-Xian, Liu Yong. Quick distribution reduction algorithm in inconsistent information system. Acta Automatica Sinica, 2012, 38(3): 382-388 (in Chinese)
[22] Pawlak Z. Rough sets. International Journal of Computer and Information Sciences, 1982, 11(5): 341-356
[23] Zeng Qing-Hong, Lu De-Tang. Curve and surface fitting based on moving least-squares methods. Journal of Engineering Graphics, 2004, 25(1): 84-89 (in Chinese)
[24] Koza J R. Genetic Programming II: Automatic Discovery of Reusable Programs. Cambridge: MIT Press, 1994. 110-199
[25] Ferreira, C. Gene expression programming: a new adaptive algorithm for solving problems. Complex Systems, 2001, 13(2): 87-129
[26] Deng S, Wang R C, Fu X, Yang L C. Gene expression programming for attribution reduction in rough set. International Journal of Computers and Applications, 2010, 32(2): 226-231
[27] Hu Jian-Jun, Tang Chang-Jie, Duan Lei, Zuo Jie, Peng Jing, Yuan Chang-An. The strategy for diversifying initial population of gene expression programming. Chinese Journal of Computers, 2007, 30(2): 305-310 (in Chinese)
[28] Liu Qi-Hong, Tang Chang-Jie, Hu Jian-Jun, Zeng Tao, Liu Ying-Tian, Qiu Jiang-Tao. Gene expression programming based on diversity-guided grading evolution. Journal of Sichuan University (Engineering Science Edition), 2006, 38(6): 108-113 (in Chinese)
[29] Deng Song, Wang Ru-Chuan. Classification of distributed GEP-BP based on grid service. Acta Electronica Sinica, 2009, 37(11): 2600-2603 (in Chinese)
[30] Header A -R, Wang J, Fukushima M. Tabu search for attribute reduction in rough set theory. Soft Computing, 2007, 12(9): 909-918