Because there is no unified grading standard for the harm of terrorist attacks, this paper proposes a machine-learning-based classification model for terrorist incidents. First, the hazard-related data in the Global Terrorism Database (GTD) are extracted and preprocessed. Second, features are extracted by principal component analysis (PCA), and all events are aggregated into five clusters by K-means clustering. Third, the entropy method is used to calculate the weight coefficient of each indicator, and the comprehensive hazard score of each type of terrorist attack is calculated. Finally, the types are ranked by score from high to low to obtain a hazard grading model with levels 1 to 5. The results show that the model can quantify the hazard of terrorist attacks scientifically and objectively.

A terrorist attack is an act of aggression committed by an extremist individual or organization that violates international morality and is directed against, but not limited to, civilians and civilian installations. It is highly destructive and directly causes heavy casualties and property losses. It also places tremendous psychological pressure on people, causes a degree of social turmoil, and seriously hinders economic development. Global terrorism is a matter of public concern that affects everyone, so counter-terrorism work is urgent. Big data is now the main source of counter-terrorism intelligence. The Global Terrorism Database (GTD) is the world’s most comprehensive unclassified database of terrorist attacks, containing more than 180,000 attacks, each described by at least 45 variables. An in-depth analysis of attack data helps deepen our understanding of terrorism and provides valuable information for countering and preventing it. Intelligence obtained through data collection and preprocessing is the lifeblood of counter-terrorism work: reliable and timely information plays an active role in combating terrorism and can effectively curb the spread of terrorism[

Grading catastrophic events (such as earthquakes, traffic accidents, and meteorological disasters) is an important task of social management. Such grading is usually done subjectively, with an authority stipulating the grading standard. However, the harm of a terrorist attack depends not only on casualties and economic losses but also on its timing, location, and targets, and a subjective standard has difficulty reflecting all of these factors. A hazard grading of terrorist incidents gives future attacks a clear classification, so that events at different levels can be handled differently. This not only helps the management of social security but also avoids unnecessary waste of manpower and resources.

Using big data processing techniques, this paper establishes a grading model based on the PCA algorithm, the K-means clustering algorithm, and the entropy method. First, 14 evaluation indicators related to event hazard are selected and the existing data are preprocessed. Second, PCA reduces the indicators from 14 dimensions to 4, and the K-means algorithm groups the reduced vectors into 5 clusters, which gives the cluster of each event. Finally, the entropy method is used to compute a hazard score for each event and the average hazard score of each cluster, and the clusters are assigned levels 1 to 5 from the highest average score to the lowest. The result is a five-level hazard grading model for terrorist events.

In this paper, the hazard grading model is built from several important fields of the original GTD database. The selected data require missing-value handling, conversion of character fields to numeric values, and further numerical processing.

The important fields used for grading are listed in the table below.

THE SELECTED FIELD TABLE

Field | Description |
---|---|
extended | Whether the event is an extended (continuous) event |
latitude | Latitude |
longitude | Longitude |
success | Whether the attack succeeded |
suicide | Whether it was a suicide attack |
nkill | Total number of deaths |
propextent | Extent of property damage |
nwound | Total number of injuries |
country | Country |
region | Region |
city | City |
attacktype | Attack type |
targtype | Target/victim type |
weapontype | Weapon type |

For the selected fields, Python’s function

The character fields that need to be converted are country and city, which are assigned the numeric values given in the state and city assignment table.

In the original GTD database, the nkill field records all deaths directly caused by a terrorist incident, including both victims and the terrorists themselves. We need only the number of victims, not the number of terrorist deaths. Therefore, the number of victim deaths is obtained by subtracting the number of terrorist deaths (nkiller) from the total number of deaths.
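The preprocessing described above (missing values, character-to-number conversion, and victim deaths) can be sketched in plain Python. This is a minimal sketch, not the authors' code: it assumes each record is a dict keyed by the field names used in the text, that a missing count is filled with 0, that a negative difference is clamped at 0, and that the caller supplies the sets of developed countries, capitals, and provincial capitals. The function `preprocess` and its parameters are hypothetical.

```python
# Assigned values from the state and city assignment table.
COUNTRY_VALUE = {"developed": 2, "underdeveloped": 1}
CITY_VALUE = {"capital": 3, "provincial_capital": 2, "other": 1}


def preprocess(record, developed, capitals, provincial_capitals):
    """Return a numeric copy of one GTD-style record.

    record: dict keyed by field name; None marks a missing value.
    developed, capitals, provincial_capitals: hypothetical lookup sets
    supplied by the caller.
    """
    out = dict(record)
    # Assumed imputation rule: a missing count is treated as 0.
    for field in ("nkill", "nkiller", "nwound"):
        if out.get(field) is None:
            out[field] = 0
    # Victim deaths = total deaths minus terrorist deaths (clamped at 0).
    out["nkill"] = max(out["nkill"] - out.pop("nkiller"), 0)
    # Convert the character fields country and city to numbers.
    status = "developed" if record["country"] in developed else "underdeveloped"
    out["country"] = COUNTRY_VALUE[status]
    if record["city"] in capitals:
        out["city"] = CITY_VALUE["capital"]
    elif record["city"] in provincial_capitals:
        out["city"] = CITY_VALUE["provincial_capital"]
    else:
        out["city"] = CITY_VALUE["other"]
    return out


example = {"country": "France", "city": "Paris", "nkill": 10, "nkiller": 2, "nwound": None}
print(preprocess(example, {"France"}, {"Paris"}, set()))
# prints {'country': 2, 'city': 3, 'nkill': 8, 'nwound': 0}
```

The terrorist-death column is removed after the subtraction so that it does not enter the 14 indicators.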

In this paper, the PCA algorithm, the K-means clustering algorithm, and the entropy method are used to grade terrorist attacks. Building the grading model takes four steps:

First, the 14 most influential indicators are standardized and arranged into a 14-dimensional matrix, which PCA reduces from 14 dimensions to 4.

Second, the K-means algorithm clusters all terrorist events in the reduced matrix into five major categories, i.e., five hazard levels.

Third, the entropy weight method gives the weight of each of the 14 indicators, and the 14 indicators of each event are weighted and summed to obtain the event's score. For each category, the average score of all events in it is then computed.

Finally, the five categories are sorted by average score and assigned grades 1 to 5 from high to low; a higher score means greater harm.

Principal component analysis (PCA) extracts an M-dimensional feature representation from N-dimensional data. First, the eigenvalues and eigenvectors of the N × N covariance matrix of the data are calculated. The eigenvalues are sorted from large to small, and the eigenvectors corresponding to the first M eigenvalues are selected to form an N × M feature transformation matrix T; projecting the data onto T gives the M-dimensional features. In this paper, N = 14 and M = 4.
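A minimal NumPy sketch of the PCA procedure just described, assuming the data are stored as an n × N array with one row per event. The function name `pca` and the `standardize` flag are illustrative choices, not part of the paper; whether the indicators are z-scored before PCA is not specified in detail, so it is left as an option. `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np


def pca(X, m, standardize=False):
    """Project an (n, N) data matrix X onto its first m principal components.

    Returns (Y, T, contribution): Y is (n, m), T is the N x m transformation
    matrix, and contribution holds the variance share of each kept component.
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)                      # center each indicator
    if standardize:
        std = Z.std(axis=0, ddof=1)
        Z = Z / np.where(std > 0, std, 1.0)     # z-scores; constant columns stay 0
    cov = np.cov(Z, rowvar=False)               # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigen-decomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]           # eigenvalues from large to small
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    T = eigvecs[:, :m]                          # first m eigenvectors
    Y = Z @ T                                   # reduced data
    contribution = eigvals[:m] / eigvals.sum()
    return Y, T, contribution
```

For the 14-column indicator matrix described here, the call would be `pca(X, 4)`.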

The PCA eigenvalues associated with the 14 indicators, sorted from large to small, are listed in the table of eigenvalues corresponding to the indicators.

THE STATE AND CITY ASSIGNMENT

Category | Assigned value |
---|---|
Developed country | 2 |
Underdeveloped country | 1 |
Capital | 3 |
Provincial capital | 2 |
Other city | 1 |

EIGENVALUES CORRESPONDING TO THE INDICATORS

Indicator | Eigenvalue |
---|---|
extended | 1.63936354e+03 |
latitude | 1.36725606e+03 |
longitude | 1.84972981e+02 |
success | 1.04574700e+02 |
suicide | 2.66626688e+00 |
nkill | 9.82022087e-01 |
propextent | 1.06560032e-01 |
nwound | 8.06184462e-02 |
country | 5.20872985e-02 |
region | 4.01240379e-02 |
city | 2.60031933e-02 |
targtype | 7.91122120e-03 |
attacktype | 4.84991077e-03 |
weapontype | 0.00000001e+00 |

In this paper, PCA is applied to 98,686 records: each original 14-dimensional vector $x = [x_1, x_2, \dots, x_{14}]$ is reduced to a 4-dimensional vector $Y = [y_1, y_2, y_3, y_4]$. The contribution rates of the four principal components are 0.49, 0.42, 0.06, and 0.03, and their sum exceeds 0.99. The reduced matrix therefore retains nearly all of the variance of the original data and can be used directly for clustering.
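The contribution rate of a principal component is its eigenvalue divided by the sum of all eigenvalues. The short calculation below assumes that the 14 values in the table of eigenvalues (characteristic values) are the full spectrum of the covariance matrix; under that assumption the four largest eigenvalues account for about 99.9% of the total, consistent with the 0.99 threshold.

```python
# Eigenvalues copied from the eigenvalue table (assumed to be the full
# spectrum of the 14 x 14 covariance matrix).
eigenvalues = [
    1.63936354e+03, 1.36725606e+03, 1.84972981e+02, 1.04574700e+02,
    2.66626688e+00, 9.82022087e-01, 1.06560032e-01, 8.06184462e-02,
    5.20872985e-02, 4.01240379e-02, 2.60031933e-02, 7.91122120e-03,
    4.84991077e-03, 0.00000001e+00,
]

total = sum(eigenvalues)
top4 = sorted(eigenvalues, reverse=True)[:4]
ratios = [v / total for v in top4]          # contribution rate of each kept component
cumulative = sum(ratios)                    # share of variance kept by 4 components
print([round(r, 3) for r in ratios], round(cumulative, 4))
# prints [0.497, 0.414, 0.056, 0.032] 0.9988
```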

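The main idea of the K-means clustering algorithm is to partition discrete data points into K clusters so that each point belongs to the cluster whose center is nearest. In this paper, K = 5, and the algorithm proceeds as follows: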

We select 5 events as the initial cluster centers.

We calculate the Euclidean distance from each event to each cluster center and assign the event to the nearest cluster.

After all events have been assigned, the five cluster centers are recalculated and compared with the centers from the previous iteration. If any center has changed, the Euclidean distances and cluster assignments are recalculated.

When the cluster centers no longer change, the clustering result is output.
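The four steps above can be sketched as a plain-Python K-means (Lloyd's algorithm). This is an illustrative sketch, not the authors' implementation: choosing the initial events at random, the `seed` and `max_iter` parameters, and keeping an empty cluster's previous center are assumptions.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster a list of equal-length numeric tuples into k groups.

    Returns (labels, centers): the cluster index of each point and the
    final cluster centers.
    """
    rng = random.Random(seed)
    # Step 1: choose k events at random as the initial cluster centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each event to the nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Step 3: recompute each center as the mean of its assigned events.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # assumption: keep an empty cluster's center
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

On the reduced data the call would be `kmeans(Y, 5)`, with `Y` the list of 4-dimensional PCA vectors. Because the result depends on the initial centers, running several seeds and keeping the clustering with the smallest within-cluster distance is a common safeguard.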

The resulting center of each event cluster and the number of events in it are shown in the table below.

CLUSTERING CENTER FOR EVENT CLASSIFICATION

Cluster | y1 | y2 | y3 | y4 | Number of events |
---|---|---|---|---|---|
0 | 2.4843 | -16.3826 | -1.3464 | 0.3081 | 63122 |
1 | -3.3968 | 22.8297 | -3.8782 | 0.0615 | 37848 |
2 | 825.778 | 873.697 | 28.9316 | -104.59 | 2 |
3 | 13.8411 | -127.794 | 19.7789 | -2.7281 | 3500 |
4 | -9.5985 | 63.3898 | 16.7324 | -1.2382 | 9711 |

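The category of each event is determined from its Euclidean distance to each cluster center:

$$d_i = \sqrt{\sum_{k=1}^{4} \left(y_k - c_{ik}\right)^2}, \qquad i = 0, 1, \dots, 4$$

$$\text{category} = \arg\min_{i} d_i$$

where $Y = [y_1, y_2, y_3, y_4]$ is the feature vector obtained by PCA dimension reduction, $C_i = [c_{i1}, c_{i2}, c_{i3}, c_{i4}]$ is the center of cluster $i$, and $d_i$ is the distance from $Y$ to $C_i$.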
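Assigning a new event to one of the five categories amounts to finding the nearest cluster center by Euclidean distance. The sketch below hard-codes the centers from the cluster-center table; the names `CENTERS` and `assign_category` are illustrative, and the input is assumed to be an event already projected onto the same four principal components.

```python
import math

# Cluster centers (y1..y4) copied from the cluster-center table.
CENTERS = {
    0: (2.4843, -16.3826, -1.3464, 0.3081),
    1: (-3.3968, 22.8297, -3.8782, 0.0615),
    2: (825.778, 873.697, 28.9316, -104.59),
    3: (13.8411, -127.794, 19.7789, -2.7281),
    4: (-9.5985, 63.3898, 16.7324, -1.2382),
}


def assign_category(y):
    """Return the cluster whose center has the smallest Euclidean distance to y."""
    return min(CENTERS, key=lambda i: math.dist(y, CENTERS[i]))


print(assign_category((0.0, 0.0, 0.0, 0.0)))  # prints 0
```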

The entropy method is a mathematical method for measuring the degree of dispersion of an indicator: the greater an indicator's dispersion, the greater its influence on the comprehensive evaluation. Dispersion is measured by the entropy value, since a smaller entropy corresponds to greater dispersion. The steps for calculating the weight coefficients by the entropy method are as follows:

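We select the 14 indicators of the 98,686 events and let $x_{ij}$ denote the value of the $j$-th indicator for the $i$-th event, where $i = 1, 2, \dots, n$ with $n = 98686$, and $j = 1, 2, \dots, m$ with $m = 14$.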

The 14 indicators are normalized, converting their absolute values into relative values. Positive indicators (the higher the value, the better) and negative indicators (the lower the value, the better) are normalized differently:

$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \quad \text{(positive indicator)}$$

$$x'_{ij} = \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \quad \text{(negative indicator)}$$

The proportion of the $i$-th event in the $j$-th indicator is calculated as

$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}$$

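The entropy value of the $j$-th indicator is calculated as

$$e_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}$$

where $p_{ij} \ln p_{ij}$ is taken as 0 when $p_{ij} = 0$.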

The information entropy redundancy of the $j$-th indicator is calculated as

$$d_j = 1 - e_j$$

The weight of each indicator is calculated as

$$w_j = \frac{d_j}{\sum_{j=1}^{m} d_j}$$

The hazard score of each event is calculated as the weighted sum of its indicators:

$$s_i = \sum_{j=1}^{m} w_j x_{ij}$$
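The entropy-method steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the `negative` argument lists which indicators are treated as negative (the text does not say which of the 14 are), a constant indicator is given zero redundancy, and the score sums the weighted raw indicator values rather than the normalized ones. The function names are hypothetical.

```python
import math


def entropy_weights(X, negative=()):
    """Entropy weight method for an n x m indicator matrix X (list of rows, n >= 2).

    negative: indices of indicators for which a lower value is better.
    Returns the m weights w_j, which sum to 1.
    """
    n, m = len(X), len(X[0])
    redundancy = []
    for j in range(m):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        if hi == lo:
            redundancy.append(0.0)   # a constant indicator carries no information
            continue
        # Min-max normalization (reversed for negative indicators).
        if j in negative:
            norm = [(hi - v) / (hi - lo) for v in col]
        else:
            norm = [(v - lo) / (hi - lo) for v in col]
        total = sum(norm)
        p = [v / total for v in norm]                                    # p_ij
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)  # e_j, with 0 ln 0 = 0
        redundancy.append(1.0 - e)                                       # d_j
    s = sum(redundancy)
    return [d / s for d in redundancy]                                   # w_j


def hazard_scores(X, weights):
    """Weighted sum of each event's indicator values: s_i = sum_j w_j * x_ij."""
    return [sum(w * v for w, v in zip(weights, row)) for row in X]
```

An indicator whose values are concentrated in a few events (low entropy) receives a larger weight than one whose values are spread evenly, which is the behavior the entropy method is meant to capture.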

The weight coefficients of the indicators are shown in the table below.

WEIGHT COEFFICIENTS OF EACH INDICATOR

Indicator | x1 | x2 | x3 | x4 | x5 | x6 | x7 |
---|---|---|---|---|---|---|---|
Weight | 0.25 | 0.01 | 0.26 | 0.15 | 0.17 | 0.08 | 0.01 |

Indicator | x8 | x9 | x10 | x11 | x12 | x13 | x14 |
---|---|---|---|---|---|---|---|
Weight | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |

PCA and K-means clustering divide all events into five clusters. The entropy method gives the hazard score of each event, and the average hazard score of each cluster is computed. Sorting these averages gives the five hazard levels shown in the table below.

HAZARD GRADING RESULT

Hazard level | Cluster category | Average hazard score |
---|---|---|
1 | 2 | 1766.7104 |
2 | 3 | 3.2596 |
3 | 0 | 0.6239 |
4 | 4 | -2.6904 |
5 | 1 | -0.8788 |
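The final grading step, averaging the event scores within each cluster and ranking the clusters, can be sketched as follows. This is an illustrative sketch with a hypothetical function name; it assumes `labels` comes from the clustering step and `scores` from the entropy-method scoring, one entry per event.

```python
from collections import defaultdict


def grade_clusters(labels, scores):
    """Map each cluster to a hazard level 1..K by its average score.

    Level 1 is the cluster with the highest average score.
    Returns (levels, averages) as dicts keyed by cluster label.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for lab, s in zip(labels, scores):
        sums[lab] += s
        counts[lab] += 1
    averages = {lab: sums[lab] / counts[lab] for lab in sums}
    ranked = sorted(averages, key=averages.get, reverse=True)   # high to low
    levels = {lab: rank for rank, lab in enumerate(ranked, start=1)}
    return levels, averages
```

For example, with `labels = [0, 0, 1, 2, 2]` and `scores = [1.0, 3.0, 10.0, -1.0, 0.0]`, the cluster averages are 2.0, 10.0, and -0.5, so cluster 1 becomes level 1, cluster 0 level 2, and cluster 2 level 3.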

In this paper, 14 hazard-related indicators are selected from the Global Terrorism Database (GTD) for the hazard grading of terrorist attacks. After the data are preprocessed, principal component analysis (PCA) is used for feature extraction, and the K-means clustering method aggregates all events into five categories. The entropy method then calculates the weight coefficient of each indicator, from which the comprehensive hazard score of each type of attack is obtained. Ranking the five types by their comprehensive scores yields a five-level hazard grading model. The model quantifies data from past terrorist attacks and is therefore objective. More detailed grading standards still need to be established.