Confusion Matrix and Cyber Crime

Prabhat Kumar
Jun 5, 2021

What is cybercrime?

Cybercrime is a criminal activity that either targets or uses a computer, a computer network, or a networked device.

Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations.

Some cybercriminals are organized, use advanced techniques, and are highly technically skilled. Others are novice hackers.

Rarely, cybercrime aims to damage computers for reasons other than profit. These could be political or personal.

Types of cybercrime

Here are some specific examples of the different types of cybercrime:

  • Email and internet fraud.
  • Identity fraud (where personal information is stolen and used).
  • Theft of financial or card payment data.
  • Theft and sale of corporate data.
  • Cyberextortion (demanding money to prevent a threatened attack).
  • Ransomware attacks (a type of cyber extortion).
  • Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
  • Cyberespionage (where hackers access government or company data).

Most cybercrime falls under two main categories:

  • Criminal activity that targets computers.
  • Criminal activity that uses computers to commit other crimes.

Cybercrime that targets computers often involves viruses and other types of malware.

Cybercriminals may infect computers with viruses and malware to damage devices or stop them from working. They may also use malware to delete or steal data.

Cybercrime that stops users from using a machine or network, or prevents a business from providing a software service to its customers, is called a Denial-of-Service (DoS) attack.

Cybercrime that uses computers to commit other crimes may involve using computers or networks to spread malware, illegal information, or illegal images.

Sometimes cybercriminals conduct both categories of cybercrime at once. They may target computers with viruses first, then use them to spread malware to other machines or throughout a network.

Cybercriminals may also carry out what is known as a Distributed Denial-of-Service (DDoS) attack. This is similar to a DoS attack, but cybercriminals use numerous compromised computers to carry it out.

The US Department of Justice recognizes a third category of cybercrime, in which a computer is used as an accessory to the crime. An example of this is using a computer to store stolen data.

The US has signed the European Convention on Cybercrime. The convention casts a wide net, and it considers numerous malicious computer-related activities to be cybercrime. For example:

  • Illegally intercepting or stealing data.
  • Interfering with systems in a way that compromises a network.
  • Infringing copyright.
  • Illegal gambling.
  • Selling illegal items online.
  • Soliciting, producing or possessing child pornography.

Confusion matrix

A confusion matrix is a table used to evaluate the performance of a classification model. We compare the values predicted for the test data with the true values known to us. This tells us how many cases were classified correctly and how many were classified incorrectly. In a two-class problem the matrix has two rows and two columns, with one axis for the actual class and the other for the predicted class, which gives four cells.

Let’s understand the terms used here:

  • In a two-class problem such as attack detection, we label the normal event as “positive” and the anomaly as “negative”.
  • “True Positive” for correctly predicted event values.
  • “False Positive” for incorrectly predicted event values.
  • “True Negative” for correctly predicted no-event values.
  • “False Negative” for incorrectly predicted no-event values.
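As a rough illustration of these four terms, the sketch below tallies them from two lists of labels. It follows this article's convention that "normal" is the positive class. The function name `tally_confusion` and the six-packet toy data are hypothetical and serve only to show the counting.

```python
from collections import Counter


def tally_confusion(actual, predicted, positive="normal"):
    """Count TP, FP, TN and FN for a two-class problem.

    `positive` names the class treated as the event ("normal" here,
    following the article's convention).
    """
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            # Model predicted the event: correct -> TP, wrong -> FP.
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Model predicted no-event: correct -> TN, wrong -> FN.
            counts["TN" if truth != positive else "FN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FP", "TN", "FN")}


# Hypothetical toy data: six packets, not taken from the article.
actual = ["normal", "normal", "anomaly", "anomaly", "normal", "anomaly"]
predicted = ["normal", "anomaly", "anomaly", "normal", "normal", "anomaly"]

cells = tally_confusion(actual, predicted)
print(cells)
```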

Confusion matrices capture two types of errors: Type I (False Positive) and Type II (False Negative).

Now let's look at these terms and their significance in the light of cyber attack prediction for a better understanding.

An IDS, or Intrusion Detection System, checks for malicious activity on a system. It monitors the packets coming in over the internet using an ML model and predicts whether each one is normal or an anomaly.

Let's say the ML model in our IDS examined a total of 165 packets and produced a confusion matrix with the counts described below.

  • “Positive” -> Model predicted no attack.
  • “Negative” -> Model predicted attack.
  • True Negative: Of the 55 times the model predicted that an attack would take place, 50 predictions were correct, meaning an attack actually took place 50 times. Because of the prediction, the Security Operations Centre (SOC) receives a notification and can prevent the attack.
  • False Negative: Of the 55 times the model predicted an attack, 5 times no attack happened. These can be considered “False Alarms” and are Type II errors.
  • True Positive: The model predicted 110 times that no attack would take place, and 100 of those times no attack actually happened. These are correct predictions.
  • False Positive: 10 times an attack actually took place although the model had predicted that no attack would happen. This is also called a Type I error.
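The counts above can be laid out as a small two-by-two table. The sketch below uses only the four numbers given in the text. The dictionary layout and the row and column labels are illustrative choices, with rows for the model's prediction and columns for what actually happened.

```python
# Counts stated in the text; "positive" means the model predicted no attack.
TP, FP, TN, FN = 100, 10, 50, 5

# Rows are the model's predictions, columns are what actually happened.
matrix = {
    "predicted no attack": {"actual no attack": TP, "actual attack": FP},
    "predicted attack": {"actual no attack": FN, "actual attack": TN},
}

print(f"{'':<20} {'no attack':>10} {'attack':>8} {'row total':>10}")
for prediction, row in matrix.items():
    row_total = sum(row.values())
    print(f"{prediction:<20} {row['actual no attack']:>10} "
          f"{row['actual attack']:>8} {row_total:>10}")

total = sum(sum(row.values()) for row in matrix.values())
print("total packets:", total)
```

The row totals come out to 110 predictions of no attack and 55 predictions of attack, which together account for all 165 packets.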

Type I error:

This type of error can prove to be very dangerous. The system predicted no attack, but an attack actually took place. In that case, no notification reaches the security team and nothing can be done to prevent the attack. The False Positive cases above fall into this category, so one aim of the model is to minimize this value.

Type II error:

This type of error is not very dangerous: the model predicted an attack, but in reality the system was safe. The team gets notified and checks for malicious activity, which causes no harm. These cases can be termed False Alarms.
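To put rough numbers on the two error types, the sketch below expresses each as a share of the relevant actual cases, using the counts from the 165-packet example. The variable names are illustrative, and the labeling follows this article's convention that "positive" means no attack.

```python
# Counts from the 165-packet example; "positive" = no attack predicted.
TP, FP, TN, FN = 100, 10, 50, 5

actual_attacks = FP + TN  # packets that really were attacks
actual_normal = TP + FN   # packets that really were normal

# Type I (False Positive): real attacks the model reported as normal.
type_1_rate = FP / actual_attacks
# Type II (False Negative): normal packets the model flagged as attacks.
type_2_rate = FN / actual_normal

print(f"Type I  (missed attacks): {FP}/{actual_attacks} = {type_1_rate:.1%}")
print(f"Type II (false alarms):   {FN}/{actual_normal} = {type_2_rate:.1%}")
```

With these counts, about 16.7% of real attacks are missed, while about 4.8% of normal packets raise a false alarm.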

We can use the confusion matrix to calculate various metrics:

  1. Accuracy: the ratio of all correct predictions to all predictions (total values).

     Accuracy = (TP + TN) / (TP + TN + FP + FN)

  2. Precision: (true positives / predicted positives) = TP / (TP + FP)

  3. Recall: (true positives / all actual positives) = TP / (TP + FN)

  4. Specificity: (true negatives / all actual negatives) = TN / (TN + FP)

  5. Misclassification: (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)

     It can also be calculated as 1 - Accuracy.
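As a worked example, the sketch below applies these formulas to the counts from the 165-packet IDS example (TP = 100, FP = 10, TN = 50, FN = 5). The variable names are illustrative, and "positive" again means the model predicted no attack.

```python
# Counts from the 165-packet example; "positive" = no attack predicted.
TP, FP, TN, FN = 100, 10, 50, 5
total = TP + TN + FP + FN

accuracy = (TP + TN) / total
precision = TP / (TP + FP)
recall = TP / (TP + FN)
specificity = TN / (TN + FP)
misclassification = (FP + FN) / total

for name, value in [
    ("Accuracy", accuracy),
    ("Precision", precision),
    ("Recall", recall),
    ("Specificity", specificity),
    ("Misclassification", misclassification),
]:
    print(f"{name:<18} {value:.3f}")
```

Accuracy here is 150/165 and misclassification is 15/165, so the two add up to 1 as expected.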

!! Thank you for reading !!
