ESIR-S9 - AI Project: Network Traffic Analysis 🦈

Yazid BENJAMAA (@Xacone) & Thomas DELAPART (@Thomega35)

The project's purpose is to predict whether a network activity is malicious or not. This classification is achieved by analyzing the content and context of each packet in a network capture file (pcap) and then returning a brief of the attacks that were detected.

We also built a small Flask web app, backed by Google Colab, that lets you visualize the classification results; more details below.

Model and input format

We took the following model from HuggingFace: rdpahalavan/bert-network-packet-flow-header-payload, which classifies a single network packet into one of these categories (a minimal loading sketch follows the list):

['Analysis', 'Backdoor', 'Bot', 'DDoS', 'DoS', 'DoS GoldenEye', 'DoS Hulk', 'DoS SlowHTTPTest', 'DoS Slowloris', 'Exploits', 'FTP Patator', 'Fuzzers', 'Generic', 'Heartbleed', 'Infiltration', 'Normal', 'Port Scan', 'Reconnaissance', 'SSH Patator', 'Shellcode', 'Web Attack - Brute Force', 'Web Attack - SQL Injection', 'Web Attack - XSS', 'Worms']
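As a minimal sketch (not code from the repository), the model can be loaded with the transformers text-classification pipeline; packet_text is a placeholder for one packet serialized in the input format the model expects, and the printed output is only illustrative:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="rdpahalavan/bert-network-packet-flow-header-payload",
)

packet_text = "..."  # placeholder: one packet serialized in the model's input format
print(classifier(packet_text))  # e.g. [{'label': 'DoS Hulk', 'score': 0.97}] (illustrative)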

Each input represents a network packet and respects the following structure:

The model is based on BERT (Bidirectional Encoder Representations from Transformers), which builds on the Transformer neural network architecture. We appreciated the use of BERT because it suits the analysis of pcap files, where the bidirectional context of packet data within a network flow matters.

Each IP packet in a loaded pcap file is converted to this format before being processed by the model; pcap/packet manipulation is done using Scapy.
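A hedged sketch of that conversion step, assuming the input structure shown above; packet_to_text() and the file name capture.pcap are illustrative placeholders, not the notebook's real code:

from scapy.all import rdpcap, IP, Raw

def packet_to_text(pkt):
    # Illustrative placeholder: render the raw payload as space-separated byte values.
    payload = bytes(pkt[Raw].load) if Raw in pkt else b""
    return " ".join(str(b) for b in payload)

packets = rdpcap("capture.pcap")                           # load the capture with Scapy
samples = [packet_to_text(p) for p in packets if IP in p]  # keep IP packets only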

The pcap files used for testing & fine-tuning the model were taken from the following sources; they provide a wide range of samples containing benign/malicious activities:
TII-SSRC-23 Dataset - Network traffic for intrusion detection research
Network datasets
Network Forensics and Network Security Monitoring - Publicly available PCAP files

Fine-tuning

There were many attempts to fine-tune the model, and about half of them gave satisfying results. When we added more labeled training samples, the model became much more proficient at detecting the same attack or attacks of the same family (acting at the same TCP/IP layer), but it returned inconsistent and false results for the other attacks, either detecting nothing at all or reporting a bunch of attacks that had nothing to do with the content of the pcap file. This happened even though we also filtered the packets taken into account during training (e.g. only GET or POST requests for HTTP DoS attack samples).

The notebook provides the function trainFromPcapFile(file_path, label, application_filter), which adds transformed training samples (packets) retrieved from a pcap file and lets you select packets based on filter patterns.

trainFromPcapFile("/content/sample_data/dvwa_sql_injection.pcap", 21, b"GET /") # Transform and add packets from the pcap file, label them with 21 (Web Attack - SQL Injection), and keep only GET requests.
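A hypothetical outline of what trainFromPcapFile does (the notebook's real implementation may differ); it reuses the packet_to_text() conversion sketched earlier and appends the labeled samples to in-memory training lists:

from scapy.all import rdpcap, Raw

train_texts, train_labels = [], []

def trainFromPcapFile(file_path, label, application_filter=None):
    for pkt in rdpcap(file_path):
        payload = bytes(pkt[Raw].load) if Raw in pkt else b""
        if application_filter and application_filter not in payload:
            continue  # keep only packets matching the filter pattern, e.g. b"GET /"
        train_texts.append(packet_to_text(pkt))  # conversion sketched earlier
        train_labels.append(label)               # e.g. 21 = Web Attack - SQL Injection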

We also tried to drop certain parameters, such as the backward and forward packets (which seemed to us irrelevant in a normal packet capture sequence). This improved the detection of some attacks, such as web attacks and port scans, but it also proved to distort certain results. We cannot give a definitive statement on the efficiency of fine-tuning; however, we truly believe that more effort and testing could lead to a better-performing and more balanced fine-tuned model.
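For completeness, a hedged sketch of how such labeled samples could be used for fine-tuning with the standard HuggingFace Trainer API; this is an assumed workflow, not the notebook's exact code, and train_texts/train_labels are the lists filled in the outline above:

import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "rdpahalavan/bert-network-packet-flow-header-payload"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

class PacketDataset(torch.utils.data.Dataset):
    """Wraps the tokenized packet texts and their integer labels."""
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=PacketDataset(train_texts, train_labels),
)
trainer.train()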

Detecting Application-Layer (Layer 7) Denial of Service Attacks & the Tools Used

The model does a great job of detecting DoS attacks in network traffic.
Two simple HTTP Denial of Service (DoS) tools were used to test its ability to detect the attacks they generate: Hulk & GoldenEye.

Hulk attacks detection

TCP SYN scanning of 20 ports (assimilated to a "normal" activity)

We clearly see that the model has no problem detecting malicious anomalous flows in the packet capture; it succeeds in identifying both the anomaly type and the tool that was used with precision.

GoldenEye Attacks Detection

GoldenEye Attacks Detection after Fine-Tuning the Model

Applying the fine-tuning methods described above produced more relevant and explicit results when analyzing network traffic that suffered a DoS attack. Compared to the model before fine-tuning, more DoS-related packets were detected (500 malicious packets out of 1600 with fine-tuning, roughly 31%, versus 60 malicious packets out of 3000 by default, i.e. 2%).

You can also experiment with it using your own pcap samples or the ones provided in this repository.

Example of Illogical Results Caused by Fine-Tuning

Here we are trying to detect a DoS performed with a tool named Slowloris. Logically, the model should predict that many packets correspond to DoS/DDoS, or predict that they are normal if it fails. Instead, the model predicted the presence of a completely unrelated attack, SSH brute force with the Patator tool, even though there is no communication to TCP port 22.

How to set up the app on Google Colab

Once executed, the following cell prints a link that routes to the app:

from google.colab.output import eval_js
print(eval_js("google.colab.kernel.proxyPort(5000)"))

Then execute the next cell, which fires up the backend; it is a Flask-based application with two endpoints (an illustrative sketch of the upload handler follows the snippet):

[....]

@app.route("/")
def home():
    return index

@app.route('/upload', methods=['POST'])
def upload_file():

[...]

if __name__ == "__main__":
    app.run()
[....]
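For reference, here is an illustrative way the elided /upload body could look; analyze_pcap() is a hypothetical helper standing in for the notebook's per-packet classification, and the form field name "file" and the returned summary are assumptions, not the actual implementation:

import os
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload_file():
    f = request.files["file"]            # pcap sent by the web page (assumed field name)
    path = os.path.join("/tmp", f.filename)
    f.save(path)
    results = analyze_pcap(path)         # hypothetical helper: classify every packet
    return jsonify(results)              # illustrative, e.g. {"DoS Hulk": 500, "Normal": 1100}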

Then click the link; you should land on the app.
The notebook lets you use Colab's default GPU with PyTorch to make training and predictions faster (a short usage note follows the snippet):

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
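One assumed usage note: the input tensors must also be moved to the same device as the model before inference, otherwise PyTorch raises a device-mismatch error (tokenizer, model, and packet_text refer to the earlier sketches):

inputs = tokenizer(packet_text, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()  # index into the category list above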

To conclude

Working on this small project has been fun and instructive. Even though it is only a proof of concept, the project and its model can be applied, and prove their usefulness, in many practical cases dealing with detection in computer networks.
