SpliDT Benchmarking Guide
This guide explains how to reproduce the evaluation results for SpliDT and compare it against baseline systems. The benchmarking pipeline evaluates accuracy, scalability, and hardware resource efficiency of partitioned decision tree inference.
📂 Benchmark Datasets
The benchmarking experiments use seven real-world network traffic datasets, provided by the Canadian Institute for Cybersecurity (CIC) and UC Santa Barbara (UCSB). The table below summarizes each one; the paper gives further details.
| Dataset | Description | Classes |
|---|---|---|
| CIC-IoMT2024 | A cybersecurity dataset with Internet of Medical Things (IoMT) traffic for intrusion detection in healthcare. | 19 |
| CIC-IoT2023-a | A simplified version of the CIC-IoT-2023 dataset, categorized into four primary classes of IoT traffic. | 4 |
| ISCX-VPN2016 | A dataset containing VPN and non-VPN traffic for evaluating VPN detection and privacy-related analyses. | 13 |
| CampusTraffic | UCSB campus dataset containing various application types, including web, cloud, social, and streaming traffic. | 11 |
| CIC-IoT2023-b | A comprehensive IoT dataset containing multi-class network traffic data for evaluating IoT security threats. | 32 |
| CIC-IDS2017 | A network intrusion detection dataset for various attack scenarios, including DoS, DDoS, and brute force. | 10 |
| CIC-IDS2018 | An anomaly detection dataset capturing network traffic for diverse attacks and benign activities. | 10 |
Together, these datasets cover a range of traffic patterns and attack scenarios, spanning intrusion detection, IoT security, VPN detection, and application classification.
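When preparing a dataset, it can help to confirm that the labels you loaded match the class counts listed in the table. A minimal Python sketch follows; `EXPECTED_CLASSES` simply transcribes the table, and `check_class_count` is a hypothetical helper for illustration, not part of the SpliDT code base.

```python
# Class counts transcribed from the dataset table in this guide.
EXPECTED_CLASSES = {
    "CIC-IoMT2024": 19,
    "CIC-IoT2023-a": 4,
    "ISCX-VPN2016": 13,
    "CampusTraffic": 11,
    "CIC-IoT2023-b": 32,
    "CIC-IDS2017": 10,
    "CIC-IDS2018": 10,
}


def check_class_count(dataset, labels):
    """Return True if `labels` holds exactly the expected number of distinct classes.

    Hypothetical helper: `labels` is any iterable of class labels.
    """
    expected = EXPECTED_CLASSES[dataset]  # raises KeyError for unknown datasets
    return len(set(labels)) == expected
```

A mismatch usually means that some classes were filtered out or merged during preprocessing, which is worth resolving before comparing accuracy numbers.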
🏁 Running SpliDT Benchmarks
Run the training pipeline for a dataset configuration, then run src/db_pusher.py with the same configuration:
python train.py --config configs/iscxvpn-2016-c13-bo.yml
python src/db_pusher.py --config configs/iscxvpn-2016-c13-bo.yml
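To repeat these two steps for several datasets, the commands can be generated per configuration file. The sketch below is one possible wrapper, not a script shipped with the repository. It assumes that every `.yml` file in `configs/` is a dataset configuration; `pipeline_commands` and `run_all` are hypothetical names.

```python
import subprocess
from pathlib import Path


def pipeline_commands(config_path):
    """Build the two commands shown above for one configuration file."""
    config = str(config_path)
    return [
        ["python", "train.py", "--config", config],
        ["python", "src/db_pusher.py", "--config", config],
    ]


def run_all(config_dir="configs", dry_run=True):
    """Run the pipeline for every *.yml file in `config_dir`.

    With dry_run=True the commands are only printed, not executed.
    Assumption: every .yml file in `config_dir` is a dataset configuration.
    """
    commands = []
    for config_path in sorted(Path(config_dir).glob("*.yml")):
        for cmd in pipeline_commands(config_path):
            print(" ".join(cmd))
            if not dry_run:
                # check=True stops at the first failing step.
                subprocess.run(cmd, check=True)
            commands.append(cmd)
    return commands
```

Calling `run_all()` first with the default `dry_run=True` lists what would be executed; `run_all(dry_run=False)` then runs the commands in order.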
📊 Running Baseline Models
To run the LEO baseline using a given dataset configuration file:
python src/leo.py --config configs/iscxvpn-2016-c13-bo.yml
Based on the LEO output printed to the console, update the LEO section in the configuration file for this dataset.
(Listing from configs/iscxvpn-2016-c13-bo.yml, lines 46-56.)
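Updating a section by hand is error-prone when many datasets are involved, so the merge step can be scripted. The following is a minimal sketch on a plain Python dict. The section name `leo` and the keys `depth` and `f1` are placeholders, not the real schema of the configuration file. Reading and writing the YAML file itself is left out (PyYAML's `yaml.safe_load` and `yaml.safe_dump` are a common choice).

```python
import copy


def update_section(config, section, new_values):
    """Return a copy of `config` with `new_values` merged into `config[section]`.

    The input dict is not modified, so the previous values stay available
    for comparison.
    """
    updated = copy.deepcopy(config)
    # Treat a missing or empty section as an empty mapping.
    section_values = dict(updated.get(section) or {})
    section_values.update(new_values)
    updated[section] = section_values
    return updated


# Hypothetical example: key names and values are placeholders only.
config = {"leo": {"depth": None, "f1": None}}
config_new = update_section(config, "leo", {"depth": 10, "f1": 0.5})
```

The same helper would apply to the other baseline sections of the configuration file, since it only depends on the section name passed in.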
Similarly, to run the NetBeacon baseline:
python src/netbeacon.py --config configs/iscxvpn-2016-c13-bo.yml
(Listing from configs/iscxvpn-2016-c13-bo.yml, lines 144-154.)
For the single-partition (unpartitioned) design, a SpliDT-based baseline can be evaluated as follows:
python src/one_partition.py --config configs/iscxvpn-2016-c13-bo.yml
Based on the console output, update the one_partition section in the configuration file for this dataset.
(Listing from configs/iscxvpn-2016-c13-bo.yml, lines 215-225.)
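Because each baseline reports its results on the console, saving that output makes it easier to copy values into the configuration file later. The sketch below is one way to do this, under stated assumptions: the `logs` directory and the names `BASELINES`, `baseline_command`, and `run_baseline` are hypothetical. Only the script paths come from this guide.

```python
import subprocess
from pathlib import Path

# Baseline scripts named in this guide.
BASELINES = {
    "leo": "src/leo.py",
    "netbeacon": "src/netbeacon.py",
    "one_partition": "src/one_partition.py",
}


def baseline_command(name, config_path):
    """Build the command line for one baseline and one configuration file."""
    return ["python", BASELINES[name], "--config", str(config_path)]


def run_baseline(name, config_path, log_dir="logs"):
    """Run one baseline and save its console output to <log_dir>/<name>.log.

    `log_dir` is an assumed location, not something the repository defines.
    """
    result = subprocess.run(
        baseline_command(name, config_path),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,
        check=True,           # raise CalledProcessError if the script fails
    )
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    log_path = Path(log_dir) / f"{name}.log"
    log_path.write_text(result.stdout)
    return log_path
```

For example, `run_baseline("leo", "configs/iscxvpn-2016-c13-bo.yml")` would write the LEO console output to `logs/leo.log`.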
Experiments and Microbenchmarks
After training on each dataset, you can plot the results using the provided Jupyter notebooks. From a separate terminal, start the custom R-ggplot conda environment, which runs as a Docker container:
make start-ggplot-docker
Attach VS Code or Cursor to this Docker container and, once inside, open the directory /home/jovyan/work. For each notebook, select the conda environment as the kernel. Run the plots/e2e-ggplot.ipynb notebook to produce the end-to-end plots and plots/bm-ggplot.ipynb for the microbenchmark results.
Experiment Teardown
To stop the experiment and remove all installed components, run the following commands:
make stop-dashboards
conda deactivate