Installation Guide

This guide describes how to install and run SpliDT, a partitioned decision tree training and inference framework designed for scalable, line-rate deployment.

⚙️ System Requirements

SpliDT is designed to be lightweight and portable. The system dependencies for the training framework are:

Linux (tested on Ubuntu 20.04 / 22.04)
Conda (Miniconda or Anaconda)
Memory (16 GB RAM recommended)
Disk (30–50 GB of free space for datasets)
Docker
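You can check a machine against these requirements with a small preflight script. The sketch below is not part of the SpliDT repository. The function names (`have_cmd`, `mem_total_gb`, `disk_free_gb`, `preflight`) are hypothetical, and the thresholds simply mirror the list above and only produce warnings.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check for the SpliDT system requirements.
# Thresholds mirror the guide (16 GB RAM, up to 50 GB disk) and only emit warnings.

# Succeed if a command is available on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Print total RAM rounded to whole GB (GiB), read from /proc/meminfo.
mem_total_gb() {
  [ -r /proc/meminfo ] || return 1
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo
}

# Print free space in whole GB (GiB) on the filesystem that holds directory $1.
disk_free_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

preflight() {
  local dir="$1" cmd mem disk
  [ "$(uname -s)" = "Linux" ] || echo "WARN: not Linux (SpliDT is tested on Ubuntu 20.04/22.04)"
  for cmd in conda docker; do
    if have_cmd "$cmd"; then echo "OK: $cmd found"; else echo "WARN: $cmd not found on PATH"; fi
  done
  mem="$(mem_total_gb || true)"
  if [ -n "$mem" ] && [ "$mem" -lt 16 ]; then
    echo "WARN: ${mem} GB RAM detected (16 GB recommended)"
  fi
  disk="$(disk_free_gb "$dir" 2>/dev/null || true)"
  if [ -n "$disk" ] && [ "$disk" -lt 50 ]; then
    echo "WARN: ${disk} GB free under $dir (datasets need 30-50 GB)"
  fi
  return 0
}

# Check the directory given as the first argument, defaulting to $HOME.
preflight "${1:-${HOME:-/}}"
```

Save it under any name and run it as `bash preflight.sh`. If the datasets will live on another filesystem, pass that directory as the first argument.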

📦 Installation Dependencies

Install Conda (if not installed)

If Conda is not installed, install Miniconda:

```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
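For an unattended install, the Miniconda installer also accepts `-b` (batch mode, which accepts the license without prompting) and `-p` (install prefix). The sketch below wraps the two commands above using those options. `miniconda_installer_url` and `install_miniconda` are hypothetical helper names, and the sketch only handles the x86_64 and aarch64 Linux installers.

```bash
#!/usr/bin/env bash
# Hypothetical non-interactive wrapper around the Miniconda install commands.

# Print the Miniconda installer URL for an architecture (default: this host).
# Only x86_64 and aarch64 are handled in this sketch.
miniconda_installer_url() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|aarch64)
      echo "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-${arch}.sh" ;;
    *)
      echo "unsupported architecture: $arch" >&2
      return 1 ;;
  esac
}

# Install Miniconda into PREFIX unless conda is already on PATH.
install_miniconda() {
  local prefix="${1:-$HOME/miniconda3}" url installer
  if command -v conda >/dev/null 2>&1; then
    echo "conda already installed: $(command -v conda)"
    return 0
  fi
  url="$(miniconda_installer_url)" || return 1
  installer="$(mktemp)"
  wget -q -O "$installer" "$url" || { rm -f "$installer"; return 1; }
  # -b: batch mode (no prompts), -p: installation prefix
  bash "$installer" -b -p "$prefix" || { rm -f "$installer"; return 1; }
  rm -f "$installer"
  # Register conda in ~/.bashrc; open a new shell afterwards.
  "$prefix/bin/conda" init bash
}
```

Source the file and call `install_miniconda`, optionally with a custom prefix. Then open a new shell so that `conda init` takes effect.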

🧪 Create the SpliDT Environment

Clone the dse-and-training-framework repository and change into it:

```shell
git clone https://github.com/SpliDT-Decision-Trees/dse-and-training-framework.git
cd dse-and-training-framework
```

From the repository root, create and activate the Conda environment:

```bash
conda env create -f environment.yml
conda activate splidt
```
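`conda env create` does not update an environment that already exists, which matters when re-running setup after `environment.yml` changes. The following is a hedged sketch of an idempotent alternative. It assumes the environment name is `splidt`, as in the activate command above, and the helper names `env_in_list` and `setup_splidt_env` are hypothetical.

```bash
#!/usr/bin/env bash
# Create the SpliDT env on the first run; update it in place on later runs.

# Succeed if NAME is listed in `conda env list` output read from stdin.
env_in_list() {
  awk -v n="$1" '$1 == n { found = 1 } END { exit !found }'
}

setup_splidt_env() {
  local name="${1:-splidt}" file="${2:-environment.yml}"
  if conda env list | env_in_list "$name"; then
    # --prune removes packages that are no longer listed in the file.
    conda env update -n "$name" -f "$file" --prune
  else
    conda env create -n "$name" -f "$file"
  fi
}
```

Run `setup_splidt_env` from the repository root. Then run `conda activate splidt` in your interactive shell, because activation inside a script does not carry over to the calling shell.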

This installs the required packages, including:

scikit-learn
PyTorch
HyperMapper
dataset processing utilities
experiment logging tools
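To confirm that the environment resolved correctly, try importing the core packages from the active interpreter. This is a minimal sketch with several assumptions. It uses the import names `sklearn` (scikit-learn) and `torch` (PyTorch) and checks `CONDA_DEFAULT_ENV`, which `conda activate` sets. The `PYTHON` override and the function names are hypothetical conveniences.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for the activated SpliDT environment.

# Try to import each module; report results and fail if any import fails.
check_py_modules() {
  local py="${PYTHON:-python}" mod missing=0
  for mod in "$@"; do
    if "$py" -c "import $mod" >/dev/null 2>&1; then
      echo "OK: $mod"
    else
      echo "MISSING: $mod" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

verify_splidt_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" != "splidt" ]; then
    echo "WARN: active env is '${CONDA_DEFAULT_ENV:-none}', not 'splidt'" >&2
  fi
  check_py_modules sklearn torch
}
```

After `conda activate splidt`, source the file and run `verify_splidt_env`. A non-zero exit status means at least one import failed.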

📂 Dataset Setup

To evaluate SpliDT, we release seven real-world datasets that are used for training and evaluating partitioned decision trees.

CIC-IDS-2017
CIC-IDS-2018
CIC-IoT-2023-a
CIC-IoT-2023-b
CIC-IoMT-2024
ISCX-VPN-2016
CampusTraffic

Download

The datasets used to train and evaluate SpliDT are publicly available on FigShare. Download them and place them on your machine as follows:

Step 1: Create a working directory

```shell
mkdir -p ~/splidt
cd ~/splidt
```

Step 2: Download the traffic traces and pipelines

```shell
# Download the datasets for training and testing
curl -L -H "User-Agent: Mozilla/5.0" -OJ "https://ndownloader.figshare.com/files/60570641"
```

Step 3: Unzip the downloaded archive

```shell
unzip splidt-dataset.zip
```
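If the download is interrupted, `unzip` may extract only part of the dataset. As a hedged sketch, the helper below tests the archive with `unzip -t` before extracting it. The name `verify_and_extract` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical defensive version of Step 3: verify, then extract.

verify_and_extract() {
  local zip="$1" dest="${2:-.}"
  if [ ! -f "$zip" ]; then
    echo "archive not found: $zip" >&2
    return 1
  fi
  # -t checks every member's CRC without writing files; -q keeps output short.
  unzip -tq "$zip" || { echo "archive is corrupt; re-run the download" >&2; return 1; }
  # -o overwrites existing files without prompting, so re-runs stay non-interactive.
  unzip -q -o "$zip" -d "$dest"
}
```

Usage from Step 1's directory: `verify_and_extract splidt-dataset.zip ~/splidt`.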

Step 4: Modify the path field in each dataset configuration file under configs/ to reference the absolute path of the unzipped dataset directory (e.g., /home/splidt/splidt-dataset).

```yaml
path: "/home/splidt/splidt-dataset"
```
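Every file under configs/ needs the same edit, so a `sed` loop can apply it in one pass. This sketch makes three assumptions. Each config stores the dataset location on a single `path:` line. The in-place flag `-i` uses GNU sed syntax, as on Ubuntu. `set_config_key` is a hypothetical helper name. A nested key that is also named `path` would be rewritten too, so review the files with `git diff` afterwards.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set `key: "value"` in each given YAML file (GNU sed -i).
# Matches lines that start (after optional indentation) with exactly `key:`,
# so `path` does not match `script_path`.

set_config_key() {
  local key="$1" value="$2" esc f
  shift 2
  # Escape &, | and \ because they are special in a sed replacement using '|'.
  esc="$(printf '%s' "$value" | sed 's/[&|\\]/\\&/g')"
  for f in "$@"; do
    sed -i "s|^\([[:space:]]*${key}:\).*|\1 \"${esc}\"|" "$f"
  done
}
```

For example, `set_config_key path "/home/splidt/splidt-dataset" configs/*.yml` updates every dataset config. The same helper can also set the `script_path` field described in the HyperMapper setup.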

🎯 HyperMapper Setup

Clone the HyperMapper repository. Then set the script_path field in each dataset configuration file under configs/ to the absolute path of HyperMapper's scripts/hypermapper.py:

```yaml
script_path: "/home/splidt/hypermapper/scripts/hypermapper.py"
```
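Before training, you can confirm that both fields point at real locations. The sketch below is hypothetical (`config_value` and `check_configs` are illustrative names). It assumes simple one-line `key: "value"` entries without trailing comments or trailing whitespace.

```bash
#!/usr/bin/env bash
# Hypothetical check: `path` must be a directory, `script_path` an existing file.

# Print the value of the first `key:` line in FILE, with quotes removed.
config_value() {
  sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" "$2" | head -n 1 | tr -d "\"'"
}

check_configs() {
  local cfg data script status=0
  for cfg in "$@"; do
    data="$(config_value path "$cfg")"
    script="$(config_value script_path "$cfg")"
    if [ ! -d "$data" ]; then
      echo "$cfg: path '$data' is not a directory" >&2
      status=1
    fi
    if [ ! -f "$script" ]; then
      echo "$cfg: script_path '$script' does not exist" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run `check_configs configs/*.yml` from the repository root. It prints one line per problem and returns a non-zero status if any config is broken.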

You are now ready to train and evaluate models on the datasets.

📊 Install Visualization Stack (Grafana)

Before starting training, navigate to the dse-and-training-framework repository and run the following command to launch the Grafana dashboards:

```shell
make start-dashboards
```
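The dashboards may take a few seconds to come up, and a small polling loop can wait for them. This sketch assumes Grafana is exposed on its default port 3000 and queries Grafana's `/api/health` endpoint. Check the repository's Makefile or Docker configuration for the actual port. `wait_for_url` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll URL until it answers without an HTTP error.
# TIMEOUT is the number of attempts (about one per second when the port refuses connections).
wait_for_url() {
  local url="$1" timeout="${2:-60}" waited=0
  # -f: treat HTTP errors as failures; -s: silent; --max-time bounds each attempt.
  until curl -fs -o /dev/null --max-time 2 "$url"; do
    waited=$((waited + 1))
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up after $timeout attempts waiting for $url" >&2
      return 1
    fi
    sleep 1
  done
  echo "reachable: $url"
}
```

For example, `wait_for_url http://localhost:3000/api/health 120`, under the default-port assumption above.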

🧠 Model Training

From the root of the dse-and-training-framework repository, start training with the following command. To select a dataset, replace {dataset_name} with the name of one of the configuration files under configs/.

```shell
python src/train.py --config configs/{dataset_name}.yml
```
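To train on every dataset in turn, you can loop over configs/. This sketch assumes every `.yml` file in that folder is a dataset configuration accepted by `src/train.py`. The `train_all` name, the `logs/` directory, and the `PYTHON` override are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical batch runner: train sequentially on each config, one log per dataset.
# Returns the number of runs that exited with a non-zero status.
train_all() {
  local cfg_dir="${1:-configs}" log_dir="${2:-logs}"
  local py="${PYTHON:-python}" cfg name failed=0
  mkdir -p "$log_dir"
  for cfg in "$cfg_dir"/*.yml; do
    [ -e "$cfg" ] || continue   # no .yml files: the glob stays unexpanded
    name="$(basename "$cfg" .yml)"
    echo "Training on $name ..."
    if ! "$py" src/train.py --config "$cfg" > "$log_dir/$name.log" 2>&1; then
      echo "  $name failed; see $log_dir/$name.log" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}
```

Run `train_all` from the repository root. Because it returns the number of failed runs, `train_all && echo "all done"` prints only when every dataset trained successfully.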

Next Steps

To evaluate SpliDT performance against baselines on real-world security datasets, please refer to our benchmarking guide.
