Installation Guide

This guide describes how to install and run SpliDT, a partitioned decision tree training and inference framework designed for scalable, line-rate deployment.

โš™๏ธ System Requirements

SpliDT is designed to be lightweight and portable. The system dependencies for the training framework are:

  • Linux (tested on Ubuntu 20.04 / 22.04)
  • Conda (Miniconda or Anaconda)
  • Memory (16 GB recommended)
  • Disk (30–50 GB for datasets)
  • Docker
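Before installing anything, a quick pre-flight check against the list above can save time. The script below is a minimal sketch and is not part of the SpliDT repository. The helper names (`check_requirements`, `total_memory_gb`, `free_disk_gb`) are hypothetical. It assumes datasets will live under your home directory, and it reads physical memory through POSIX `sysconf`, which is available on Linux.

```python
"""Rough pre-flight check against SpliDT's listed system requirements.

Hypothetical helper, not part of the SpliDT repository.
"""
import os
import platform
import shutil

MIN_MEMORY_GB = 16  # "16 GB recommended" in the guide
MIN_DISK_GB = 30    # lower end of the 30-50 GB estimate for datasets


def total_memory_gb() -> float:
    """Physical RAM in GiB via POSIX sysconf; 0.0 where unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return 0.0


def free_disk_gb(path: str = "~") -> float:
    """Free space in GiB on the filesystem that holds `path`."""
    return shutil.disk_usage(os.path.expanduser(path)).free / 1024**3


def check_requirements(path: str = "~") -> dict:
    """Map each requirement to True if it appears to be satisfied."""
    return {
        "linux": platform.system() == "Linux",
        "conda": shutil.which("conda") is not None,
        "docker": shutil.which("docker") is not None,
        # Allow ~10% slack: the kernel reports a bit less than installed RAM.
        "memory": total_memory_gb() >= 0.9 * MIN_MEMORY_GB,
        "disk": free_disk_gb(path) >= MIN_DISK_GB,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:8s} {'OK' if ok else 'MISSING/LOW'}")
```

Memory and disk are treated as soft checks here, because the guide only gives recommendations, not hard limits.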

📦 Installation Dependencies

Install Conda (if not installed)

If Conda is not installed, install Miniconda:

```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```

🧪 Create the SpliDT Environment

Clone the dse-and-training-framework repository and change into it:

```shell
git clone https://github.com/SpliDT-Decision-Trees/dse-and-training-framework.git
cd dse-and-training-framework
```

Create and activate the environment:

```bash
conda env create -f environment.yml
conda activate splidt
```

This installs the required packages, including:

  • scikit-learn
  • PyTorch
  • HyperMapper
  • dataset processing utilities
  • experiment logging tools
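After activating the environment, you can confirm that the core libraries are importable. This sketch checks only scikit-learn and PyTorch through their standard import names (`sklearn`, `torch`). It does not assume an import name for HyperMapper, because this guide sets HyperMapper up separately. The name `missing_modules` is hypothetical.

```python
"""Check that core packages from environment.yml are importable.

Run inside the activated `splidt` environment. Hypothetical helper.
"""
import importlib.util

# Standard import name -> package name as listed in the guide
REQUIRED_MODULES = {"sklearn": "scikit-learn", "torch": "PyTorch"}


def missing_modules(modules: dict = REQUIRED_MODULES) -> list:
    """Return display names of modules that cannot be located (nothing is imported)."""
    return [label for name, label in modules.items()
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All core packages found." if not missing else f"Missing: {', '.join(missing)}")
```

`find_spec` locates a module without importing it, so the check stays fast even for a large package like PyTorch.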

📂 Dataset Setup

To evaluate SpliDT, we release seven real-world datasets for training and evaluating partitioned decision trees:

  • CIC-IDS-2017
  • CIC-IDS-2018
  • CIC-IoT-2023-a
  • CIC-IoT-2023-b
  • CIC-IoMT-2024
  • ISCX-VPN-2016
  • CampusTraffic

Download

The datasets used to train and evaluate SpliDT are publicly available via FigShare. Download them and place them on your machine as follows:

Step 1: Create a working directory

```shell
mkdir ~/splidt
cd ~/splidt
```

Step 2: Download the traffic traces and pipelines

```shell
# Download the datasets for training and testing
# (-L follows redirects; -OJ saves under the server-provided file name)
curl -L -H "User-Agent: Mozilla/5.0" -OJ "https://ndownloader.figshare.com/files/60570641"
```

Step 3: Unzip the downloaded archive

```shell
unzip splidt-dataset.zip
```

Step 4: In the dse-and-training-framework repository, set the path field of each dataset configuration file under configs/ to the absolute path of the unzipped dataset directory (e.g., /home/splidt/splidt-dataset):

```yaml
path: "/home/splidt/splidt-dataset"
```
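Editing every configuration file by hand is error-prone. The sketch below rewrites the field in bulk. It assumes each file in configs/ is YAML with a top-level `path:` key on its own, unindented line, as in the snippet above. The function name `set_config_field` is hypothetical. It edits lines textually instead of parsing YAML, so comments and key order in the files are preserved.

```python
"""Bulk-set a top-level key in every dataset config (hypothetical helper)."""
import re
from pathlib import Path


def set_config_field(config_dir: str, key: str, value: str) -> list:
    """Replace each unindented `key: ...` line in configs/*.yml; return changed file names."""
    # ^ with MULTILINE only matches at line starts, so "path" won't hit "script_path".
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    changed = []
    for cfg in sorted(Path(config_dir).glob("*.yml")):
        text = cfg.read_text()
        # A function replacement avoids re treating backslashes in `value` as escapes.
        new_text, n = pattern.subn(lambda _m: f'{key}: "{value}"', text)
        if n and new_text != text:
            cfg.write_text(new_text)
            changed.append(cfg.name)
    return changed


if __name__ == "__main__":
    print(set_config_field("configs", "path", "/home/splidt/splidt-dataset"))
```

The same call with key `"script_path"` can be used for the HyperMapper field described in the next section.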

🎯 HyperMapper Setup

Clone the HyperMapper repository, then set the script_path field of each dataset configuration file under configs/ to the absolute path of the HyperMapper script (hypermapper.py):

```yaml
script_path: "/home/splidt/hypermapper/scripts/hypermapper.py"
```
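A wrong path usually surfaces only after training has started. Before that point, a quick check of the configured paths catches the problem earlier. This is a sketch under stated assumptions: each configs/*.yml has unindented `path:` and `script_path:` lines with optionally quoted values and no trailing comments. `path` must be an existing directory, and `script_path` must be an existing file. The name `config_problems` is hypothetical.

```python
"""Validate the path and script_path fields of each dataset config (hypothetical helper)."""
import re
from pathlib import Path

# Captures the key and its (optionally double-quoted) value on a single line.
FIELD = re.compile(r'^(path|script_path):[ \t]*"?([^"\n]*?)"?[ \t]*$', re.MULTILINE)


def config_problems(config_dir: str) -> dict:
    """Map each config file name to a list of problems (empty list means OK)."""
    report = {}
    for cfg in sorted(Path(config_dir).glob("*.yml")):
        fields = dict(FIELD.findall(cfg.read_text()))
        data_dir, script = fields.get("path"), fields.get("script_path")
        problems = []
        if not data_dir or not Path(data_dir).is_dir():
            problems.append(f"path is not a directory: {data_dir!r}")
        if not script or not Path(script).is_file():
            problems.append(f"script_path is not a file: {script!r}")
        report[cfg.name] = problems
    return report


if __name__ == "__main__":
    for name, problems in config_problems("configs").items():
        print(name, "OK" if not problems else "; ".join(problems))
```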

You are now ready to train and evaluate models on the datasets.

📊 Install Visualization Stack (Grafana)

Before starting training, navigate to the dse-and-training-framework repository and run the following command to launch the Grafana dashboards:

```shell
make start-dashboards
```
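The dashboard containers can take a few seconds to come up. The sketch below polls Grafana until it responds. Two assumptions are not confirmed by the SpliDT docs: Grafana is published on `localhost:3000` (Grafana's default port), and Grafana's built-in `/api/health` endpoint is used as the readiness signal. Adjust `GRAFANA_URL` if the Makefile maps a different port. The name `wait_for_grafana` is hypothetical.

```python
"""Wait for Grafana to come up after `make start-dashboards` (hypothetical helper)."""
import time
import urllib.error
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # assumption: Grafana's default port


def wait_for_grafana(base_url: str = GRAFANA_URL, timeout_s: float = 60.0,
                     interval_s: float = 2.0) -> bool:
    """Return True once GET {base_url}/api/health answers 200; False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/api/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container still starting, or nothing listening on that port
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("Grafana is up" if wait_for_grafana() else "Grafana did not respond")
```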

🧠 Model Training

From the root of the dse-and-training-framework repository, start training with the following command. Set the --config argument to the configuration file of the desired dataset under configs/:

```shell
python src/train.py --config configs/{dataset_name}.yml
```
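To train on every dataset in sequence, you can wrap the documented command in a small loop. This is a hypothetical convenience script, not part of the repository, and the names `training_commands` and `run_all` are illustrative. It assumes you run it from the repository root with the `splidt` environment active, and that every `*.yml` file in configs/ is a dataset configuration.

```python
"""Run src/train.py once per dataset config (hypothetical wrapper)."""
import subprocess
import sys
from pathlib import Path


def training_commands(config_dir: str = "configs") -> list:
    """Build one `train.py --config <file>` invocation per *.yml, sorted by name."""
    # sys.executable reuses the interpreter of the active conda environment.
    return [[sys.executable, "src/train.py", "--config", str(cfg)]
            for cfg in sorted(Path(config_dir).glob("*.yml"))]


def run_all(config_dir: str = "configs") -> dict:
    """Run each command in turn; map config path -> process return code."""
    results = {}
    for cmd in training_commands(config_dir):
        results[cmd[-1]] = subprocess.run(cmd).returncode
    return results


if __name__ == "__main__":
    for cfg, code in run_all().items():
        print(f"{cfg}: {'ok' if code == 0 else f'failed ({code})'}")
```

Runs are sequential on purpose. Each training job may need a large share of the recommended 16 GB of memory, so running them in parallel could exhaust it.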

Next Steps

To evaluate SpliDT performance against baselines on real-world security datasets, please refer to our benchmarking guide.

Additional Documentation