Installation Guide
This guide describes how to install and run SpliDT, a partitioned decision tree training and inference framework designed for scalable, line-rate deployment.
⚙️ System Requirements
SpliDT is designed to be lightweight and portable. The system dependencies for the training framework are:
- Linux (tested on Ubuntu 20.04 / 22.04)
- Conda (Miniconda or Anaconda)
- Memory (16 GB recommended)
- Disk (30-50 GB for datasets)
- Docker
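Before installing anything, a quick preflight check can confirm that the machine roughly matches these requirements. The script below is a minimal sketch and is not part of the SpliDT repository. It assumes a POSIX shell, reads memory from Linux's /proc/meminfo, measures free space under $HOME with `df -Pk`, and uses the 16 GB memory and 30 GB disk figures above as thresholds (in decimal GB). It only reports problems and never aborts.

```shell
# Preflight sketch for the requirements listed above (illustrative, not shipped with SpliDT).
# Thresholds come from this guide: 16 GB RAM recommended, 30-50 GB disk for datasets.
# Sizes are reported in decimal GB; problems are reported, never fatal.

check_cmd() {
  # Report whether a command is on PATH; return 0 if found, 1 otherwise.
  if command -v "$1" >/dev/null 2>&1; then
    echo "[ok] $1"
  else
    echo "[missing] $1"
    return 1
  fi
}

check_min_gb() {
  # $1 = label, $2 = available size in KiB, $3 = minimum in decimal GB.
  gb=$(( ${2:-0} * 1024 / 1000000000 ))
  if [ "$gb" -ge "$3" ]; then
    echo "[ok] $1: $gb GB (want >= $3 GB)"
  else
    echo "[low] $1: $gb GB (want >= $3 GB)"
    return 1
  fi
}

if [ "$(uname -s)" = "Linux" ]; then
  echo "[ok] Linux"
else
  echo "[warn] not Linux: $(uname -s)"
fi
check_cmd conda || true
check_cmd docker || true

# /proc/meminfo reports MemTotal in KiB (labelled "kB") on Linux.
if [ -r /proc/meminfo ]; then
  check_min_gb "memory" "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" 16 || true
fi

# df -Pk prints POSIX-format output in 1024-byte blocks; field 4 of line 2 is "Available".
check_min_gb "free disk under ${HOME:-.}" "$(df -Pk "${HOME:-.}" | awk 'NR == 2 {print $4}')" 30 || true
```

A "[low]" memory result on a nominal 16 GB machine is possible because the kernel reserves some RAM; treat the thresholds as guidance rather than hard limits.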
📦 Installation Dependencies
Install Conda (if not installed)
If Conda is not installed, install Miniconda:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
🧪 Create the SpliDT Environment
Clone the dse-and-training-framework repository, change into it, and create the Conda environment as follows:
git clone https://github.com/SpliDT-Decision-Trees/dse-and-training-framework.git
cd dse-and-training-framework
conda env create -f environment.yml
conda activate splidt
This installs the required packages, including:
- scikit-learn
- PyTorch
- HyperMapper
- dataset processing utilities
- experiment logging tools
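Once the environment is active, a quick import test confirms that the main packages resolved. This sketch uses the standard import names (sklearn for scikit-learn, torch for PyTorch); the helper name check_py_modules is illustrative, and HyperMapper is left out because this guide does not state its import name.

```shell
# Import-test sketch for the activated splidt environment (check_py_modules is a hypothetical helper).
# Standard import names: scikit-learn -> sklearn, PyTorch -> torch.

check_py_modules() {
  # Try to import each module with the active interpreter; return the number that failed.
  py=$(command -v python || command -v python3) || { echo "no python on PATH" >&2; return 127; }
  missing=0
  for mod in "$@"; do
    if "$py" -c "import $mod" >/dev/null 2>&1; then
      echo "[ok] $mod"
    else
      echo "[missing] $mod"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

check_py_modules sklearn torch || echo "Activate the environment with 'conda activate splidt' and retry."
```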
Dataset Setup
We release seven real-world datasets for training and evaluating SpliDT's partitioned decision trees:
- CIC-IDS-2017
- CIC-IDS-2018
- CIC-IoT-2023-a
- CIC-IoT-2023-b
- CIC-IoMT-2024
- ISCX-VPN-2016
- CampusTraffic
Download
The datasets used to train and evaluate SpliDT are publicly available via FigShare. Download them and place them on your machine as follows:
Step 1: Create a working directory
mkdir ~/splidt
cd ~/splidt
Step 2: Download the traffic traces and pipelines
# download the datasets for training and testing
curl -L -H "User-Agent: Mozilla/5.0" -OJ "https://ndownloader.figshare.com/files/60570641"
Step 3: Unzip the downloaded archive
unzip splidt-dataset.zip
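Large downloads are easy to truncate, so it can help to test the archive before extracting it. The helper below is an illustrative sketch (extract_dataset is a hypothetical name): `unzip -t` checks every member's CRC without writing files, and extraction only runs if that test passes.

```shell
# Test-then-extract sketch (extract_dataset is a hypothetical helper, not part of SpliDT).

extract_dataset() {
  # $1 = zip archive, $2 = destination directory (default: current directory).
  zip="$1"
  dest="${2:-.}"
  # -t tests every member's CRC without extracting; -q keeps the output short.
  if ! unzip -tq "$zip" >/dev/null 2>&1; then
    echo "archive is missing, truncated, or corrupt: $zip (re-run the download)" >&2
    return 1
  fi
  mkdir -p "$dest"
  # -o overwrites existing files without prompting; -d selects the target directory.
  unzip -q -o "$zip" -d "$dest"
}

# From ~/splidt, after Step 2:
if [ -f splidt-dataset.zip ]; then
  extract_dataset splidt-dataset.zip .
fi
```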
Step 4: Set the path field in each dataset configuration file under configs/ (in the dse-and-training-framework repository) to the absolute path of the unzipped dataset directory, for example:
path: "/home/splidt/splidt-dataset"
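With seven configuration files, editing the path field by hand is error-prone. The sketch below rewrites it with sed, assuming each config keeps `path:` on its own line as shown above; the function name set_config_path is hypothetical. The pattern is anchored at the start of the line (after optional indentation), so it does not also rewrite script_path.

```shell
# Bulk-edit sketch for the path field (set_config_path is a hypothetical helper).

set_config_path() {
  # $1 = absolute dataset directory; remaining args = config files to edit in place.
  dir="$1"; shift
  case "$dir" in
    /*) ;;
    *) echo "path must be absolute: $dir" >&2; return 1 ;;
  esac
  for cfg in "$@"; do
    # Keep the indentation, replace only the value; -i.bak leaves a backup copy.
    # Directories containing |, & or \ would need escaping before use here.
    sed -i.bak "s|^\([[:space:]]*\)path:.*|\1path: \"$dir\"|" "$cfg"
  done
}

# Run from the repository root; replace the example directory with your own.
if [ -d configs ]; then
  set_config_path "/home/splidt/splidt-dataset" configs/*.yml
fi
```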
🎯 HyperMapper Setup
Clone the HyperMapper repository, then set the script_path field in each dataset configuration file under configs/ to the absolute path of the hypermapper.py script in your HyperMapper clone, for example:
script_path: "/home/splidt/hypermapper/scripts/hypermapper.py"
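Before training, it is worth confirming that both fields point at something that exists. This is a rough sketch that assumes simple `key: "value"` lines rather than full YAML (no inline comments, no multi-line values); check_config_paths is a hypothetical helper, not part of the repository.

```shell
# Path-existence sketch for path and script_path (check_config_paths is a hypothetical helper).
# Not a YAML parser: it takes the first "key: value" line and strips quotes.

check_config_paths() {
  # $1 = key to look up (e.g. path or script_path); remaining args = config files.
  key="$1"; shift
  bad=0
  for cfg in "$@"; do
    # Capture the value up to its last non-space character, then drop ' and " quotes.
    value=$(sed -n "s|^[[:space:]]*$key:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$|\1|p" "$cfg" | head -n 1 | tr -d "\"'")
    if [ -n "$value" ] && [ -e "$value" ]; then
      echo "[ok] $cfg: $key -> $value"
    else
      echo "[bad] $cfg: $key -> ${value:-<unset>}"
      bad=1
    fi
  done
  return "$bad"
}

if [ -d configs ]; then
  check_config_paths path configs/*.yml
  check_config_paths script_path configs/*.yml
fi
```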
You are now ready to train and test models on the datasets.
Install the Visualization Stack (Grafana)
Before starting training, navigate to the dse-and-training-framework repository and run the following command to launch the Grafana dashboards:
make start-dashboards
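The dashboards may take a few seconds to come up after `make start-dashboards` returns. The sketch below polls Grafana's /api/health endpoint. It assumes Grafana is reachable on http://localhost:3000 (Grafana's default port), which this guide does not confirm, so check the repository's Makefile for the actual port. wait_for_grafana is a hypothetical helper.

```shell
# Readiness-poll sketch for Grafana (wait_for_grafana is a hypothetical helper).
# Assumption: Grafana listens on http://localhost:3000 unless a URL is passed in.

wait_for_grafana() {
  # $1 = base URL, $2 = number of attempts (2 seconds apart).
  url="${1:-http://localhost:3000}"
  attempts="${2:-30}"
  i=0
  while :; do
    # -f fails on HTTP errors, -sS stays quiet except for errors, --max-time bounds each try.
    if curl -fsS --max-time 5 -o /dev/null "$url/api/health" 2>/dev/null; then
      echo "Grafana is up at $url"
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -ge "$attempts" ]; then
      break
    fi
    sleep 2
  done
  echo "Grafana did not respond at $url after $attempts attempts" >&2
  return 1
}
```

For example: `make start-dashboards && wait_for_grafana`.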
Model Training
Inside the dse-and-training-framework repository, start training with the following command, replacing {dataset_name} with the name of the desired dataset configuration file under configs/ (without the .yml extension):
python src/train.py --config configs/{dataset_name}.yml
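To train on every dataset in one pass, the command above can be wrapped in a loop over configs/. This is a sketch with a hypothetical train_all helper and a DRY_RUN switch for previewing the commands. It assumes src/train.py takes one --config per run, as shown, and it moves on to the next dataset if one run fails.

```shell
# Loop sketch: one training run per config (train_all and DRY_RUN are hypothetical names).

train_all() {
  # $1 = config directory (default: configs); returns the number of failed runs.
  config_dir="${1:-configs}"
  failed=0
  for cfg in "$config_dir"/*.yml; do
    # An unmatched glob stays literal, so guard against an empty directory.
    if [ ! -e "$cfg" ]; then
      echo "no .yml configs found in $config_dir" >&2
      return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "python src/train.py --config $cfg"
    elif ! python src/train.py --config "$cfg"; then
      echo "[failed] $cfg" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}

# Preview first, then run for real from the repository root:
#   DRY_RUN=1 train_all
#   train_all
```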
Next Steps
To evaluate SpliDT's performance against baselines on real-world security datasets, refer to our benchmarking guide.