This repository contains the code and text of the paper

> Bach, Jakob, Markus Iser, and Klemens Böhm. "A Comprehensive Study of k-Portfolios of Recent SAT Solvers"

published at SAT 2022.
The corresponding complete experimental data (inputs as well as results) are available on KITopenData.
This document provides:
- an outline of the repo structure
- guidelines for developers who want to modify or extend the code base
- steps to reproduce the experiments, including setting up a virtual environment
The repo contains three folders: one with the text of the paper, one with the conference presentation, and one with the Python code.
The folder `code/` contains four Python files and three non-code files.
The non-code files are:

- `.gitignore`: For Python development.
- `LICENSE`: The code is MIT-licensed, so feel free to use it. The files in `text/` do not fall under this license.
- `requirements.txt`: The necessary Python dependencies for our virtual environment; see below for details.
The code files are:

- `prediction.py`: Prediction models and functions to make and evaluate predictions.
- `prepare_dataset.py`: First stage of the experiments (download and pre-process data) and functions for data handling.
- `run_evaluation.py`: Third stage of the experiments (compute statistics and create plots for the paper).
- `run_experiments.py`: Second stage of the experiments (search for portfolios and make predictions).
Additionally, we have organized the portfolio-search methods for our experiments as the standalone Python package `kpsearch`, located in the directory `code/kpsearch_package/`.
See the corresponding README for more information.
Possible points for modifying and extending our code are the prediction approaches, search approaches, and dataset integration.
If you want to change the prediction models (or their parametrization) used in the experiments, modify the variable `MODELS` in `prediction.py`.
Any other changes to the prediction procedure (e.g., target, imputation, or evaluation metrics) should be made in the same file as well.
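As a rough illustration of what such a model configuration could look like, here is a hedged sketch: the actual structure of `MODELS` in `prediction.py` may differ, and the model names and parameters below are made up, so check the file before editing it.

```python
# Hypothetical sketch only; the real MODELS variable in prediction.py may
# be structured differently. Each entry pairs a display name with a model
# class and its parametrization.
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

MODELS = [
    {'name': 'Decision tree', 'func': DecisionTreeRegressor, 'args': {'max_depth': 3}},
    {'name': 'Random forest', 'func': RandomForestRegressor, 'args': {'n_estimators': 100}},
]

# Instantiating a model from an entry:
model = MODELS[1]['func'](**MODELS[1]['args'])
```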
If you want to develop another portfolio-search method, have a look at the package README of `kpsearch` for details on what the search method's interface should look like.
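To convey the general idea (not the actual `kpsearch` interface, which is specified in its README), a minimal portfolio search over a runtime matrix might look like this exhaustive sketch; the function name and signature are assumptions for illustration:

```python
import itertools

def exhaustive_search(runtimes, k):
    """Toy portfolio search: among all k-subsets of solvers (column
    indices of the runtime matrix), return the one minimizing the summed
    per-instance minimum runtime, together with that objective value.
    Illustrative only; the actual kpsearch interface may differ."""
    n_solvers = len(runtimes[0])
    best_portfolio, best_cost = None, float('inf')
    for portfolio in itertools.combinations(range(n_solvers), k):
        # Cost of a portfolio: each instance runs its fastest member solver.
        cost = sum(min(row[j] for j in portfolio) for row in runtimes)
        if cost < best_cost:
            best_portfolio, best_cost = list(portfolio), cost
    return best_portfolio, best_cost
```

Exhaustive search is only feasible for small `k` and few solvers; the paper also studies heuristic search methods for larger instances of the problem.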
If you want to change which search routines are used in the experiments or how they are configured, have a look at the function `define_experimental_design()` in `run_experiments.py`.
If you want to analyze further datasets, you should bring them into a format similar to the one produced by `prepare_dataset.py`.
In particular, you should store the dataset in the form of two CSVs:

- `{dataset_name}_runtimes.csv` should contain the solver runtimes as floats or ints. Each row is an instance, and each column is a solver (column names are solver names). Additionally, the column `hash` should identify the instance; it is not used for search and predictions.
- `{dataset_name}_features.csv` should contain the instance features as floats or ints. Each row is an instance, and each column is a feature (column names are feature names). Additionally, the column `hash` should identify the instance; it is not used for search and predictions. Please encode categorical features beforehand (e.g., via one-hot encoding). Before prediction, we will replace missing values with a constant outside the feature's range.
Having prepared the dataset(s) in this format, you need to adapt the function `run_experiments()` in `run_experiments.py` to use your datasets instead of, or in addition to, the SAT Competition ones.
The evaluation code in `run_evaluation.py` also partly contains hard-coded dataset names, but the evaluation functionality itself should work with results from arbitrary datasets.
Before running scripts to reproduce the experiments, you need to set up an environment with all necessary dependencies. Our code is implemented in Python (version 3.8; other versions, including lower ones, might also work).
If you use `conda`, you can directly install the correct Python version into a new `conda` environment and activate the environment as follows:

```bash
conda create --name <conda-env-name> python=3.8
conda activate <conda-env-name>
```

Choose `<conda-env-name>` as you like.
To leave the environment, run

```bash
conda deactivate
```
We used `virtualenv` (version 20.4.7; other versions might also work) to create an environment for our experiments.
First, you need to install the correct Python version yourself.
Let's assume the Python executable is located at `<path/to/python>`.
Next, you install `virtualenv` with

```bash
python -m pip install virtualenv==20.4.7
```

To set up an environment with `virtualenv`, run

```bash
python -m virtualenv -p <path/to/python> <path/to/env/destination>
```

Choose `<path/to/env/destination>` as you like.
Activate the environment in Linux with

```bash
source <path/to/env/destination>/bin/activate
```

Activate the environment in Windows (note the back-slashes) with

```bash
<path\to\env\destination>\Scripts\activate
```

To leave the environment, run

```bash
deactivate
```
After activating the environment, you can use `python` and `pip` as usual.
To install all necessary dependencies for this repo, switch to the directory `code/` and run

```bash
python -m pip install -r requirements.txt
```

If you make changes to the environment and want to persist them, run

```bash
python -m pip freeze > requirements.txt
```
After setting up and activating an environment, you are ready to run the code.
From the directory `code/`, run

```bash
python -m prepare_dataset
```

to download and pre-process instance features as well as SAT Competition runtimes from the GBD website.
Next, start the pipeline with

```bash
python -m run_experiments
```

Depending on your hardware, this might take several hours or even days.
(It took about 25 hours on our server with 32 CPU cores and a base clock of 2.0 GHz.)
To create the plots for the paper and print some statistics to the console, run

```bash
python -m run_evaluation
```

All scripts have a few command-line options, which you can see by running the scripts like

```bash
python -m prepare_dataset --help
```