Spot

Spot identifies the processes in a pipeline that produce different results in different execution conditions.

Installation
Spot
How to Contribute
License

Installation

Install the package with pip:

$ pip install spottool

Pre-requisites

Install and start Docker
Build Docker images of the pipeline for the different conditions (e.g. Debian 10 and CentOS 7)
Create a Boutiques descriptor for the pipeline in each condition
Capture provenance information with the ReproZip tool in one of the conditions

The auto_spot command finds processes that create differences in results obtained in different conditions and reports them in a JSON file.
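To make the notion of "differences in results" concrete, the sketch below compares two result folders file by file using SHA-256 checksums. It is only an illustration under assumptions: the function names checksum_tree and differing_files are hypothetical, and this is not Spot's own implementation. Spot also takes a ReproZip trace as input, which this sketch ignores.

```python
import hashlib
from pathlib import Path


def checksum_tree(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(dir_a, dir_b):
    """Return sorted relative paths that differ or exist on one side only."""
    a = checksum_tree(dir_a)
    b = checksum_tree(dir_b)
    # A path missing on one side yields None there, so it is reported too.
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Calling differing_files on the result folders of two conditions lists the files whose content is not identical. Attributing those files to the processes that wrote them is the part that requires provenance information.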

Usage example

In this example, we run a bash script that calls the grep command multiple times and creates different output files depending on the OS it runs on. We use Spot to compare the outputs obtained in CentOS 7 and Debian 10.

The example can be run in this Git repository as follows:

git clone https://github.com/big-data-lab-team/spot.git
cd spot
pip install .

docker build . -f spot/example/centos7/Dockerfile -t spot_centos_latest
docker build . -f spot/example/debian/Dockerfile -t spot_debian_latest

cd spot/example 

auto_spot -d descriptor_centos7.json -i invocation_centos7.json -d2 descriptor_debian10.json -i2 invocation_debian10.json -s trace_test.sqlite3 -c conditions.txt -e exclude_items.txt -o commands.json .

In this command:

descriptor_<distro>.json is the Boutiques descriptor of the application executed in OS <distro>.
invocation_<distro>.json is the Boutiques invocation of the application executed in OS <distro>, containing the input files.
trace_test.sqlite3 is a ReproZip trace of the application, acquired in CentOS 7.
conditions.txt contains the result folder for each condition.
exclude_items.txt contains the list of items to be ignored while parsing the files and directories.
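When the comparison is scripted, it can help to assemble the command line in one place. The following is a minimal sketch that only reuses the flags from the example command above. The helper name build_auto_spot_argv and its parameter names are hypothetical, and the meaning of the trailing "." argument is not described here, so it is passed through unchanged.

```python
import shlex


def build_auto_spot_argv(descriptor1, invocation1, descriptor2, invocation2,
                         trace, conditions, exclude, output, positional="."):
    """Assemble the auto_spot argument list from the flags in the example."""
    return [
        "auto_spot",
        "-d", descriptor1, "-i", invocation1,
        "-d2", descriptor2, "-i2", invocation2,
        "-s", trace,
        "-c", conditions,
        "-e", exclude,
        "-o", output,
        positional,  # trailing argument copied from the example command
    ]


argv = build_auto_spot_argv(
    "descriptor_centos7.json", "invocation_centos7.json",
    "descriptor_debian10.json", "invocation_debian10.json",
    "trace_test.sqlite3", "conditions.txt", "exclude_items.txt",
    "commands.json",
)
# Print the command as a single shell-quoted string for inspection.
print(shlex.join(argv))
# To run it (requires Spot, Docker and the example files):
# import subprocess; subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a file name contains spaces.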

The command produces the following outputs:

commands_captured_c.json contains the list of processes that write temporary files or files that are written by multiple processes.
commands.json contains the list of processes that create differences between the two conditions. Its total_commands_multi attribute contains the processes that write files also written by other processes, and its total_commands attribute contains the remaining processes.
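The report can then be inspected with a few lines of Python. This is a sketch under assumptions: it supposes that commands.json holds a top-level JSON object with the two attributes named above, and since the layout of their values is not described here, it only counts entries. The function name summarize_commands is hypothetical.

```python
import json


def summarize_commands(path):
    """Count the entries under total_commands and total_commands_multi."""
    with open(path) as f:
        report = json.load(f)  # assumed to be a JSON object (dict)
    summary = {}
    for key in ("total_commands", "total_commands_multi"):
        value = report.get(key, [])
        # Assumption: the value is a list or an object; anything else is
        # counted as a single entry.
        summary[key] = len(value) if isinstance(value, (list, dict)) else 1
    return summary
```

For example, summarize_commands("commands.json") returns a dictionary with one count per attribute, which gives a quick view of how many processes were flagged in each group.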

How to Contribute

Clone the repository and create a new branch: $ git clone https://github.com/big-data-lab-team/spot.git && cd spot && git checkout -b name_for_new_branch
Make your changes and test them
Submit a pull request with a comprehensive description of the changes

reprotools
Release 0.0.2
