attpc_spyral

AT-TPC analysis pipeline


Spyral

Spyral is an analysis application for data from the Active Target Time Projection Chamber (AT-TPC). Spyral provides a flexible analysis pipeline, transforming the raw trace data into physical observables over several tunable steps. Spyral can process multiple data files in parallel, allowing for scalable performance on large experimental datasets.

Installation

Download

To download the repository, use git clone https://github.com/attpc/Spyral.git

To install the required packages, it is recommended to create a virtual environment with python/pip, as detailed below.

Pip

Create a virtual environment using

python -m venv </some/path/to/your/new/environment>

Activate the environment using source </some/path/to/your/new/environment>/bin/activate, then install all required dependencies using

pip install -r requirements.txt

All dependencies for Spyral will then be installed into your virtual environment.

Requirements

Python >= 3.10, < 3.13

Spyral aims to be cross-platform, supporting Linux, macOS, and Windows. Currently, Spyral has been tested and confirmed to work on macOS and Ubuntu 22.04 Linux. Other platforms are not guaranteed to work; if you encounter a problem, please open an issue on the GitHub page, and it will be resolved as quickly as possible.

Usage

Configuration

User configuration parameters are specified in JSON files, which are passed to the script at runtime.

Configurations contain many parameters; a complete example can be found in the config.json file provided with the repository. The parameters are grouped by use case (a schematic example follows the list below):

  • Workspace parameters: file paths to the raw data, the workspace, and the various AT-TPC pad data files.
  • Run parameters: the run numbers over which the data should be processed, as well as which types of analysis to run.
  • Detector parameters: detector conditions and configuration.
  • GET parameters: parameters used in the peak identification and baseline removal analysis for the GET data (AT-TPC pads).
  • FRIB trace parameters: parameters used in the peak identification of FRIBDAQ signals (ion chamber, auxiliary silicon, etc.).
  • Clustering parameters: point cloud clustering parameters.
  • Estimation parameters: parameters used to generate estimates of physical observables.
  • Solver parameters: parameters used to control the physics solver.
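
As an illustration of this structure, a configuration file has roughly the following shape. The keys below are schematic and abbreviated, not the real parameter names (including the placement of n_processors, which is an assumption); consult the config.json example in the repository for the complete set of options.

{
    "Workspace": {
        "trace_data_path": "/path/to/raw/traces",
        "workspace_path": "/path/to/workspace"
    },
    "Run": {
        "run_min": 17,
        "run_max": 60,
        "n_processors": 4
    },
    "Detector": { "...": "..." },
    "GET": { "...": "..." },
    "FRIB": { "...": "..." },
    "Cluster": { "...": "..." },
    "Estimate": { "...": "..." },
    "Solver": { "...": "..." }
}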

Running

To use Spyral, run the main.py script located at the top level of the repository with the virtual environment activated. Example:

python main.py CONFIG

Replace CONFIG with the path to your configuration file. For a complete list of options, use

python main.py --help

Performance

Spyral attempts to be as performant as possible while remaining flexible enough to handle the broad range of data generated by the AT-TPC. To that end, below are some tips for extracting the most performance from the application.

  • The point cloud phase (the first phase of the analysis) is by far the most time-consuming task in our benchmarks. Some of the bottleneck is I/O on the raw traces; raw trace files range in size from 10 GB to 50 GB, and a single event can be several MB on its own. As such, it is highly recommended to store the trace data on an SSD rather than an HDD. Additionally, when possible, it is recommended to store the data on a local disk (i.e. SATA or NVMe/PCIe). USB-connected removable drives can be a serious bottleneck for this part of the analysis.
  • The clustering phase is limited entirely by the chosen clustering algorithm, so there is little that can be done to improve its performance. In general, clustering is the second most time-consuming task, but it is still much faster than generating the point clouds (typically by a factor of two).
  • The estimation phase is very performant due to the relative simplicity of the analysis; in general, estimation should not be considered expensive.
  • The solving phase is more complicated and is described in more detail below.

Solving

The final phase of the analysis uses the equations of motion of a charged ion in an electromagnetic field to extract physics parameters. As one might expect, this is not straightforward. The simple approach is to fit ODE solutions to the data, but solving the ODEs the hundreds of times per event required for minimization can prove quite expensive. To bypass this expense, Spyral pre-calculates many of these ODE solutions and then interpolates on them to find a best fit. To make this even faster, Numba is used to just-in-time compile much of the interpolation code. As such, the first time you run phase 4 it may take a while, because Spyral is generating the interpolation scheme; after that it will be really fast!
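
To make the pre-calculate-and-interpolate strategy concrete, here is a minimal, self-contained sketch. It assumes a toy ion in uniform fields and a one-parameter (launch angle) grid; the constants, names, and grid are all illustrative, and this is not Spyral's actual code.

import numpy as np
from numba import njit
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar

QM = 4.8e7                             # charge-to-mass ratio [C/kg] (toy value)
EFIELD = np.array([0.0, 0.0, -6.0e4])  # uniform electric field [V/m] (toy value)
BFIELD = np.array([0.0, 0.0, 3.0])     # uniform magnetic field [T] (toy value)

def lorentz(t, state):
    # state = [x, y, z, vx, vy, vz]; Lorentz force only, no energy loss
    v = state[3:]
    return np.concatenate((v, QM * (EFIELD + np.cross(v, BFIELD))))

# The expensive step, done once: solve the ODE over a grid of launch angles
# and cache the resulting trajectories.
angles = np.linspace(0.1, 1.5, 64)          # polar angle grid [rad]
t_eval = np.linspace(0.0, 1.0e-7, 128)
speed = 1.0e7                               # fixed toy speed [m/s]
bank = np.empty((angles.size, t_eval.size, 3))
for i, theta in enumerate(angles):
    s0 = [0.0, 0.0, 0.0, speed * np.sin(theta), 0.0, speed * np.cos(theta)]
    bank[i] = solve_ivp(lorentz, (0.0, t_eval[-1]), s0, t_eval=t_eval).y[:3].T

@njit
def objective(theta, angles, bank, points):
    # The cheap step, done hundreds of times per event: linearly interpolate
    # between the two nearest cached trajectories and score the fit as the
    # mean squared distance from each measured point to the track. This hot
    # loop is what benefits from Numba's just-in-time compilation.
    j = min(max(np.searchsorted(angles, theta), 1), angles.size - 1)
    f = (theta - angles[j - 1]) / (angles[j] - angles[j - 1])
    traj = (1.0 - f) * bank[j - 1] + f * bank[j]
    err = 0.0
    for p in points:
        err += np.sum((traj - p) ** 2, axis=1).min()
    return err / points.shape[0]

# Fit a noisy point cloud generated from the trajectory at angles[24]:
points = bank[24] + np.random.normal(0.0, 1.0e-3, bank[24].shape)
best = minimize_scalar(objective, bounds=(0.1, 1.5),
                       args=(angles, bank, points), method="bounded")

The first call to objective triggers the just-in-time compilation, mirroring the one-time cost noted above; subsequent calls are fast.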

An alternative approach, the Unscented Kalman Filter, also exists, but it is not yet sound; more testing and development are needed before this method is ready to be used in production.

Parallel Processing

As mentioned previously, Spyral is capable of running multiple data files in parallel. This is achieved through the Python multiprocessing library. The configuration file contains a parameter named n_processors, whose value indicates to Spyral the maximum number of processes that may be spawned. Spyral then inspects the data load submitted in the configuration and attempts to balance it across the processes as evenly as possible.
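
The mechanism can be pictured with a minimal sketch, assuming hypothetical helper names (process_run stands in for the real per-file pipeline; this is not Spyral's internal code):

from multiprocessing import Process

def process_run(run_file: str) -> None:
    print(f"processing {run_file}")  # stand-in for the full pipeline on one file

def worker(files: list[str]) -> None:
    for f in files:
        process_run(f)

def split_load(run_files: list[str], n_processors: int) -> list[list[str]]:
    # Never spawn more processes than there are files, and deal the files
    # out round-robin so each worker gets a near-equal share.
    n = min(n_processors, len(run_files))
    return [run_files[i::n] for i in range(n)]

if __name__ == "__main__":
    runs = [f"run_{n:04d}.h5" for n in range(10)]
    workers = [Process(target=worker, args=(chunk,)) for chunk in split_load(runs, 4)]
    for p in workers:
        p.start()
    for p in workers:  # the parent waits on (monitors) its children
        p.join()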

Some notes about parallel processing:

  • In job environments (SLURM, etc.), you will not want the typical progress display provided by Spyral. Disable terminal output using the --no-term flag (i.e. python main.py --no-term CONFIG).
  • The number of processes should not exceed the number of physical cores in the system minus one (the extra core is for the parent process, which monitors the children). Exceeding this limit can result in extreme slowdown and potentially unresponsive behavior.
  • In general, it is best if the number of data files to be processed is evenly divisible by the number of processes; otherwise, the workload will necessarily be uneven across the processes.
  • Spyral will sometimes run fewer processes than requested, usually when the number of requested processes is greater than the number of files to be processed.

Logs and Output

Spyral creates a set of logfiles when it is run, located in the log directory of the workspace. These logfiles can contain critical information describing the state of Spyral; in particular, if Spyral crashes, they can be useful for determining what went wrong. A logfile is created for each process (including the parent process), and the files are labeled by process number (or as parent, in the case of the parent process).
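
For illustration only, a per-process logfile scheme of this kind could be set up as below; the function and file names are hypothetical, not Spyral's actual logging code.

import logging
from pathlib import Path

def init_process_log(workspace: Path, label: int | str) -> logging.Logger:
    # One logfile per process, labeled by process number (or "parent")
    log_dir = workspace / "log"
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"spyral.{label}")
    handler = logging.FileHandler(log_dir / f"log_{label}.txt")
    handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

# e.g. init_process_log(Path("/path/to/workspace"), "parent").info("startup")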

By default, Spyral prints some basic information to the terminal and provides progress monitoring in the form of a progress bar for each process. This can be disabled by passing the --no-term option:

python main.py --no-term CONFIG

Notebooks

The notebook directory of Spyral contains several useful Jupyter notebooks for data visualization, including making particle ID gates.