auroris

Data Curation in Polaris


License
Apache-2.0
Install
pip install auroris==0.1.3

Auroris


Tools for data curation in the Polaris ecosystem.

Getting started

from auroris.curation import Curator
from auroris.curation.actions import MoleculeCuration, OutlierDetection, Discretization

# Define the curation workflow
curator = Curator(
    steps=[
        MoleculeCuration(input_column="smiles"),
        OutlierDetection(method="zscore", columns=["SOL"]),
        Discretization(input_column="SOL", thresholds=[-3]),
    ],
    parallelized_kwargs={"n_jobs": -1},
)

# Run the curation on `dataset`, a pandas DataFrame that contains
# the "smiles" and "SOL" columns referenced above
dataset, report = curator(dataset)
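
To give a feel for what the last two steps of the workflow above do conceptually, here is a minimal standard-library sketch of z-score outlier flagging and threshold-based discretization. It is not auroris's implementation: the helper names `zscore_outliers` and `discretize`, the z-score cutoff of 3, the `>=` comparison at the threshold, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return a list of booleans, True where |z-score| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier
        return [False] * len(values)
    return [abs((v - mu) / sigma) > threshold for v in values]


def discretize(values, thresholds):
    """Map each value to the number of thresholds it meets or exceeds."""
    ordered = sorted(thresholds)
    return [sum(v >= t for t in ordered) for v in values]


# Hypothetical "SOL" values; the last one lies far from the rest
sol = [-2.1, -3.4, -2.8, -3.0, -2.5, -3.2, -2.9, -2.7, -3.1, -2.6, -3.3, -12.0]

flags = zscore_outliers(sol, threshold=3.0)   # only the last value is flagged
labels = discretize(sol, thresholds=[-3])     # 1 if value >= -3, else 0
```

With a single threshold of -3, the discretization yields a binary label, which is how a continuous readout can be turned into a classification target.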

Run curation from the command line

A Curator object is serializable: you can save it to a JSON file and load it back later to reproduce a curation. To run a saved curation from the command line:

auroris [config_file] [destination] --dataset-path [data_path]
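
The idea behind a JSON-serializable workflow can be sketched with the standard library alone. The dictionary layout below is a hypothetical schema chosen for illustration, and `save_workflow` and `load_workflow` are made-up helpers; auroris defines its own file format and its own save and load methods, which are described in its documentation.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical description of a workflow (not auroris's actual schema)
workflow = {
    "steps": [
        {"name": "molecule_curation", "input_column": "smiles"},
        {"name": "outlier_detection", "method": "zscore", "columns": ["SOL"]},
        {"name": "discretization", "input_column": "SOL", "thresholds": [-3]},
    ],
}


def save_workflow(config, path):
    """Write the workflow description to a JSON file."""
    Path(path).write_text(json.dumps(config, indent=2))


def load_workflow(path):
    """Read a workflow description back from a JSON file."""
    return json.loads(Path(path).read_text())


# Round trip through a temporary file: the restored description is identical
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "curator.json"
    save_workflow(workflow, config_path)
    restored = load_workflow(config_path)
```

Because the restored description equals the original, the same steps with the same parameters can be replayed on another machine or at a later date.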

Documentation

Please refer to the documentation, which contains tutorials for getting started with auroris and detailed descriptions of the functions provided.

Installation

You can install auroris using conda/mamba/micromamba:

conda install -c conda-forge auroris

You can also use pip:

pip install auroris

Development lifecycle

Setup dev environment

conda env create -n auroris -f env.yml
conda activate auroris

pip install --no-deps -e .

Other installation options

Alternatively, using [uv](https://github.com/astral-sh/uv):

```shell
uv venv -p 3.12 auroris
source auroris/bin/activate
uv pip compile pyproject.toml -o requirements.txt --all-extras
uv pip install -r requirements.txt
```

Tests

You can run tests locally with:

pytest

License

Licensed under the Apache-2.0 license. See LICENSE.