rsdb-utils

Utils to process the Regime Shifts DataBase CSV and parquet files


Keywords
Regime, Shifts, DataBase, RSDB, Ecosystems, Ecology, Social-ecological, systems, Data, analysis, Scientific, research
License
Other
Install
pip install rsdb-utils==0.1.post2

Documentation

RSDB utils

Utils to process the Regime Shifts DataBase CSV and parquet files.

Functionalities:

  • Import the database from a CSV or parquet file in a pandas DataFrame
  • Save the database in a CSV or parquet file from a pandas DataFrame
  • Check the database (a DataFrame) with the json schema or the Regime Shifts DataBase
  • From the JSON schema, generate a CSV with the list of enums values (tests/list_of_enums_from_json_schema)

Code repository: https://github.com/regimeshifts/rsdb-utils

Licence: GNU General Public License v3 (GPLv3)

Installation

From Python

In a terminal:

pip install rsdb-utils 

From R

In the R terminal:

# use reticulate
library(reticulate)
# install the rsdb-utils library
reticulate::py_install("rsdb-utils")

Usage

Read a database file

Read a database file to import it in a pandas DataFrame:

Python

from rsdb_utils import read_rsdb

df = read_rsdb("my_database.csv")

R

library(reticulate)
rsdb_utils <- import("rsdb_utils")

rsdb_utils$read_rsdb("my_database.csv")

Read and write a database file

Read a database file in parquet, import it in a pandas DataFrame, and save it in CSV

Python

from rsdb_utils import read_rsdb, write_rsdb

df = read_rsdb("my_database.parquet")

write_rsdb(df, "my_database.csv")

R

library(reticulate)
rsdb_utils <- import("rsdb_utils")

df <- read_rsdb$read_rsdb("my_database.csv")

read_rsdb$write_rsdb(df, "my_database.csv")

Check a database

Check a database based on the JSON schema of the Regime Shifts Database:

Python

from rsdb_utils import read_rsdb, write_rsdb, check_rsdb

df = read_rsdb("my_database.parquet")

df = check_rsdb(df)

# save the database with the errors info
write_rsdb(df, "my_database.csv")

R

library(reticulate)
rsdb_utils <- import("rsdb_utils")

df <- read_rsdb$read_rsdb("my_database.csv")

df <- read_rsdb$check_rsdb("my_database.csv")

read_rsdb$write_rsdb(df, "my_database.csv")

Generate the enums lists

Python

from rsdb_utils import generate_enums_dataframe

df = generate_enums_dataframe()
df.to_csv("list_of_enums_from_json_schema.csv")

Developer section

Python local installation

It's advised to create a virtual environment to install the required packages during development. You can then "install" this package (rsdb-utils) as a local package to make it easier to import within this local virtual environment.

# create a virtual environment
python3 -m venv .venv
# load the virtual environment
source .venv/bin/activate
# install the requirements
pip install -r requirements.txt
# install rsdb-utils as a local package
pip install -e .

R local installation

You can install the package from source with reticulate in R. Use the instructions above in Installation from R and swap the line reticulate::virtualenv_install("rsdb-utils") by:

reticulate::virtualenv_install("rsdb-utils", "/path/to/source/rsdb-utils")

Note that the editable installation ('pip install -e') doesn't work with reticulate, which means that you will need to reinstall the package each time you will modify the Python source code.

Tests

The tests are run in a GitHub action everytime a commit is pushed.

The tests are run with pytest, install it (within the virtual environment) with:

pip install pytest 

And run the tests:

pytest tests/tests.py

Build

The building and publishing are managed by a GitHub action for each new release. Carefully check the version number in pyproject.toml before creating the new tag and the new release.

Warning: once published on PyPi, a version can't be re-uploaded.

Local build

In the development virtual environment, install:

pip install build twine

Then build the package:

python3 -m build

Configure your pypi access token and publish the package version:

twine upload dist/rsdb_utils-0.1.tar.gz dist/rsdb_utils-0.1-py3-none-any.whl

How to update the JSON schema

To pull the last schema in the repository, use the following command:

git submodule update --recursive --remote

You can then create a new version of rsdb-utils: edit the version number in pyproject.toml, commit, push and publish a new release of rsdb-utils.

Romain THOMAS 2024
Stockholm Resilience Centre