Convert monolithic Jupyter notebooks into maintainable pipelines.


Keywords
data-engineering, data-science, jupyter, jupyter-notebooks, machine-learning, mlops, workflow
License
Apache-2.0
Install
pip install soorgeon==0.0.15

Documentation

Soorgeon

Tip

Deploy AI apps for free on Ploomber Cloud!

Join our community | Newsletter | Contact us | Blog | Website | YouTube

header

Convert monolithic Jupyter notebooks into Ploomber pipelines.

soorgeon.mp4

3-minute video tutorial.

Note: Soorgeon is in alpha, help us make it better.

Install

Compatible with Python 3.7 and higher.

pip install soorgeon

Usage

[Optional] Testing if the notebook runs

Before refactoring, you can optionally test if the original notebook or script runs without exceptions:

# works with ipynb files
soorgeon test path/to/notebook.ipynb

# and notebooks in percent format
soorgeon test path/to/notebook.py

Optionally, set the path to the output notebook:

soorgeon test path/to/notebook.ipynb path/to/output.ipynb

soorgeon test path/to/notebook.py path/to/output.ipynb

Refactoring

To refactor your notebook:

# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv
# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py

# use alternative serializer (cloudpickle or dill) if notebook 
# contains variables that cannot be serialized using pickle 
soorgeon refactor nb.ipynb --serializer cloudpickle
soorgeon refactor nb.ipynb --serializer dill

To learn more, check out our guide.

Cleaning

Soorgeon has a clean command that applies black for .ipynb and .py files:

soorgeon clean path/to/notebook.ipynb

or

soorgeon clean path/to/script.py

Linting

Soorgeon has a lint command that can apply [flake8]:

soorgeon lint path/to/notebook.ipynb

or

soorgeon lint path/to/script.py

Examples

git clone https://github.com/ploomber/soorgeon

Exploratory data analysis notebook:

cd soorgeon/examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

Machine learning notebook:

cd soorgeon/examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

To learn more, check out our guide.

Community

About Ploomber

Ploomber is a big community of data enthusiasts pushing the boundaries of Data Science and Machine Learning tooling.

Whatever your skillset is, you can contribute to our mission. So whether you're a beginner or an experienced professional, you're welcome to join us on this journey!

Click here to know how you can contribute to Ploomber.