pynb2docker

Library for turning Python Jupyter Notebooks into Docker images.


Keywords
docker, jupyter-notebooks, python3
License
Apache-2.0
Install
pip install pynb2docker==0.0.1

Documentation

Installation (Linux)

  • Create a virtual environment:

    virtualenv -p /usr/bin/python3.7 venv
    
  • From source:

    git clone https://github.com/fracpete/pynb2docker.git
    cd pynb2docker
../venv/bin/pip install .
    

Coding conventions

You can use line magics in your notebook to declare all the required libraries that your code depends on. For example, if you need numpy, add this in a code cell:

%pip install numpy

To take advantage of this non-standard line magic, you need to install the pip_magic library in your environment:

pip install pip_magic

Example

For this example we use the pandas_filter_pipeline.ipynb notebook and the additional pandas_filter_pipeline.dockerfile Docker instructions. This notebook contains a simple pandas filter setup, which uses a query to remove certain rows from the input CSV file and saves the cleaned dataset as a new CSV file.
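The notebook itself is not reproduced here, but its filtering step might look roughly like the following sketch; the column names and the query condition are invented for illustration and do not correspond to the actual columns of bolts.csv:

```python
import pandas as pd

# Illustrative stand-in for the notebook's filter step: the column
# names and the query condition are made up, not taken from bolts.csv.
df = pd.DataFrame({
    "run": [1, 2, 3, 4, 5],
    "speed": [100, 250, 80, 300, 120],
})

# keep only the rows that satisfy the query, dropping the rest
cleaned = df.query("speed < 200").reset_index(drop=True)
print(len(cleaned))  # 3 rows remain
```

In the real notebook, the input and output paths come from environment variables rather than literals, which is what makes the resulting Docker image reusable with different files.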

The command-lines for this example assume this directory structure:

/some/where
|
+- venv   // virtual environment with pynb2docker installed
|
+- data
|  |
|  +- notebooks
|  |  |
|  |  +- pandas_filter_pipeline.ipynb       // actual notebook
|  |  |
|  |  +- pandas_filter_pipeline.dockerfile  // additional Dockerfile instructions
|  |
|  +- in
|  |  |
|  |  +- bolts.csv   // raw dataset to filter
|  |
|  +- out
|
+- output
|  |
|  +- pandascsvcleaner  // will contain all the generated data, including "Dockerfile"

For our Dockerfile, we use the python:3.7-buster base image (-b), which provides a Python 3.7 installation on top of a Debian "buster" image. The pandas_filter_pipeline.ipynb notebook (-i) is then turned into Python code using the following command-line:

./venv/bin/pynb2docker \
  -i /some/where/data/notebooks/pandas_filter_pipeline.ipynb \
  -o /some/where/output/pandascsvcleaner \
  -b python:3.7-buster \
  -I /some/where/data/notebooks/pandas_filter_pipeline.dockerfile

Now we build the docker image called pandascsvcleaner from the Dockerfile that has been generated in the output directory /some/where/output/pandascsvcleaner (-o option in previous command-line):

cd /some/where/output/pandascsvcleaner
sudo docker build -t pandascsvcleaner .

With the image built, we can now push the raw CSV file through for cleaning. For this to work, we map the in/out directories from our directory structure into the Docker container (using the -v option) and we supply the input and output files via the INPUT and OUTPUT environment variables (using the -e option). In order to see a few more messages, we also turn on the debugging output that is part of the notebook, using the VERBOSE environment variable:

sudo docker run -ti \
  -v /some/where/data/in:/data/in \
  -v /some/where/data/out:/data/out \
  -e INPUT=/data/in/bolts.csv \
  -e OUTPUT=/data/out/bolts-clean.csv \
  -e VERBOSE=true \
  pandascsvcleaner
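Inside the notebook, such variables would typically be read with os.environ; the following is a minimal sketch assuming that convention (the variable names match the docker run call above, but the default paths and parsing logic are assumptions):

```python
import os

# Pick up the container's environment variables; the fallback paths
# here are invented defaults, not part of pynb2docker itself.
input_file = os.environ.get("INPUT", "/data/in/input.csv")
output_file = os.environ.get("OUTPUT", "/data/out/output.csv")
verbose = os.environ.get("VERBOSE", "false").lower() == "true"

if verbose:
    print("reading:", input_file)
    print("writing:", output_file)
```

Because environment variables always arrive as strings, the VERBOSE flag has to be compared against "true" explicitly rather than treated as a boolean.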

From the debugging messages you can see that the initial dataset with 40 rows of data gets reduced to 24 rows.

Disclaimer: This is just a simple notebook tailored to the UCI dataset bolts.csv.