productionize

Lightweight ML Deployment Platform


Keywords
deployment, kubernetes, ml
License
MIT
Install
pip install productionize==0.1.0

Documentation

productionize - deploy ML models directly from Python [WIP]


productionize is an open-source lightweight ML deployment tool.
You can containerize, deploy and ship your model, without ever
having to leave your beloved Python.


productionize in a nutshell

What does it do? Exactly what the library name says: productionize helps you to productionize your API. As a Data Scientist, most of the projects I have worked on ran into issues when productionizing code. Often, the code is not tested, standardized or environment-agnostic enough to simply deploy it somewhere. This is where containers come in very handy. Containerization helps you to freeze the environment and decouple your model, or just your code, from the host system. This makes deployment much, much easier.

However, working with Docker, Kubernetes and all these other fancy tools is not as simple as one might hope. The good news is that some steps can be automated, and this is exactly what productionize does. As a Data Scientist you can focus on your model, while containerization and deployment are handled by productionize.

The workflow with productionize is very simple. First, you develop your API in Python. Next, productionize lets you easily set up a local Kubernetes cluster on which you can test your API. In productionize, this local Kubernetes cluster is called a workbench, because it is Kubernetes with a little extra tooling to help you work. Then you deploy your API. You don't have to change your standard API script for that; productionize handles it for you. Within a matter of seconds, your API is built into a container and deployed to your workbench, where you can test it and see if it works. If you are happy with it, you can simply export the container and deploy it to any Kubernetes cluster you like.

That way, productionize makes it super easy to turn your local API into a production-ready container and the best part: you don't even have to leave Python.

Installation

productionize is a Python library hosted on PyPI. Currently, it is only supported on macOS (the darwin platform), where you can install the package using pip.

pip install productionize
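Since only macOS is supported for now, a script that gets shared with colleagues can guard itself before importing the library. This is a minimal sketch; the helper name is_supported_platform is hypothetical and not part of productionize.

```python
import sys


def is_supported_platform(platform_name: str = sys.platform) -> bool:
    """Return True on macOS, which Python reports as 'darwin'."""
    return platform_name == "darwin"


if __name__ == "__main__":
    if is_supported_platform():
        print("macOS detected, productionize should work here.")
    else:
        print(f"'{sys.platform}' is not supported by productionize yet.")
```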

The library relies on two components that most Macs used for development already have. In case yours doesn't, you need to install the Xcode command line tools and Homebrew. Xcode is a developer toolkit released by Apple for macOS. You can install the relevant pieces by running the following command in your terminal:

xcode-select --install

Homebrew, on the other hand, is a package manager that allows you to easily install and manage applications. I would recommend using it even if you don't need it for productionize. You can read more about it here. To install Homebrew, just run the following command in your terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"

Usage

Once the library is installed from PyPI, you can load it with a standard Python import. The core of the library is its main classes, which can be imported as follows:

# import lib
from productionize import workbench, product

The library contains two major classes. The first one is the workbench() class, which allows you to set up and manage a proper ML workbench on your local machine. The second one is the product() class, which allows you to deploy your ML APIs to the local workbench and to any Kubernetes cluster.

The workbench() class

Once the main classes are imported, you can set up your very own workbench on your local machine. The workbench consists of several tools:

  • Docker: a container technology used to build Docker containers, which are the quasi-standard in Machine Learning deployment. You can read more about Docker here.
  • VirtualBox: a hypervisor, used here as the driver that creates a VM on your local machine to host the Kubernetes cluster, which is at the heart of the workbench. You can read more about VirtualBox here.
  • Kubectl: a CLI that allows you to interact with Kubernetes. You won't have to use it yourself, but productionize runs Kubernetes commands in the background.
  • Minikube: a local implementation of Kubernetes. Minikube runs on a VM, which is administered by VirtualBox.

Technically, the components are assembled in a simple fashion. The only peculiarity is that Minikube is installed on top of VirtualBox.
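If you are curious which of these tools are already present on your machine, you can look them up on your PATH. The sketch below uses only the Python standard library. The executable names are the ones the tools usually install, and productionize itself may detect them differently, so treat the helper find_workbench_tools as a hypothetical illustration.

```python
import shutil

# Executable names as they usually appear on the PATH (an assumption;
# productionize may look for the tools in a different way).
WORKBENCH_TOOLS = {
    "Docker": "docker",
    "VirtualBox": "VBoxManage",
    "Kubectl": "kubectl",
    "Minikube": "minikube",
}


def find_workbench_tools(tools=WORKBENCH_TOOLS):
    """Map each tool name to the path of its executable, or None if missing."""
    return {name: shutil.which(executable) for name, executable in tools.items()}


if __name__ == "__main__":
    for name, path in find_workbench_tools().items():
        print(f"{name:<12}{path or 'not installed'}")
```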

To set up the workbench, these tools need to be installed. You can do this by simply running the setup() method of the workbench class, which you can call once the class is instantiated.

# initiate class
cluster = workbench()

# install and setup components
cluster.setup()

To fire up the entire workbench, you first need to log in to Docker Desktop. It is installed for you, but it needs to be running. To start it, search your computer for Docker (on a Mac, just use Spotlight search) and launch the application.

Next, you will have to sign in. If you don't have an account already, you can create one for free at Docker Hub, which is a lot like GitHub, just for containers.

Once you have done this, you are good to go. You can now start the cluster using the start_cluster() method. This method allows you to set the resource quota for the cluster. The defaults are two CPUs and 2 GB of memory.

# start the cluster
cluster.start_cluster(cpus = '2', memory = '2G')
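For orientation, this is roughly what an equivalent manual Minikube call could look like. How productionize invokes Minikube internally is not documented here, so the mapping of cpus and memory to the --cpus, --memory and --driver flags of minikube start is an assumption, and minikube_start_command is a hypothetical helper.

```python
def minikube_start_command(cpus: str = "2", memory: str = "2G") -> list:
    """Build the argument list for a comparable `minikube start` call.

    Assumption: the workbench arguments map one-to-one onto Minikube flags.
    """
    return [
        "minikube",
        "start",
        "--driver=virtualbox",          # the workbench runs Minikube on VirtualBox
        f"--cpus={cpus}",
        f"--memory={memory.lower()}",   # Minikube documents lowercase units (b, k, m, g)
    ]


if __name__ == "__main__":
    # Only print the command; running it would start a real VM.
    print(" ".join(minikube_start_command()))
```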

When the cluster is running, you can create a project. Projects help to keep the cluster clean and well structured. You can create one with the open_project() method.

# open project
cluster.open_project(name = "my-project")

In case you want to delete the project you can use the delete_project() method. Technically, the projects are namespaces on Kubernetes.

# delete project
cluster.delete_project(name = "my-project")
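Because projects are Kubernetes namespaces, their names have to follow the Kubernetes naming rules for namespaces: at most 63 characters, lowercase letters, digits and '-', starting and ending with a letter or digit. A small check before calling open_project() can save you a confusing error. The helper below is a hypothetical sketch and not part of productionize.

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', alphanumeric at both ends.
_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_project_name(name: str) -> bool:
    """Return True if the name is usable as a Kubernetes namespace name."""
    return len(name) <= 63 and _LABEL.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("my-project", "My_Project", "-leading-dash"):
        print(candidate, is_valid_project_name(candidate))
```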

To stop the cluster, simply use the stop_cluster() method. This only halts the cluster; it does not remove any of the components.

# stop the cluster
cluster.stop_cluster()

To cleanly uninstall the components, run the uninstall() method. You can even specify which components to delete. By default, components that already existed on your machine before setup are not removed.

# cleanly uninstall cluster components
cluster.uninstall(docker = None, kubectl = None, virtualbox = None, minikube = None, report = True)

The product() class

While the workbench class mainly concerns the infrastructure management, the product class deals with your API. The product class turns your API into a deployable product. Once you have an API programmed, for instance with Flask, the product class will do the rest for you.

Let's consider the following Python script containing a Flask API:

#!flask/bin/python
from flask import Flask

app = Flask(__name__)

@app.route('/hello')
def index():
    return "Hello, World!"

if __name__ == '__main__':
    app.run(port = 8000, host = '0.0.0.0')

You can, of course, create any kind of API you like. You can also add new routes or whatever you need. To deploy an API to Kubernetes, you would typically need to containerize the API. productionize does that for you. The product class contains the prepare_deployment() method. This method produces a Dockerfile from your API script and a requirements file.

# initiate the class and say which project the product belongs to
my_api = product(name = "my-product",
                 project = "my-project")

# prepare the deployment
my_api.prepare_deployment(api_file = "path_to/api.py",  # path to the api file
                          requirements_file = "path_to/requirements.txt", # path to the req file
                          port = "8000") # the port your API is exposed to

Note: I would advise against any directory stunts here. The code in this library is flexible, but unusual paths might get tricky.
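One way to stay out of trouble is to verify both paths before handing them to prepare_deployment(). The sketch below assumes that the simplest layout, with the API script and the requirements file side by side in one directory, is the safe one. The library does not state this as a requirement, and check_deployment_inputs is a hypothetical helper.

```python
from pathlib import Path


def check_deployment_inputs(api_file: str, requirements_file: str) -> Path:
    """Return the directory shared by both files, or raise if something is off."""
    api = Path(api_file).resolve()
    req = Path(requirements_file).resolve()
    for path in (api, req):
        if not path.is_file():
            raise FileNotFoundError(f"{path} does not exist")
    # Assumption: keeping both files in one directory avoids path surprises.
    if api.parent != req.parent:
        raise ValueError("api file and requirements file should share a directory")
    return api.parent
```

Calling check_deployment_inputs("path_to/api.py", "path_to/requirements.txt") before prepare_deployment() then fails early, with a readable message, if a file is missing.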

Once you run the prepare_deployment() method, productionize will build a Dockerfile in your current working directory.

You can, of course, modify and edit the Dockerfile, but you do so at your own risk. If you intend to work in an enterprise context, it might be necessary to change permissions within the container. This has no effect on productionize. By default, productionize containers run as root.

FROM python:3.7.7
RUN mkdir -p /api
COPY api.py /api/api.py
COPY requirements.txt /api/requirements.txt
RUN python -m pip install -r /api/requirements.txt
EXPOSE 8000
ENTRYPOINT ["python", "api/api.py"] 
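To make the mapping from the prepare_deployment() arguments to this Dockerfile more tangible, here is a small sketch that renders the same text from a template. It is not the code productionize uses internally; render_dockerfile and DOCKERFILE_TEMPLATE are hypothetical names, and the template simply mirrors the Dockerfile shown above.

```python
from pathlib import Path

# Mirrors the generated Dockerfile shown above; only file names and port vary.
DOCKERFILE_TEMPLATE = """\
FROM python:3.7.7
RUN mkdir -p /api
COPY {api} /api/{api}
COPY {req} /api/{req}
RUN python -m pip install -r /api/{req}
EXPOSE {port}
ENTRYPOINT ["python", "api/{api}"]
"""


def render_dockerfile(api_file: str, requirements_file: str, port: str) -> str:
    """Fill the template with the file names (without directories) and the port."""
    return DOCKERFILE_TEMPLATE.format(
        api=Path(api_file).name,
        req=Path(requirements_file).name,
        port=port,
    )


if __name__ == "__main__":
    print(render_dockerfile("path_to/api.py", "path_to/requirements.txt", "8000"))
```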

Once you have run the prepare_deployment() method, you can deploy your API to the workbench. Why would you do this? Well, the workbench serves as your local test environment. Using the deploy() method, you can easily deploy your "product" to the workbench.

my_api.deploy()

By default, deploy() does not take any arguments. None are necessary, as all the information is stored in the my_api object after prepare_deployment(). However, if you want, you can also deploy your product on your localhost. Technically speaking, this just creates a Docker container that runs on localhost. This can be achieved with the local argument in the method call.

my_api.deploy(local = True)

Once your product is deployed, the method returns the URL under which you can reach your API. Don't forget to append your custom routes.

Your output should look somewhat like this:

>>> my_api.deploy()

    Deployment Report:
    ------------------

    This is an automatically generated report on the status of your deployed
    product. Your API is now containerized and hosted on the workbench. You
    can access the API using:

    http://XXX.XXX.XX.XXX:XXXXX/<your_route>

    You can call the API in whatever way it is designed. If you want to get
    rid of it, just use the delete_deployment() method. If you just want to
    update the API, you can just use prepare_deployment() to create a new
    Dockerfile and then deploy() again.

              Your Product
    -----------------------
    Name:       my-product
    Project:    my-project
    Status:     deployed and healthy
    Access:     http://XXX.XXX.XX.XXX:XXXXX/<your_route>

    If you want to export the image to your local machine just use the
    export_product() method. If you want to push it to another registry,
    you can use the push_product() method.
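The address in the report still needs your own route appended. A quick way to try the deployed API from Python, using only the standard library, could look like the sketch below. Both helper names are hypothetical, and the base URL is whatever address your own deployment report shows.

```python
from urllib.request import urlopen


def endpoint_url(base_url: str, route: str) -> str:
    """Join the reported base URL and a route with exactly one slash."""
    return base_url.rstrip("/") + "/" + route.lstrip("/")


def call_api(base_url: str, route: str, timeout: float = 5.0) -> str:
    """Send a GET request to the route and return the response body as text."""
    with urlopen(endpoint_url(base_url, route), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For the Flask example, call_api(<address from your report>, "/hello") would return the string produced by index().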

Now you know how to reach your API. In case you find out it doesn't work and you change something in the code, you can just re-run prepare_deployment() and then deploy(). The deploy() method will automatically recognize that the "product" has already been deployed and will just update the existing one. In case you want to delete a product, you can use the delete_deployment() method. This also works for local deployments.

# delete product
my_api.delete_deployment(product = "my-product", project = "my-project")

When you are satisfied with your API, you might want to deploy or ship it to an enterprise-ready or collaborative cluster. As the workbench is at its heart a Kubernetes cluster, everything you do on the workbench will work on any other cluster. To give you freedom of choice, productionize implements a method to deploy anywhere.

This is the push_product() method. It pushes the product, in the form of a Docker image, to any registry you want. The default is Docker Hub, but you can select any registry you like. For secured registries you will need credentials or a token; you will be prompted for them.

# push the product
my_api.push_product(product = "my-product", registry = "my.registry:5000/image-name")

This method will automatically tag the image and run docker push to push the image to the remote registry.
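If you ever need to do the same thing by hand, the two Docker CLI steps can be sketched as follows. The assumption here is that the local image carries the product name; the name productionize actually gives the image may differ, and push_commands is a hypothetical helper.

```python
import subprocess


def push_commands(local_image: str, target: str) -> list:
    """Return the `docker tag` and `docker push` calls as argument lists."""
    return [
        ["docker", "tag", local_image, target],  # give the image its registry name
        ["docker", "push", target],              # upload it to that registry
    ]


def run_push(local_image: str, target: str) -> None:
    """Run both steps in order and stop at the first failure."""
    for command in push_commands(local_image, target):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the commands; run_push() would contact a real registry.
    for command in push_commands("my-product", "my.registry:5000/image-name"):
        print(" ".join(command))
```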

Next Steps

productionize is far from ready and is still a work in progress. I started this project around mid-May 2020, when I was super annoyed that I had to build a new test cluster on my local machine because I had messed up the others too much. As this all started with me sitting at my Mac, the project is at the moment only stable on macOS. I have already started to work on other UNIX systems; Windows, however, might take a bit of time. The next steps are the following:

Release 1.0

  1. Functional Features:
    1. Ease the export of products from workbench to local machine
    2. Integrate the push feature to external cluster registries
  2. Non-functional Features:
    1. Update unit testing for product() class

Release 2.0

  1. Functional Features:
    1. Add workbench management feature
  2. Non-functional Features:
    1. Support latest Ubuntu version
    2. Support latest CentOS version