# Kelner

Ridiculously simple model serving.

1. Get an exported model (download or train and save)
2. `kelnerd -m SAVED_MODEL_FILE`
3. There is no step 3, your model is served
## Quickstart

### Install kelner

```
$ pip install kelner
```
### Download a TensorFlow ProtoBuf file

```
$ wget https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip
$ unzip inception_dec_2015.zip
Archive:  inception_dec_2015.zip
  inflating: imagenet_comp_graph_label_strings.txt
  inflating: LICENSE
  inflating: tensorflow_inception_graph.pb
```
### Run the server

```
$ kelnerd -m tensorflow_inception_graph.pb --engine tensorflow --input-node ExpandDims --output-node softmax
```
### Send a request to the model

```
$ curl --data-binary "@dog.jpg" localhost:61453 -X POST -H "Content-Type: image/jpeg"
```
The response should be a JSON-encoded array of floating point numbers.
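The same request from Python, as a minimal sketch using the third-party `requests` library (not part of kelner; install it with `pip install requests`):

```python
import requests

# POST the raw JPEG bytes to the kelnerd instance started above
# (61453 is the port from the curl example)
with open("dog.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:61453",
        data=f.read(),
        headers={"Content-Type": "image/jpeg"},
    )

scores = response.json()  # one floating point score per ImageNet class
```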
For a fancy client (not strictly necessary, but useful) you can use the `kelner` command. This is how you get the top 5 labels from the server you ran above (note the `--top 5` part):

```
$ kelner classify dog.jpg --imagenet-labels --top 5
boxer: 0.973630
Saint Bernard: 0.001821
bull mastiff: 0.000624
Boston bull: 0.000486
Greater Swiss Mountain dog: 0.000377
```
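Under the hood this amounts to pairing the raw scores with the label file from the zip archive. Continuing from the `scores` variable in the Python sketch above, and assuming `imagenet_comp_graph_label_strings.txt` lists one label per line in class order:

```python
# Pair each score with its ImageNet label and print the five highest
with open("imagenet_comp_graph_label_strings.txt") as f:
    labels = [line.strip() for line in f]

for label, score in sorted(zip(labels, scores), key=lambda p: p[1], reverse=True)[:5]:
    print(f"{label}: {score:.6f}")
```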
## Use kelner in code

If you need to, you can also use kelner in your code.
Let's create an example model:
```python
import keras

# A toy model: a 2-feature input, one hidden layer, one output
l1 = keras.layers.Input((2,))
l2 = keras.layers.Dense(3)(l1)
l3 = keras.layers.Dense(1)(l2)
model = keras.models.Model(inputs=l1, outputs=l3)

# Save in HDF5 format so kelner can load it
model.save("saved_model.h5")
```
Now load the model in kelner:
```python
import kelner

loaded_model = kelner.model.load("saved_model.h5")  # keras engine is the default
kelner.serve(loaded_model, port=8080)
```
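With that server running, another process could query it over HTTP. A hedged sketch using `requests`: the exact JSON payload format is defined by kelner's protocol, so this assumes a nested array matching the model's (batch, 2) input shape (a GET request, described in the FAQ below, returns the actual input spec):

```python
import requests

# One sample with two features; adjust to whatever the GET spec reports
response = requests.post("http://localhost:8080", json=[[0.5, -1.2]])
print(response.json())  # JSON-encoded model output for the sample
```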
## FAQ

### Who is this for?

Machine learning researchers who don't want to deal with building a web server for every model they export. Kelner loads a saved Keras or TensorFlow model and starts an HTTP server that pipes each POST request body to the model and returns the JSON-encoded model response.
### How is it different from TensorFlow Serving?

- Kelner is ridiculously simple to install and run
- Kelner also works with saved Keras models
- Kelner works with one model per installation
- Kelner doesn't do model versioning
- Kelner speaks JSON over HTTP, while TensorFlow Serving speaks ProtoBuf over gRPC
- Kelner's protocol is:
  - `GET` returns model input and output specs as JSON
  - `POST` expects JSON or an image file and returns the JSON-encoded result of model inference
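For example, a quick way to inspect a running server's specs (assuming the Quickstart instance on port 61453):

```python
import requests

# GET returns the model's input and output specs as JSON
spec = requests.get("http://localhost:61453").json()
print(spec)  # exact keys depend on the loaded model and engine
```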