# Kelner

Ridiculously simple model serving.

1. Get an exported model (download or train and save)
2. `kelnerd -m SAVED_MODEL_FILE`
3. There is no step 3, your model is served
## Quickstart

### Install kelner

```
$ pip install kelner
```
### Download a TensorFlow ProtoBuf file

```
$ wget https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip
$ unzip inception_dec_2015.zip
Archive:  inception_dec_2015.zip
  inflating: imagenet_comp_graph_label_strings.txt
  inflating: LICENSE
  inflating: tensorflow_inception_graph.pb
```
### Run the server

```
$ kelnerd -m tensorflow_inception_graph.pb --engine tensorflow --input-node ExpandDims --output-node softmax
```
### Send a request to the model

```
$ curl --data-binary "@dog.jpg" localhost:61453 -X POST -H "Content-Type: image/jpeg"
```
The response should be a JSON-encoded array of floating point numbers.
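The same request from Python, as a minimal sketch using the third-party `requests` library (not part of kelner; install it with `pip install requests`):

```python
import requests

# POST the raw JPEG bytes to the kelnerd instance started above
# (61453 is the port from the curl example)
with open("dog.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:61453",
        data=f.read(),
        headers={"Content-Type": "image/jpeg"},
    )

scores = response.json()  # one floating point score per ImageNet class
```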
For a fancy client (not strictly necessary, but useful) you can use the `kelner` command. This is how you get the top 5 labels from the server you ran above (note the `--top 5` part):

```
$ kelner classify dog.jpg --imagenet-labels --top 5
boxer: 0.973630
Saint Bernard: 0.001821
bull mastiff: 0.000624
Boston bull: 0.000486
Greater Swiss Mountain dog: 0.000377
```
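Under the hood this amounts to pairing the raw scores with the label file from the zip archive. Continuing from the `scores` variable in the Python sketch above, and assuming `imagenet_comp_graph_label_strings.txt` lists one label per line in class order:

```python
# Pair each score with its ImageNet label and print the five highest
with open("imagenet_comp_graph_label_strings.txt") as f:
    labels = [line.strip() for line in f]

for label, score in sorted(zip(labels, scores), key=lambda p: p[1], reverse=True)[:5]:
    print(f"{label}: {score:.6f}")
```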
## Use kelner in code

If you need to, you can also use kelner in your code.
Let's create an example model:
```python
import keras

# A toy model: a 2-feature input, one hidden layer, one output
l1 = keras.layers.Input((2,))
l2 = keras.layers.Dense(3)(l1)
l3 = keras.layers.Dense(1)(l2)
model = keras.models.Model(inputs=l1, outputs=l3)

# Save in HDF5 format so kelner can load it
model.save("saved_model.h5")
```
Now load the model in kelner:
```python
import kelner

loaded_model = kelner.model.load("saved_model.h5")  # keras engine is the default
kelner.serve(loaded_model, port=8080)
```
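With that server running, another process could query it over HTTP. A hedged sketch using `requests`: the exact JSON payload format is defined by kelner's protocol, so this assumes a nested array matching the model's (batch, 2) input shape (a GET request, described in the FAQ below, returns the actual input spec):

```python
import requests

# One sample with two features; adjust to whatever the GET spec reports
response = requests.post("http://localhost:8080", json=[[0.5, -1.2]])
print(response.json())  # JSON-encoded model output for the sample
```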
## FAQ

### Who is this for?

Machine learning researchers who don't want to deal with building a web server for every model they export. Kelner loads a saved Keras or TensorFlow model and starts an HTTP server that pipes each POST request body to the model and returns the JSON-encoded model response.
### How is it different from TensorFlow Serving?

- Kelner is ridiculously simple to install and run
- Kelner also works with saved Keras models
- Kelner works with one model per installation
- Kelner doesn't do model versioning
- Kelner speaks JSON over HTTP, while TensorFlow Serving speaks ProtoBuf over gRPC
- Kelner's protocol is:
  - `GET` returns model input and output specs as JSON
  - `POST` expects JSON or an image file and returns the JSON-encoded result of model inference
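For example, a quick way to inspect a running server's specs (assuming the Quickstart instance on port 61453):

```python
import requests

# GET returns the model's input and output specs as JSON
spec = requests.get("http://localhost:61453").json()
print(spec)  # exact keys depend on the loaded model and engine
```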