Serving your TensorFlow model as a simple web app with docker

If you've configured your environment and run our basic Universal Sentence Encoder model then you are ready to build a small app around it.

Essentially we are going to build a web-enabled version of the REPL we created during the validation phase. Our users will send a short string to the model, the model will classify the input, we will display the results, and the process can start again.

[Image: Perl5 book cover]

There's More Than One Way To Do It. I was a young programmer when I stole a Perl5 book from the local library containing this incantation. I've since lost that book, donated other books to multiple libraries, and never lost the wisdom contained in this motto. I think what Mr Wall was trying to get across was yes - Perl will let you shoot yourself in the foot (and how!) - but also that we aren't all going to solve the same problem the same way, and that's okay!

Don't worry - we won't be using Perl! We are going to build a Flask application!

Serve it Locally

You'll want to do a pip3 install flask to get Flask into your environment. This is the Python module providing web serving. Here's our serving code:

import numpy as np
import tensorflow as tf
import tensorflow_text
import tensorflow_hub as hub
from flask import Flask, request, jsonify

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")
model = tf.keras.models.load_model('data/output/example')

label2index = { 'other': 0, 'aerospace': 1, 'animals': 2, 'body_parts': 3 }
labels = list(label2index.keys())

app = Flask(__name__)

@app.route('/')
def index():
    return app.send_static_file('index.html')

@app.route('/api/classify')
def classify():
    sentence = request.args.get("sentence")
    if not sentence:
        return 'nothing to do', 400

    predictions = model.predict(embed([ sentence ]))
    distribution = predictions[0]

    dist_dict = {}
    for label, prob in zip(labels, distribution):
        dist_dict[label] = float(prob)

    return jsonify({ 'label': labels[np.argmax(distribution)], 'distribution': dist_dict })

There should be several parts of this that are very familiar: the imports we used in example-val.py, the way we load the model from a saved state, the label definitions, and some of how we get model results. What may not be familiar is how Flask's @app.route decorator maps a URL path to the function that handles requests for it.
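The response-building step is worth a closer look: np.argmax picks the index of the highest probability, and zip pairs each probability with its label. Here's a small standalone sketch using a made-up distribution in place of real model output:

```python
import numpy as np

labels = ['other', 'aerospace', 'animals', 'body_parts']

# A made-up probability distribution standing in for model.predict(...)[0]
distribution = np.array([0.05, 0.80, 0.10, 0.05])

# Pair each label with its probability, converting to plain floats
# (np.float types aren't JSON serializable)
dist_dict = {label: float(prob) for label, prob in zip(labels, distribution)}

# The predicted label is the one at the highest-probability index
predicted = labels[np.argmax(distribution)]
print(predicted)  # aerospace
```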

Save the code above in a file called serve.py. You may have noticed the app.send_static_file call; it simply returns our index.html front end, which you can download here.

Then run this from the terminal:

❯ env FLASK_APP=serve flask run
* Serving Flask app "serve"
* Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
* Debug mode: off
2020-09-27 08:58:12.561405: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fbba40e6510 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-27 08:58:12.561423: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
INFO:werkzeug: * Running on (Press CTRL+C to quit)

The env command is a shortcut to setting an environment variable before invoking the command. So here we're setting FLASK_APP to the value serve so it can invoke our serve.py file.
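You can see the same mechanism with any command. This one-liner just illustrates env (it doesn't start Flask): the variable is set only for the child process that follows it.

```shell
# env sets FLASK_APP only for the python3 invocation that follows it
env FLASK_APP=serve python3 -c 'import os; print(os.environ["FLASK_APP"])'
# prints: serve
```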

Now we can verify it's running:

❯ curl http://localhost:5000/api/classify\?sentence\=I%20like%20goats
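The %20 sequences in that URL are percent-encoded spaces. If you're building the query string programmatically rather than by hand, the standard library can do the encoding for you (a small sketch, independent of the running server):

```python
from urllib.parse import quote

sentence = "I like goats"
# quote percent-encodes characters that aren't safe in a URL, e.g. spaces
url = "http://localhost:5000/api/classify?sentence=" + quote(sentence)
print(url)  # http://localhost:5000/api/classify?sentence=I%20like%20goats
```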


Serve it with Docker

Running the app as a docker container is the best way to ensure repeatable results. Once your container is right where you want it, you can be certain it won't change when deployed on staging or production environments. It's a big win that is quite simple to achieve. Here's the Dockerfile for our app:

FROM tensorflow/tensorflow:2.3.0

RUN apt-get update -y && pip3 install flask && pip3 install tensorflow_text && pip3 install tensorflow_hub

COPY ./data /app/data
COPY ./static /app/static
COPY serve.py /app/

WORKDIR /app

ENV FLASK_APP=serve
ENV FLASK_ENV=production

ENTRYPOINT [ "flask", "run", "--host", "0.0.0.0" ]

Essentially this is setting up our dependencies, copying our relevant files into the container, and finally specifying what to do on startup (flask run).

Note that you must save the above docker commands in a file called Dockerfile inside the same directory as our other files. First we build the image: docker build -t tfserve . tells docker to build from the current directory and tag the image 'tfserve'.

You can run the container using docker run -p 8000:5000 -it tfserve, which maps our local port 8000 to the container port 5000. It will take time for this image to start because it downloads our TensorFlow Hub module each time it starts, which is not ideal. Once it starts you should be able to visit it at http://localhost:8000/.
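One way to avoid that per-start download is to fetch the module at build time so it is baked into the image. This sketch assumes TensorFlow Hub's standard TFHUB_CACHE_DIR cache directory; the extra lines would go in the Dockerfile before the ENTRYPOINT:

```dockerfile
# Cache the Hub module inside the image at build time (hypothetical addition)
ENV TFHUB_CACHE_DIR=/app/tfhub_cache
RUN python3 -c "import tensorflow_text; import tensorflow_hub as hub; \
    hub.load('https://tfhub.dev/google/universal-sentence-encoder-multilingual/3')"
```

With the module cached in the image, hub.load resolves from disk at startup instead of hitting the network.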


That's it! If you followed parts 1 & 2 you've built a model, set up your environment, trained the model, written a web app around it, and served it from a docker container! The details will change but the process will essentially always be this: environment -> model -> train -> app -> production with lots of heartache in between.

Download the full set of code for the article.