Build a Q&A App with PyTorch

How to easily deploy a QA HuggingFace model using Docker and FastAPI

Table of Contents

  1. Introduction
  2. Define the search context dataset
  3. Build the QA Embedding model
  4. Deploy the model using Docker and FastAPI
  5. Conclusion
  6. References


Introduction

In the last few years, a wide range of pre-trained models has been made available, from computer vision to natural language processing, with some of the best-known aggregators being Model Zoo, TensorFlow Hub and HuggingFace.

The availability of such a large set of pre-trained models allows developers to reuse them without spending large amounts of time and money on training. For instance, training a GPT-3 model would cost over $4.6M using a Tesla V100 instance¹.

In this post, we’ll cover:

  1. How to create a Question Answering (QA) model, using a pre-trained PyTorch model available at HuggingFace;
  2. How to deploy our custom model using Docker and FastAPI.

Define the search context dataset

There are two main types of QA models. The first one encodes a large corpus of domain-specific knowledge into the model and generates an answer based on the learned knowledge. The second one makes use of a given context and extracts the best paragraph / answer from that context.

The second approach generalises more easily to different domains without retraining or fine-tuning the original model. As such, in this post we will focus on this approach.

To use a context based QA model we first need to define our ‘context’. Here, we will use the Stanford Question Answering Dataset 2.0². To download this dataset click here³.

After downloading this file, open it in Python and check its structure.
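As a minimal sketch of this step, the snippet below loads the JSON file and summarises its top-level layout. The helper names load_squad and describe, and the file name in the usage comment, are illustrative assumptions; point the path at wherever you saved the download.

```python
import json


def load_squad(path):
    """Load a SQuAD-style JSON file and return the parsed dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def describe(squad, max_topics=5):
    """Return (title, number of paragraphs) for the first few topics."""
    return [
        (topic["title"], len(topic["paragraphs"]))
        for topic in squad["data"][:max_topics]
    ]


# Example usage (hypothetical file name):
# squad = load_squad("squad-v2.0.json")
# print(squad.keys())
# print(describe(squad))
```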

The observed structure should be similar to the one provided below. From this data we will focus on the question and answers fields where the topic is Premier League. This will provide us with exact answers to a specific number of questions. If you instead want to extract an answer from a context paragraph look at the context field.

'data': [
    'topic1': {
        'title': str,
        'paragraphs': [
            {
                'qas': [
                    {
                        'id': str,
                        'is_impossible': bool,
                        'question': str,
                        'answers': [
                            {'text': str, 'answer_start': int}
                        ]
                    }
                ],
                'context': str
            }
        ]
    }
]

To obtain the questions and answers, define and run the following function get_qa. This should return a set of 357 pairs of questions and answers.
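The original function is not reproduced here, so the following is only a sketch of what get_qa could look like. It assumes the topic is identified by its title field (for example 'Premier_League', an assumed spelling), skips questions flagged as is_impossible, and keeps only the first reference answer for each question. With different filtering choices the number of returned pairs may differ.

```python
def get_qa(squad, topic):
    """Collect answerable (question, answer) pairs for one topic title."""
    questions, answers = [], []
    for entry in squad["data"]:
        if entry["title"] != topic:
            continue
        for paragraph in entry["paragraphs"]:
            for qa in paragraph["qas"]:
                # SQuAD 2.0 flags unanswerable questions with is_impossible.
                if qa.get("is_impossible", False) or not qa["answers"]:
                    continue
                questions.append(qa["question"])
                # Assumption: keep only the first reference answer.
                answers.append(qa["answers"][0]["text"])
    return questions, answers


# Example usage (squad is the parsed JSON dictionary):
# questions, answers = get_qa(squad, topic="Premier_League")
# print(len(questions), len(answers))
```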

Build the QA Embedding model

In simple terms our model will work by comparing a new question from our user to the set of questions in our context set, and then extracting the corresponding answer.

Since we cannot compare the questions in their raw format (text), we need to transform both the context questions and the unknown questions from the user into a common space prior to performing any similarity evaluation.

To do this, we will define a new text embedding class that converts both the context questions and the unknown questions from the user from text to numeric vectors.

1. Download a pretrained embedding model

As noted in the ‘introduction’ section, training a model from scratch is time consuming and expensive. So instead, let’s use an already trained model available at HuggingFace.

Save the following script to a file called download_model.sh and run it in the terminal with bash to download the required model files.

2. Test the embedding model locally

If you don’t have the transformers package, start by installing it with pip.

pip install transformers[torch]

Then, in a new notebook cell define and run the following get_model function. If all files were downloaded properly and all dependencies met, this should run without issues.
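A possible shape for get_model is sketched below. It assumes the downloaded files sit in a local directory that AutoTokenizer.from_pretrained and AutoModel.from_pretrained can read; the directory name in the usage comment is hypothetical.

```python
def get_model(model_path):
    """Load a pre-trained encoder and its tokenizer from a local directory."""
    # Imported here so the function can be defined even before the
    # transformers package is installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModel.from_pretrained(model_path)
    model.eval()  # inference only: disables dropout
    return model, tokenizer


# Example usage (hypothetical directory holding the downloaded files):
# model, tokenizer = get_model("model/")
```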

Let’s now run our embedding model over a sample of the context questions. To do this, run the following instructions.
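One way to do this is sketched below. It assumes sentence embeddings are obtained by mean pooling the token embeddings while ignoring padding, which is a common choice for sentence encoders but not necessarily the exact pooling used originally. The names mean_pooling and get_embeddings are illustrative, and model and tokenizer are expected to be the objects loaded in the previous step.

```python
import torch


def mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings over the sequence, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts


def get_embeddings(questions, model, tokenizer):
    """Embed a list of questions into a (n_questions, hidden_size) tensor."""
    encoded = tokenizer(
        questions, padding=True, truncation=True, return_tensors="pt"
    )
    with torch.no_grad():
        output = model(**encoded)
    return mean_pooling(output.last_hidden_state, encoded["attention_mask"])


# Example usage with the first three context questions:
# embeddings = get_embeddings(questions[:3], model, tokenizer)
# print("Embeddings shape:", embeddings.shape)
```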

The above script should print the shape of our new embeddings vector.

Embeddings shape: torch.Size([3, 384])

3. Test the similarity of the context to a new question

Let’s start by checking our previous sample questions:

'How many club members are there?',
'How many matches does each team play?',
'What days are most games played?'

Then, paraphrase the last one to:

'Which days have the most events played at?'

Finally, let’s embed our new question and calculate the Euclidean distance between new_embedding and embeddings.
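The distance computation can be sketched as follows, assuming embeddings has shape (n_questions, dim) and new_embedding has shape (1, dim) or (dim,). The helper name euclidean_distances is illustrative, and the exact values you obtain depend on the model and pooling used.

```python
import torch


def euclidean_distances(new_embedding, embeddings):
    """Euclidean distance between one new embedding and each context embedding."""
    # Broadcasting subtracts new_embedding from every row of embeddings.
    diff = embeddings - new_embedding
    return diff.pow(2).sum(dim=1).sqrt()


# Example usage:
# distances = euclidean_distances(new_embedding, embeddings)
# print(distances)
# print("Closest question index:", int(distances.argmin()))
```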

The above script should output the following distances, indicating that the last question in our sample is indeed the closest (smallest distance) to our new question.

tensor([71.4029, 59.8726, 23.9430])

Deploy the model using Docker and FastAPI

The previous section introduced all the building blocks to define our QA search model. To make this usable in a production setting, we need to:

  1. Wrap the previous functions in one or more easy to use classes;
  2. Define an app and call the required class methods through HTTP;
  3. Wrap the whole app and dependencies in a container for easy scalability.

1. Define the QA Search Model

Let’s wrap the previously introduced concepts into two new classes: QAEmbedder and QASearcher.

The QAEmbedder will define how to load the model (get_model) from disk and return a set of embeddings given a set of questions (get_embeddings). Note that for efficiency get_embeddings will embed a batch of questions at a time.

The QASearcher will set the context of corresponding questions and answers (set_context_qa), and return the answer to the most similar question in our context to the new unseen question from the user (get_answers).
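The two classes could be sketched roughly as below. This is not the original implementation: the constructor arguments, the batch size, the mean pooling and the use of torch.cdist for nearest-question lookup are all assumptions. QASearcher only requires an object exposing get_embeddings, which keeps it easy to test without loading a real model.

```python
import torch


class QAEmbedder:
    """Loads a pre-trained encoder and embeds questions in batches."""

    def __init__(self, model_path, batch_size=32):
        self.batch_size = batch_size  # assumed default
        self.model, self.tokenizer = self.get_model(model_path)

    @staticmethod
    def get_model(model_path):
        # Imported lazily so QASearcher can be used without transformers.
        from transformers import AutoModel, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModel.from_pretrained(model_path)
        model.eval()
        return model, tokenizer

    def get_embeddings(self, questions):
        chunks = []
        for start in range(0, len(questions), self.batch_size):
            batch = questions[start:start + self.batch_size]
            encoded = self.tokenizer(
                batch, padding=True, truncation=True, return_tensors="pt"
            )
            with torch.no_grad():
                hidden = self.model(**encoded).last_hidden_state
            # Mean pooling over non-padded tokens (assumed pooling strategy).
            mask = encoded["attention_mask"].unsqueeze(-1).to(hidden.dtype)
            pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
            chunks.append(pooled)
        return torch.cat(chunks, dim=0)


class QASearcher:
    """Stores context questions/answers and answers new questions by similarity."""

    def __init__(self, embedder):
        self.embedder = embedder
        self.questions, self.answers, self.embeddings = None, None, None

    def set_context_qa(self, questions, answers):
        if len(questions) != len(answers):
            raise ValueError("questions and answers must have the same length")
        self.questions = list(questions)
        self.answers = list(answers)
        self.embeddings = self.embedder.get_embeddings(self.questions)

    def get_answers(self, questions):
        if self.embeddings is None:
            raise RuntimeError("Call set_context_qa before get_answers")
        questions = list(questions)
        new_embeddings = self.embedder.get_embeddings(questions)
        # Pairwise Euclidean distances, shape (n_new, n_context).
        distances = torch.cdist(new_embeddings, self.embeddings)
        best = distances.argmin(dim=1).tolist()
        return [
            {"orig_q": q, "best_q": self.questions[i], "best_a": self.answers[i]}
            for q, i in zip(questions, best)
        ]
```

Passing the embedder into QASearcher, rather than building it inside, means the same searcher works with any embedding backend.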

2. Define the FastAPI app

Our app should contain 2 POST endpoints, one to set the context (set_context) and one to get the answer to a given unseen question (get_answer).

The set_context endpoint will receive a dictionary containing 2 fields (questions and answers) and update the QASearcher.

The get_answer endpoint will receive a dictionary with 1 field (questions) and return a dictionary with the original question (orig_q), the most similar question in the context (best_q) and the associated answer (best_a).

3. Build the Docker container

The last step is to wrap our app in a Docker container so that it is easier to distribute and scale. In our Dockerfile we need to:

  1. Install wget and required Python libraries transformers, uvicorn and fastapi;
  2. Download the pre-trained QA model from HuggingFace;
  3. Copy all app files (available here) to the Docker image and run the uvicorn app.

To test our new app, build and run the Docker image with:

docker build . -t qamodel &&\
docker run -p 8000:8000 qamodel

If all goes well you should be greeted with the following messages:

INFO:     Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on (Press CTRL+C to quit)

4. Test the running app

To test our app we can simply use requests to set the context and then to retrieve the best answer to a given new unseen question.

To test the set_context endpoint run the following script:
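A sketch of such a script follows. It assumes the container is reachable at http://localhost:8000 (matching the port mapping used above) and that the endpoint path is /set_context; the helper name and the sample question and answer in the usage comment are illustrative.

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed host; port from `docker run -p`


def set_context(questions, answers, base_url=BASE_URL):
    """POST the context questions and answers to the running app."""
    payload = {"questions": list(questions), "answers": list(answers)}
    response = requests.post(f"{base_url}/set_context", json=payload, timeout=30)
    response.raise_for_status()
    return response.json()


# Example usage (requires the container to be running):
# print(set_context(["How many clubs are there?"], ["20"]))
```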

This should return the following message:

{'message': 'Search context set'}

To test the get_answer endpoint run the following script:
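A possible script is sketched below, under the assumptions that the app listens on http://localhost:8000, that the endpoint path is /get_answer, and that the response is a list with one dictionary per question. The helper names are illustrative.

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed host; port from `docker run -p`


def get_answer(questions, base_url=BASE_URL):
    """POST new questions and return the parsed JSON response."""
    payload = {"questions": list(questions)}
    response = requests.post(f"{base_url}/get_answer", json=payload, timeout=30)
    response.raise_for_status()
    return response.json()


def format_answers(results):
    """Render each result as 'key : value' lines, with a blank line between results."""
    blocks = []
    for result in results:
        lines = [f"{key} : {result[key]}" for key in ("orig_q", "best_q", "best_a")]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Example usage (requires the container to be running and a context to be set):
# results = get_answer(["How many teams compete in the Premier League ?"])
# print(format_answers(results))
```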

This should return the following messages:

orig_q : How many teams compete in the Premier League ?
best_q : How many clubs are currently in the Premier League?
best_a : 20

orig_q : When does the Premier League starts and finishes ?
best_q : When does the Premier League have its playing season?
best_a : During the course of a season (from August to May)

orig_q : Who has the highest number of goals in the Premier League ?
best_q : Who has the record for most goals in the Premier League?
best_a : Newcastle United striker Alan Shearer holds the record for most Premier League goals with 260

Complete scripts

For the full scripts go to my GitHub page by following this link:


Conclusion

Building powerful computer vision and natural language processing models is becoming increasingly easy thanks to freely available pre-trained models.

In this article, I introduced the basic building blocks to build your own Question Answering app using HuggingFace, Docker and FastAPI. Note that this sequence of steps is not specific to Q&A and can be used for most computer vision and natural language processing solutions.

If you are interested in deploying this app in a serverless fashion, have a look at my previous article “Build a serverless API with Amazon Lambda and API Gateway”⁴.

If you are just starting to learn about PyTorch and want a quick introduction to it, this article may be of interest: “Getting started with PyTorch”⁵.

