Step-by-step guide on how to build a neural search service.
Information retrieval technology is one of the main technologies that enabled the modern Internet to exist. These days, search technology is at the heart of a variety of applications, from web page search to product recommendations. For many years, this technology changed little, until neural networks came into play.
In this tutorial we are going to find answers to these questions:
- What is the difference between regular and neural search?
- What neural networks could be used for search?
- In what tasks is neural network search useful?
- How to build and deploy your own neural search service step by step?
What is neural search?
A regular full-text search, such as Google’s, consists of searching for keywords inside a document. For this reason, the algorithm cannot take into account the real meaning of the query and documents. Many documents that might be of interest to the user are not found because they use different wording.
Neural search tries to solve exactly this problem: it attempts to enable searches not by keywords but by meaning. To achieve this, the search works in two steps. In the first step, a specially trained neural network encoder converts the query and the searched objects into vector representations called embeddings. The encoder must be trained so that similar objects, such as texts with the same meaning or similar pictures, get close vector representations.
With this vector representation, the second step is straightforward. To find documents similar to the query, you just need to find the nearest vectors. The most convenient way to measure the distance between two vectors is the cosine distance. The usual Euclidean distance can also be used, but it tends to work less well in high-dimensional spaces because of the curse of dimensionality.
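To make the second step concrete, here is a minimal sketch of a brute-force nearest-vector search by cosine similarity in plain Python. The function names and the toy 3-dimensional vectors are made up for illustration; real embeddings have hundreds of dimensions, and a real service would use a vector search engine rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors, top=3):
    """Return (index, similarity) pairs for the `top` vectors closest to `query`."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    # Higher similarity means smaller cosine distance (distance = 1 - similarity).
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Toy "embeddings" of three documents and one query.
documents = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query = [1.0, 0.05, 0.0]
best = nearest(query, documents, top=2)
```

Here `best` ranks document 0 first and document 1 second, because they point in almost the same direction as the query, while document 2 is nearly orthogonal to it.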
Which model could be used?
It is ideal to use a model specially trained to determine the closeness of meanings, for example, a model trained on Semantic Textual Similarity (STS) datasets. Current state-of-the-art models can be found on this leaderboard.
However, specially trained models are not the only option. If a model is trained on a large enough dataset, its internal features can work as embeddings too. For instance, you can take any model pre-trained on ImageNet and cut off its last layer. The penultimate layer of a neural network usually forms the highest-level features, which do not yet correspond to specific classes. The output of this layer can be used as an embedding.
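To illustrate what cutting off the last layer means, here is a toy sketch in plain Python. The network, its weights, and the names `TinyClassifier`, `embed`, and `classify` are invented for this example; in practice you would load a real pre-trained model from a deep learning library and read out its penultimate layer in the same spirit.

```python
import math


class TinyClassifier:
    """A toy two-layer network with fixed, made-up weights (not a trained model)."""

    def __init__(self, hidden_weights, output_weights):
        self.hidden_weights = hidden_weights  # hidden_size rows of input_size weights
        self.output_weights = output_weights  # num_classes rows of hidden_size weights

    def embed(self, x):
        """Penultimate-layer activations: the part reused as an embedding."""
        return [
            math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in self.hidden_weights
        ]

    def classify(self, x):
        """Full forward pass: the last layer maps the features to class scores."""
        features = self.embed(x)
        return [
            sum(w * f for w, f in zip(row, features))
            for row in self.output_weights
        ]


model = TinyClassifier(
    hidden_weights=[[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]],
    output_weights=[[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]],
)
embedding = model.embed([1.0, 2.0])   # three features, not tied to any class
scores = model.classify([1.0, 2.0])   # two class scores from the last layer
```

The embedding has as many dimensions as the penultimate layer has units, regardless of how many classes the original model was trained to predict.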
What tasks is neural search good for?
Neural search has the greatest advantage in areas where the query cannot be formulated precisely. Querying a table in a SQL database is not the best place for neural search.
Conversely, if the query itself is fuzzy, or cannot be formulated as a set of conditions, neural search can help. If the search query is a picture, a sound file, or a long text, neural search is almost the only option.
If you want to build a recommendation system, the neural approach can also be useful. The user’s actions can be encoded in vector space in the same way as a picture or text. And having those vectors, it is possible to find semantically similar users and determine the next probable user actions.
Let’s build our own
With all that said, let’s make our neural network search. As an example, I decided to make a search for startups by their description. In this demo, we will see the cases when text search works better and the cases when neural network search works better.
Prepare data for neural search
To be able to search for our descriptions in vector space, we must get vectors first. We need to encode the descriptions into a vector representation. As the descriptions are textual data, we can use a pre-trained language model. As mentioned above, for the task of text search there is a whole set of pre-trained models specifically tuned for semantic similarity.
One of the easiest libraries for working with pre-trained language models, in my opinion, is sentence-transformers by UKPLab. It provides a convenient way to download and use many pre-trained models, mostly based on the transformer architecture. Transformers are not the only architecture suitable for neural search, but for our task they are quite enough.
We will use a model called `distilbert-base-nli-stsb-mean-tokens`. DistilBERT means that the size of this model has been reduced by a special technique compared to the original BERT. This is important for the speed of our service and its demand for resources. The word `stsb` in the name means that the model was trained for the Semantic Textual Similarity task.
The complete code for data preparation with detailed comments can be found and run in Colab Notebook.
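As a rough sketch of the encoding step (this is not the notebook's code), the helper below turns a list of startup records into vectors. It assumes each record is a dict with a `description` field, which is a hypothetical field name, and it accepts any model object that has an `encode(list_of_texts)` method returning one row of numbers per text.

```python
def encode_descriptions(model, records, text_field="description"):
    """Return one embedding (a list of floats) per record.

    `model` is anything with an `encode(list_of_texts)` method;
    `text_field` is an assumed name for the field holding the description.
    """
    texts = [record[text_field] for record in records]
    rows = model.encode(texts)  # one row of numbers per input text
    # Convert every row to plain Python floats so the result is easy to store.
    return [[float(value) for value in row] for row in rows]
```

With the real library, the model would be created as `SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')`, and the resulting vectors could be saved with `numpy.save('startup_vectors.npy', vectors)` for the upload step.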
Vector search engine
Now that we have a vector representation for all our records, we need to store them somewhere. In addition to storing them, we may also need to add or delete a vector, or save additional information with the vector. And most importantly, we need a way to search for the nearest vectors.
The vector search engine can take care of all these tasks. It provides a convenient API for searching and managing vectors.
In this tutorial we will use the Qdrant vector search engine. It not only supports all the necessary operations on vectors but also lets you store an additional payload along with each vector and use it to filter the search results. Qdrant has a client for Python and also defines an API schema in case you need to use it from other languages.
The easiest way to use Qdrant is to run a pre-built image. So make sure you have Docker installed on your system.
To start Qdrant, use the instructions on its homepage.
Download image from DockerHub:
docker pull generall/qdrant
And run the service inside Docker:
docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    generall/qdrant
You should see output like this
[...] Starting 12 workers
[...] Starting "actix-web-service-0.0.0.0:6333" service on 0.0.0.0:6333
This means that the service has launched successfully and is listening on port 6333. To make sure, you can open http://localhost:6333/ in your browser and get the Qdrant version info.
All data uploaded to Qdrant is saved in the `./qdrant_storage` directory and will persist even if you recreate the container.
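The browser check can also be scripted. The sketch below uses only the Python standard library and assumes that the root endpoint answers with a JSON body; the function name and the `opener` parameter, which exists only to make the function easy to test, are made up for this example.

```python
import json
from urllib.request import urlopen


def get_service_info(base_url="http://localhost:6333/", opener=urlopen, timeout=5):
    """Fetch the service's root endpoint and parse the response as JSON."""
    with opener(base_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `get_service_info()` while the container is running should return the same information you see in the browser; if the service is not reachable, `urlopen` raises an error instead.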
Upload data to Qdrant
Now that we have the vectors prepared and the search engine running, we can start uploading the data. To interact with Qdrant from Python, I recommend using the out-of-the-box client library.
To install it, use the following command
pip install qdrant-client
At this point, we should have startup records in file `startups.json`, encoded vectors in file `startup_vectors.npy`, and running Qdrant on a local machine. Let’s write a script to upload all startup data and vectors into the search engine.
First, let’s create a client object for Qdrant.
# Import client library
from qdrant_client import QdrantClient

qdrant_client = QdrantClient(host='localhost', port=6333)
Qdrant allows you to combine vectors of the same purpose into collections. Many independent vector collections can exist on one service at the same time.
Let’s create a new collection for our startup vectors.
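Here is a minimal sketch of this step, written as a function that receives the client object. The `recreate_collection` call and its `vector_size` argument are the ones this tutorial describes; the collection name `startups` and the helper name are assumptions. As far as I know, newer releases of `qdrant-client` expect a `vectors_config` argument instead of `vector_size`, so check the signature of the version you have installed.

```python
COLLECTION_NAME = "startups"  # hypothetical collection name, pick your own
VECTOR_SIZE = 768             # output dimensionality of the encoder


def create_startup_collection(client, name=COLLECTION_NAME, vector_size=VECTOR_SIZE):
    """(Re)create the collection; an existing collection with this name is dropped."""
    client.recreate_collection(collection_name=name, vector_size=vector_size)
```

With the client created earlier, this would be called as `create_startup_collection(qdrant_client)`.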
The `recreate_collection` function first tries to remove an existing collection with the same name. This is useful if you are experimenting and running the script several times.
The `vector_size` parameter is very important. It tells the service the size of the vectors in that collection. All vectors in a collection must have the same size; otherwise, it is impossible to calculate the distance between them. `768` is the output dimensionality of the encoder we are using.
The Qdrant client library defines a special function that allows you to load datasets into the service. However, since there may be too much data to fit into a single computer's memory, the function takes an iterator over the data as input.
Let’s create an iterator over the startup data and vectors.
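A possible sketch of this step is shown below. It assumes that `startups.json` holds one JSON object per line, which is an assumption about the file layout, and that the upload function is the client's `upload_collection` method with the arguments shown; treat the method name, the argument names, and the collection name as assumptions to verify against your client version.

```python
import json


def iter_payloads(path):
    """Lazily yield one startup record per non-empty line of a JSON-lines file."""
    with open(path, encoding="utf-8") as fd:
        for line in fd:
            if line.strip():
                yield json.loads(line)


def upload_startups(client, vectors, payloads, collection_name="startups"):
    """Stream vectors and payloads to the service in batches."""
    client.upload_collection(
        collection_name=collection_name,
        vectors=vectors,    # any iterable of equally sized vectors
        payload=payloads,   # iterator of dicts, consumed lazily
        ids=None,           # ids are generated automatically
        batch_size=256,     # records sent per request
    )
```

Because `iter_payloads` is a generator, the records are read from disk only as the upload consumes them. The vectors themselves could be loaded with `numpy.load('startup_vectors.npy')`, since a NumPy array is iterable row by row.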
Now we have the vectors uploaded to the vector search engine. In the next step we will learn how to actually search for the closest vectors.
The full code for this step can be found here.
Make a search API
Now that all the preparations are complete, let’s start building a neural search class.
First, install all the requirements:
pip install sentence-transformers numpy
To process incoming requests, neural search needs two things: a model to convert the query into a vector, and a Qdrant client to perform search queries.
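A minimal sketch of such a class could look like this. The model and the client are passed in from outside, so the class works with any encoder that has an `encode(text)` method. The `search` call with `query_vector`, `query_filter`, and `limit`, and the `.payload` attribute on the results, follow recent `qdrant-client` versions as far as I know; treat them as assumptions, since the client API differs between versions.

```python
class NeuralSearcher:
    """Combines a text encoder and a vector search client."""

    def __init__(self, collection_name, model, client):
        self.collection_name = collection_name
        self.model = model    # e.g. a SentenceTransformer instance
        self.client = client  # e.g. a QdrantClient instance

    def search(self, text, query_filter=None, limit=5):
        # Step 1: encode the query with the same model that produced
        # the stored vectors, and convert it to plain Python floats.
        vector = [float(value) for value in self.model.encode(text)]
        # Step 2: ask the search engine for the nearest stored vectors.
        hits = self.client.search(
            collection_name=self.collection_name,
            query_vector=vector,
            query_filter=query_filter,
            limit=limit,
        )
        # Only the payloads (the startup records) are of interest here.
        return [hit.payload for hit in hits]
```

The important design point is that queries and stored records go through the same encoder; vectors from different models are not comparable.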
With Qdrant it is also feasible to add some conditions to the search. For example, if we wanted to search for startups in a certain city, the search query could look like this:
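As a sketch, such a condition can be written in Qdrant's JSON filter format as a plain dictionary. The payload field name `city` and the value `Berlin` are hypothetical and depend on how your records are stored; the `must` / `key` / `match` layout follows the Qdrant filtering documentation as I understand it, and older versions may use a different key inside `match`.

```python
def city_filter(city):
    """Build a filter that keeps only records whose `city` payload field equals `city`."""
    return {
        "must": [  # every condition in this list has to hold
            {"key": "city", "match": {"value": city}},
        ]
    }


berlin_only = city_filter("Berlin")
```

With the Python client, a dictionary like this would typically be turned into the client's `Filter` model, for example `Filter(**berlin_only)`, and passed as the query filter of the search request.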
We now have a class for making neural search queries. Let’s wrap it up into a service.
Deploy as a service
To build the service we will use the FastAPI framework. It is super easy to use and requires minimal code writing.
To install it, use the command
pip install fastapi uvicorn
Our service will have only one API endpoint and will look like this:
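A minimal sketch of such a service is shown below. The endpoint path `/api/search`, the query parameter `q`, and the helper names are assumptions for illustration; `searcher` can be any object with a `search(text=...)` method, such as the neural search class from the previous section.

```python
def make_search_handler(searcher):
    """Build the endpoint function around any object with a `search(text=...)` method."""
    def search_startup(q: str):
        return {"result": searcher.search(text=q)}
    return search_startup


def create_app(searcher):
    """Wire the handler into a FastAPI app with a single GET endpoint."""
    # Imported here so the handler above can be tested without FastAPI installed.
    from fastapi import FastAPI

    app = FastAPI()
    app.get("/api/search")(make_search_handler(searcher))
    return app
```

Assuming the result of `create_app(...)` is exposed as `app` in a hypothetical module `service.py`, it could be started with `uvicorn service:app --port 8000`.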
Now, if you run the service and open your browser at http://localhost:8000/docs, you should be able to see a debug interface for your service.
Feel free to play around with it, make queries and check out the results.
This concludes the tutorial.
The described code is the core of this online demo.
You can try it to get an intuition for the cases where neural search is useful. The demo contains a switch that selects between neural and full-text search. You can turn neural search on and off to compare the results with regular full-text search. Try using a startup description to find similar ones.
In this tutorial, I have tried to give minimal information about neural search, but enough to start using it. Many potential applications are not mentioned here; there is plenty of room to go further into the subject.
Subscribe to my Telegram channel, where I talk about neural network engineering and publish other examples of neural network and neural search applications.
Subscribe to the Qdrant users' group if you want to be updated on the latest Qdrant news and features.