Product Quantization for Similarity Search*p5_lEAB_bMtMgPNt

Original Source Here

Product Quantization for Similarity Search

How to compress and fit a humongous set of vectors in memory for similarity search with asymmetric distance computation (ADC)

Photo by Markus Winkler on Unsplash

Similarity search and nearest neighbor search are very popular and widely used in many fields. They are used in recommendation systems, in online stores and marketplaces that enable product image searches, or in systems specialized in document, media, or object matching and retrieval.

Similarity search is often done over a large collection of object embeddings, usually in the form of high-dimensional vectors.

Fitting a humongous set of high-dimensional vectors in memory to perform similarity search is a challenge, and product quantization can help to overcome this with some tradeoffs.

What is Product Quantization

Product quantization (PQ) is a technique used for vector compression. It is very effective in compressing high dimensional vectors for nearest neighbor search. According to the authors of Product Quantization for Nearest Neighbor Search [1],

“The idea is to decompose the space into a Cartesian product of low dimensional subspaces and to quantize each subspace separately.

A vector is represented by a short code composed of its subspace quantization indices. The Euclidean distance between two vectors can be efficiently estimated from their codes. An asymmetric version increases precision, as it computes the approximate distance between a vector and a code”.

Note that product quantization is not dimensionality reduction.

In product quantization, which is a form of vector quantization, the number of vectors will remain the same after quantization. However, the values in the compressed vector are now transformed into short codes, and therefore they are symbolic and are no longer numeric. With this representation, the size of each vector is reduced dramatically.

Product quantization is also one of the many index types implemented in Faiss (Facebook AI Similarity Search), a library that is highly optimized for efficient similarity search.

How Product Quantization Works

Let’s say we have a collection of vectors in the database, and the dimension (or length) of each vector is 128. This means the size of one vector is 128 x 32 bits = 4096 bits (which is equivalent to 512 bytes).

All images are by the author unless otherwise specified

First, we divide and split the vector into segments. The diagram below shows that the vector is divided and split into 8 segments, where the length of each segment is 16.

Dividing and splitting a vector into segments

We do this for every vector, and hence what we’re doing is essentially separating the vectors into 8 different sub-spaces. If we have 1000 vectors in the database, then each sub-space would contain 1000 segments of length 16.


Next, we proceed to train our vectors by running k-means clustering on each of the sub-space. Based on the value of k that we choose, the k-means clustering would generate k centroids (i.e. cluster centers) within that sub-space, and these centroids have the same length as the segment.

The centroids are also known as reproduction values. The set of centroids is called the codebook, and we would talk more about it later.

As an example, if we choose k=256, we would end up having a total of 256 x 8 = 2048 centroids.

Running k-means clustering on each sub-space

The centroids are also known as reproduction values because they can be used to approximate the reconstruction of the vectors by concatenating the respective centroids from each segment. However, the reconstructed vectors would not be exactly the same as the original vectors, as product quantization is a lossy compression.

For training, a different set or a subset of vectors could also be used, as long as they have the same distribution as the database vectors.


After training is completed, for each segment of vector in the database, we find the nearest centroid from the respective sub-space. In other words, for each segment of the vector, we only need to find the nearest centroid out of the pool of 256 centroids belonging to the same sub-space.

For each segment, after obtaining the nearest centroid, we substitute it with that centroid’s Id. The centroid Ids are nothing but indices of the centroids (a number from 0 to 255) within the sub-space.

And there we have them, they are the compressed representation of the vectors. These are the short codes that we talked about earlier, and let’s refer to them as PQ codes.

In this example, one PQ code consists of 8 centroid Ids across the segments. We have 1000 vectors in the database, so that would transform into 1000 PQ codes.

PQ Codes, the compressed representation of the vectors

Basically what we’ve done is encode our original vectors with centroid Ids, where each segment is encoded with 8 bits.

There are 8 segments per vector, thus every encoded vector takes up only 8 x 8 bits = 8 bytes of space with this representation. Compared to storing the original vector of 512 bytes, that is a huge saving of space.

As demonstrated in this example, we got a whopping 64 times reduction in memory usage (from 512 bytes to 8 bytes per vector), and that’s a significant amount when we’re dealing with hundreds of thousands of records, if not millions!

Memory usage reduction after product quantization
Photo by krakenimages on Unsplash

The value of k is usually a power of two.

For M segments, the memory requirement for one PQ code is M*(log base 2 of k) bits.

Searching with Quantization

So we are effectively using 8 centroid Ids to represent a vector. But how exactly does this representation work for similarity search?

The answer lies in the codebook, which contains centroids, the reproduction values. The process is explained below.

Given a query vector q, our goal is to find vectors that are very similar to q from the collection of vectors that we have in the database.

A typical process would be to calculate and compare the distances between the query vector and all the vectors in the database and return the top N records with the shortest distance.

However, we’re going to do something different. We’re not going to use the original vectors at all for the distance calculation. Instead, we will do asymmetric distance computation (ADC) and estimate the distances using the vector-to-centroid distances.

We begin by dividing and splitting the query vector into the same number of segments.

For each query vector segment, we pre-calculate the partial squared Euclidean distance with all the centroids of the same segment from the codebook.

These partial squared Euclidean distances are recorded in a distance table d. In our example, as shown below, the distance table consists of 256 rows and 8 columns.

Pre-compute the Distance Table

Now that we have the distance table, we can obtain the distance for each row of PQ code easily just by looking up the partial distances and doing a summation.

Look up partial distances from the Distance Table

After obtaining the distances for all rows of PQ codes, we sort them in ascending order and from the top results (i.e. records with the shortest distance), find and return the corresponding vectors from the database.

The following diagram summarizes the product quantization process for similarity search.

Product quantization process for similarity search

It is worth noting that with product quantization, the search is still a brute force search, as the distance lookup and summation are exhaustive and need to be done for all rows of PQ codes.

Also, since we’re comparing vector-to-centroid distances, the distances are not exact vector-to-vector distances. They are just estimated distances, and thus the results may be less precise and may not always be the true nearest neighbors.

The search quality can be improved by tuning the number of centroids or the number of segments. More centroids or segments result in higher accuracy and precision, but they would also slow down the search operation as well as the time required for training and encoding. Other than that, more centroids may potentially lead to an increase in the number of bits required to represent a code, and hence lesser memory savings.

Here’s a glimpse of simple Python implementation of product quantization, inspired and adapted from nanopq.


Product quantization divides and splits vectors into segments, and quantizes each segment of the vectors separately.

Each vector in the database is converted to a short code (PQ code), a representation that is extremely memory-efficient for the approximate nearest neighbor search.

Similarity search with product quantization is highly scalable, but we trade some precision for memory space.

At the expense of a less precise search, product quantization enables a large-scale search that would otherwise not be possible.

As product quantization alone is not the most effective method for very large-scale search, we will see how we can implement a faster search method that is non-exhaustive in the next post. Stay tuned!


Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot

%d bloggers like this: