Tags: HNSW · Memory · Index · Quantization · Vector

Estimating memory footprint of your HNSW index

December 22, 2023 · 10 min read

We are often asked about index parameter tuning and memory footprint considerations for building vector indexes. This interactive visualization will help you quickly reason about the resources necessary to host your embeddings and serve nearest neighbor queries over them.

This post assumes a high-level understanding of the HNSW algorithm.

Getting started with memory estimation

You should start by setting the appropriate dataset size and embedding dimensionality in the two sliders below.

The plot shows the memory footprint of an HNSW index as a function of M, one of the three hyperparameters of the algorithm. Though the other two hyperparameters affect index construction time, query time, and recall, they do not affect index size.


[Interactive plot: HNSW Index Memory Usage as a function of M. Sliders: Dataset Size (number of vectors), default 913.16 thousand; Embedding Dimension, default 1536.]

There are two main components of the index: the vectors themselves and the neighbor lists. Per-vector neighbor lists are what make the graph navigable. The number of neighbors per vector is controlled by the M hyperparameter. Therefore, the memory footprint of neighbor lists is directly proportional to this hyperparameter.
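
To make the split between the two components concrete, here is a rough back-of-the-envelope sketch in Python. The function name and its defaults (float32 vector elements and 6-byte neighbor IDs, matching the 48-bit default used later in this post) are our own illustration rather than Lantern's exact accounting, and higher graph levels are ignored because they hold exponentially fewer nodes.

```python
def estimate_hnsw_memory(num_vectors: int, dim: int, M: int,
                         bytes_per_element: float = 4.0,      # float32
                         bytes_per_neighbor_id: float = 6.0   # 48 bits
                         ) -> dict:
    # Raw vectors: one element per dimension.
    vectors = num_vectors * dim * bytes_per_element
    # Neighbor lists: level 0 stores up to 2 * M neighbors per node and
    # dominates, since higher levels hold exponentially fewer nodes.
    neighbor_lists = num_vectors * 2 * M * bytes_per_neighbor_id
    return {
        "vectors_gb": vectors / 1e9,
        "neighbor_lists_gb": neighbor_lists / 1e9,
        "total_gb": (vectors + neighbor_lists) / 1e9,
    }

# Example: ~913 thousand 1536-dimensional embeddings with M = 32.
print(estimate_hnsw_memory(913_160, 1536, 32))
# -> about 5.6 GB of vectors and 0.35 GB of neighbor lists
```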

Larger values of M lead to higher recall at the expense of memory footprint and query time. So, generally, it is advisable to choose the lowest value of M that gives you the desired recall. The right value of M depends on the dataset - generally "harder" datasets will require larger values of M. For example, for 1M SIFT vectors, M = 5 is enough to achieve 99% recall @ 10, while for OpenAI's embeddings M > 10 is necessary.

It can be tedious and time-consuming to find the right hyperparameters when creating an index. The process is fully automated at Lantern Cloud, where we try out various parameters behind the scenes and recommend the best ones to you through our dashboard.

Vector size and scalar quantization

For indexes with lower dimensional vectors, neighbor lists contribute a relatively larger portion of the memory footprint. For example, for 128 dimensional vectors and relatively low values of M (in 40-60 range), neighbor lists take up about half of the index memory footprint. This can be addressed by storing each neighbor index in fewer bits (more on this later).
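
Plugging low-dimensional vectors into the sketch above illustrates the split (again, just an estimate under the same assumptions):

```python
# Using the estimate_hnsw_memory sketch from earlier: 128-dimensional
# vectors with M = 50 spend slightly more memory on neighbor lists than
# on the vectors themselves.
print(estimate_hnsw_memory(1_000_000, 128, 50))
# vectors:   1M * 128 * 4 B      ~= 0.51 GB
# neighbors: 1M * 2 * 50 * 6 B   ~= 0.60 GB  (a bit more than half the total)
```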

On the other hand, for higher dimensional vectors (e.g. 1536 dimensional OpenAI vectors), raw vectors take the lion's share of the memory footprint. A relatively non-intrusive and simple way to reduce the memory footprint of vectors in this case is scalar quantization - instead of storing the large dimensional vectors in full precision floating point (32-bit) numbers, we can downcast them to smaller precision numbers (e.g. 8-bit integers) and store them in that format.
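
Here is a minimal sketch of the idea, assuming a symmetric int8 quantizer with a single global scale. It illustrates scalar quantization in general rather than Lantern's exact scheme.

```python
import numpy as np

def scalar_quantize(vectors: np.ndarray) -> tuple[np.ndarray, float]:
    # Map each float32 element to one of 255 signed 8-bit levels using a
    # single global scale (per-dimension scales are a common refinement).
    scale = np.abs(vectors).max() / 127.0
    quantized = np.clip(np.round(vectors / scale), -127, 127).astype(np.int8)
    return quantized, scale

def dequantize(quantized: np.ndarray, scale: float) -> np.ndarray:
    return quantized.astype(np.float32) * scale

vecs = np.random.randn(1000, 1536).astype(np.float32)
q, scale = scalar_quantize(vecs)
print(vecs.nbytes, "->", q.nbytes)  # 6,144,000 -> 1,536,000 bytes: a 4x reduction
```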

We repeat the earlier memory usage plot below. You can move the slider to see how the memory footprint of vectors changes as you lower the number of bits per vector element.


[Interactive plot: HNSW Index Memory Usage. Slider: Vector element size (bits), default 32 (single precision).]

Product quantization

With scalar quantization above, we reduced the memory footprint of vectors in the index. However, the resulting vectors are still quite large and often we can quantize them further without losing much recall. One way to do this is via Product Quantization (PQ). While in scalar quantization we compressed individual vector elements, here we compress subvectors of the original vector. The "Number of Subvectors" slider below controls how many subvectors we divide the original vector into.

To compress a subvector, we replace it with a centroid: a vector that has the same dimensionality as the subvector. The centroids for each subvector are "trained" with the k-means algorithm. The number of centroids is the second parameter of PQ and is controlled by another slider below.

Generally, a larger number of subvectors leaves relatively more training examples per subvector neighborhood, resulting in higher quality centroids. This comes at the cost of an increased memory footprint. A larger number of centroids per subvector also increases the memory footprint but makes each compressed vector more representative of the original.
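
Below is a compact sketch of product quantization that uses scikit-learn's KMeans for codebook training. The function name and the defaults (16 subvectors, 256 clusters, mirroring the sliders below) are illustrative choices of ours; this is not Lantern's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(vectors: np.ndarray, num_subvectors: int = 16, num_clusters: int = 256):
    """Split vectors into subvectors, learn a k-means codebook per subvector,
    and encode each vector as one centroid index per subvector."""
    sub_dim = vectors.shape[1] // num_subvectors
    codebooks, codes = [], []
    for i in range(num_subvectors):
        sub = vectors[:, i * sub_dim:(i + 1) * sub_dim]
        km = KMeans(n_clusters=num_clusters, n_init=1).fit(sub)  # single restart keeps the example quick
        codebooks.append(km.cluster_centers_)        # (num_clusters, sub_dim)
        codes.append(km.labels_.astype(np.uint8))    # 1 byte per code when num_clusters <= 256
    return np.stack(codes, axis=1), codebooks        # codes: (num_vectors, num_subvectors)

vecs = np.random.randn(10_000, 1536).astype(np.float32)
codes, codebooks = train_pq(vecs)
print(vecs.nbytes, "->", codes.nbytes)  # 61,440,000 -> 160,000 bytes
# The codebooks add a small, shared overhead:
# 16 codebooks * 256 centroids * 96 dims * 4 bytes ~= 1.6 MB total.
```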


[Interactive plot: Product Quantization and HNSW Index Memory Usage. Sliders: Number of clusters, default 256; Number of subvectors, default 16.]

Reducing neighbor list footprint

We have played many tricks to reduce the memory footprint of the vectors in the HNSW index, but so far we have not touched the memory footprint of the neighbor lists. And as the footprint of the raw vectors became smaller, the neighbor lists stored in the index became the dominant component of the memory footprint.

Is there anything we can do about that?

Fortunately, yes! We can reduce the number of bits used to represent each neighbor, thereby reducing the neighbor list footprint. For example, if we have 5M vectors, we can represent each neighbor with 23 bits instead of our default 48. This trick becomes less useful as our dataset grows, however. With 1 billion vectors, we have to use at least 30-bit neighbor IDs to make sure all neighbors are addressable. Even with large datasets, we can leverage the structure of our data to partition a single index into multiple indexes and thereby reduce the number of bits needed to represent the neighbor lists in each index.

For example, if we can partition a 1-billion-vector dataset into 1000 partitions, we can reduce the neighbor ID width from 30 bits to 20.
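
The arithmetic is simple enough to sketch directly, assuming each partition gets its own index whose neighbor IDs only need to address vectors within that partition:

```python
import math

def bits_per_neighbor_id(num_vectors: int, num_partitions: int = 1) -> int:
    # Each partitioned index only has to address its own vectors.
    vectors_per_partition = math.ceil(num_vectors / num_partitions)
    return math.ceil(math.log2(vectors_per_partition))

print(bits_per_neighbor_id(1_000_000_000))        # 30 bits for a single 1B-vector index
print(bits_per_neighbor_id(1_000_000_000, 1000))  # 20 bits with 1000 partitions
```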

There usually are natural ways to partition a huge dataset via metadata fields, so the above optimization is applicable in theory. For example, when indexing large scientific libraries such as arXiv or OpenAlex, one can partition the index by year of publication, topic, etc.

Postgres has great support for partial indexes constrained by values of non-indexed columns, all of which Lantern inherits. So this kind of partitioning is possible and painless in Lantern! Though we have not yet implemented the neighbor ID compression described above, it is on our roadmap for supporting billion-scale datasets from a single node. Hopefully, the graphs above convince you that even for billion-vector datasets, a single database node is likely enough to store your index.


[Interactive controls: Bits per neighbor ID in HNSW index, default 48; Number of Index Partitions (a partition is chosen before vector operations), default 1.]

Aside: Understanding the M hyperparameter

As the hyperparameter M plays such a crucial role in the HNSW index footprint, let's build some intuition about what it does.

HNSW stores our dataset of vectors in a hierarchical graph structure. The majority of the vectors are stored at the lowest level: at typical values of M (M >= 10), over 90% of all vectors are at the lowest level, and there are exponentially fewer vectors at each higher level. At the lowest level, nodes have up to 2 * M neighbors. So, the vast majority of nodes have about 2 * M neighbors, and this is very close to the average number of neighbors per node in the graph. At each higher level a node appears in, it keeps up to an additional M neighbors. The barplot below shows the number of vectors at each level of the graph for your chosen dataset size and M.
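
For intuition, here is a small sketch of the expected level sizes, assuming the standard HNSW level assignment in which a node's top level is drawn geometrically with the usual multiplier mL = 1 / ln(M); these are expectations, not the exact counts your index will produce.

```python
def expected_level_sizes(num_vectors: int, M: int) -> list[int]:
    # With mL = 1 / ln(M), the probability that a node's top level is exactly
    # l works out to M^(-l) * (1 - 1/M).
    sizes, level = [], 0
    while True:
        expected = num_vectors * (M ** -level) * (1 - 1 / M)
        if expected < 0.5:
            return sizes
        sizes.append(round(expected))
        level += 1

print(expected_level_sizes(913_160, 32))
# -> roughly [884624, 27644, 864, 27, 1]; about 97% of all nodes sit at level 0
```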

[Interactive barplot: Elements per level in HNSW index. Sliders: Dataset Size (number of vectors), default 913.16 thousand; M (in HNSW), default 32.]

Conclusion

In this post we explored how various optimizations affect the memory footprint of an HNSW index. Stay tuned to see a similar walkthrough on how these optimizations affect query time and recall.

Authors

Narek Galstyan

Cofounder
