Generic Questions on GPU Indexing Speed #41721

0-EricZhou-0 · 2025-05-08T23:14:38Z

GPU Indexing

The GPU Index page of the Milvus v2.5.x documentation says that GPUs can help increase indexing throughput. However, in experiments on my local setup, the indexing speed of the GPU algorithms seems to be almost the same as that of the CPU ones.

Server Configuration

The server I used has the following configuration:

CPU: Dual-socket AMD EPYC 9334 32-Core Processor.
Memory: 1.48TB
GPU: Two NVIDIA H100 NVL

Milvus Setup

The Milvus we used is version 2.5.9, with setup following the documents here.

Workload Description

The dataset we used contains around 28.5M entries. We inserted only the embeddings and their unique IDs into the database.

Experiment Process

Using the pymilvus interface, we created 32,000 requests in advance and sent them from 32 parallel threads to perform vector search. Each request batches 100 searches.
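For readers who want to reproduce this kind of load, here is a minimal sketch of the process described above, not the script actually used. The names `make_requests`, `run_load`, and `send_request` are hypothetical; `send_request` stands in for whatever call issues one batched search, so the sketch runs without a Milvus server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_requests(vectors, searches_per_request=100):
    """Split the query vectors into consecutive batches, one batch per request."""
    return [vectors[i:i + searches_per_request]
            for i in range(0, len(vectors), searches_per_request)]


def run_load(send_request, requests, num_threads=32):
    """Send pre-built requests from a thread pool.

    send_request is a hypothetical callable that issues one batched search.
    Returns (results in request order, elapsed wall-clock seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(send_request, requests))
    elapsed = time.perf_counter() - start
    return results, elapsed
```

With pymilvus, `send_request` could wrap something like `client.search(collection_name=..., data=batch, anns_field="vector", limit=100)`. Dividing the total number of query vectors by `elapsed` gives queries per second.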

Profiling Results

Basic Tests with one GPU

The following experiments were run on one of the two H100s. GPU indexing does not seem to accelerate query speed much.

Increasing vector dimension (still one GPU used)

We observed that GPU utilization was quite low and suspected that there was not enough computation compared to graph searching, so we repeated the experiment with larger vector dimensions and a limited number of CPU cores. We limited the standalone service to 32 cores by changing the following fields in docker-compose.yml:

services:
  standalone:
    deploy:
      resources:
        ...
        limits:         # added
          cpus: "32.0"  # added

The results are as follows:

Increasing the number of GPUs used

We also tried using multiple GPUs, but performance was almost the same (even slightly slower). In the Docker setup, we changed the following field in docker-compose.yml:

services:
  standalone:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: ["GPU"]
              device_ids: ["0", "1"] # was device_ids: ["1"]

The results are as follows:

Questions

The GPU does not seem to help much with indexing speed. Is there anything I configured incorrectly?
Why does increasing the number of available GPUs not increase query throughput? Are there any knobs we need to tune to get better performance?

yhmo (Collaborator) · 2025-05-09T02:22:59Z

In Milvus, the search() interface accepts a list of vectors to search:

results = client.search(collection_name=collection_name,
                        data=[vector1, vector2, vector3, .....],
                        anns_field="vector",
                        limit=100)

We call the length of the vector list "nq".
The major work in an ANN search is computing distances between vectors. With an index, the workload is reduced because most of the vectors are skipped. If you test with a small nq, GPU usage is low because the workload is small.
A GPU is faster than a CPU at parallel computation, but it also incurs the overhead of copying data between GPU memory and CPU memory.

So there is not much difference if you test with a small nq. You can try a large nq, such as nq=1000:

results = client.search(collection_name=collection_name,
                        data=[vector1, vector2, vector3, ....., vector1000],
                        anns_field="vector",
                        limit=100)
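To see how throughput changes with nq, one option is to time the same query vectors at several batch sizes. This is a rough sketch under assumptions: `sweep_nq` and `search_fn` are hypothetical names, and `search_fn` stands in for a call such as `client.search(...)` with `data` set to the batch.

```python
import time


def sweep_nq(search_fn, vectors, nq_values):
    """Time search_fn over the same vectors at several batch sizes (nq).

    search_fn is a hypothetical callable taking one list of query vectors.
    Returns {nq: queries per second}.
    """
    qps_by_nq = {}
    for nq in nq_values:
        start = time.perf_counter()
        for i in range(0, len(vectors), nq):
            search_fn(vectors[i:i + nq])  # one request carrying up to nq vectors
        elapsed = time.perf_counter() - start
        # Guard against a zero reading from the timer on very fast stubs.
        qps_by_nq[nq] = len(vectors) / elapsed if elapsed > 0 else float("inf")
    return qps_by_nq
```

For example, `sweep_nq(search_fn, query_vectors, [1, 10, 100, 1000])` would show whether queries per second keeps rising as nq grows. It sends requests sequentially, so it isolates the effect of batch size from the effect of client concurrency.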

xiaofan-luan (Maintainer) · 2025-05-09T07:46:21Z

Did you check the GPU memory usage? We want to make sure the GPU index works as expected.
Could you increase the read concurrency and see whether it increases the QPS?
Using a large NQ is also a way to increase the throughput of a GPU index.
From our observation, a GPU is usually only useful with large batches, which might not help most low-latency use cases.

Scenarios that suit a GPU index:

High-concurrency recommendation systems
Offline large-batch computation
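One way to run the GPU memory check suggested above is to poll nvidia-smi while the benchmark runs. The sketch below assumes nvidia-smi is on the PATH of the host; the function names and the returned dict keys are hypothetical.

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits", memory is reported in MiB
# and utilization in percent.
QUERY = "index,memory.used,memory.total,utilization.gpu"


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts.

    Assumes every field is numeric; a field reported as N/A would raise.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        index, used, total, util = [field.strip() for field in line.split(",")]
        stats.append({
            "index": int(index),
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "utilization_pct": int(util),
        })
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return per-GPU memory and utilization."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)
```

If memory use on the device stays near idle after the index is loaded, the index is probably not on the GPU at all. With two GPUs exposed, it also shows whether the second one is being used.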
