PDX is a data layout that stores vectors in a transposed, column-major order. This layout unleashes the true potential of dimension pruning, accelerating vanilla IVF indexes by large factors:
- ⚡ Up to 10x faster IVF searches than FAISS+AVX512.
- ⚡ Up to 30x faster exhaustive search.
- ⚡ Up to 2x faster raw distance kernels.
Pruning means deciding whether a vector is a neighbour of a query without checking all of its dimensions. The PDX layout unleashes the true potential of pruning algorithms such as ADSampling.
Pruning is especially effective for large embeddings (`d > 512`) and when targeting high recalls (`> 0.90`) or nearly exact results.
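The core idea can be sketched in plain Python (an illustration of generic partial-distance pruning, not ADSampling's actual statistical test): a candidate is discarded as soon as its partial distance over a prefix of dimensions already exceeds a pruning threshold, e.g., the distance of the current K-th nearest neighbour. The function name is hypothetical.

```python
def partial_l2_prune(query, candidate, threshold, step=16):
    """Accumulate the squared L2 distance `step` dimensions at a time;
    stop early once the partial distance already exceeds the threshold."""
    acc = 0.0
    for start in range(0, len(query), step):
        for i in range(start, min(start + step, len(query))):
            d = query[i] - candidate[i]
            acc += d * d
        if acc > threshold:   # already worse than the current K-th NN
            return None       # pruned: no need to scan more dimensions
    return acc                # survived: this is the exact distance

q = [0.0] * 1024
far = [1.0] * 1024    # pruned after the first 16 dimensions
near = [0.01] * 1024  # survives the full scan
d_far = partial_l2_prune(q, far, threshold=5.0)    # None (pruned)
d_near = partial_l2_prune(q, near, threshold=5.0)  # ≈ 0.1024 (exact)
```

The larger `d` is, the more work pruning saves: here only 16 of 1024 dimensions of `far` are ever touched.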
Down below, you will find benchmarks against FAISS.
Try PDX with your data using our Python bindings and examples. We have implemented PDX on Flat (`float32`) and Quantized (`8-bit`) IVF indexes, as well as in exhaustive-search settings.
- PDX is available for x86_64 (with AVX512), ARM, and Apple silicon
- Python 3.11 or higher
- FAISS with Python Bindings
- Clang++17 or higher
- CMake 3.26 or higher
- Clone and init submodules
```sh
git clone https://github.com/cwida/PDX
git submodule init
git submodule update
```
- [Optional] Install FFTW (for higher throughput)
```sh
wget https://www.fftw.org/fftw-3.3.10.tar.gz
tar -xvzf fftw-3.3.10.tar.gz
cd fftw-3.3.10
./configure --enable-float --enable-shared
sudo make
sudo make install
ldconfig
```
- Install Python dependencies and the bindings.
```sh
export CXX="/usr/bin/clang++-18" # Set the proper CXX first
pip install -r requirements.txt
python setup.py clean --all
python -m pip install .
```
- Run the examples under `/examples`:
```sh
# Creates an IVF index with FAISS on random data
# Then compares the search performance of PDXearch and FAISS
python ./examples/pdx_simple.py
```
For more details on the available examples and how to use your own data, refer to `/examples/README.md`.
Note
We heavily rely on FAISS to create the underlying IVF indexes.
We present single-threaded benchmarks against FAISS+AVX512 on an `r7iz.xlarge` instance (Intel Sapphire Rapids).
IVF2 tackles a bottleneck of IVF indexes: finding the nearest centroids. By clustering the original IVF centroids, we can use PDX to quickly scan them (thanks to pruning) without sacrificing recall. This achieves significant throughput improvements when paired with `8-bit` quantization.
Here, PDX paired with the pruning algorithm ADSampling on `float32` achieves significant speedups.
An exhaustive search scans every vector in the collection. An IVF index stored with the PDX layout can accelerate this dramatically without sacrificing recall, thanks to the reliable pruning of ADSampling.
The key observation here is that thanks to the underlying IVF index, the exhaustive search starts with the most promising clusters. A tight threshold is found early on, which enables the quick pruning of most candidates.
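This effect can be sketched in a few lines of Python (illustrative only; the function names are hypothetical, and the real PDXearch kernels prune partial distances rather than skipping full distance computations): visiting clusters in order of centroid distance fills the top-K early, so the K-th-best distance becomes a tight threshold for everything scanned afterwards.

```python
import heapq

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exhaustive_ordered(query, clusters, k=2):
    """Exhaustively scan all clusters, most promising first; the running
    K-th-best distance acts as the pruning threshold."""
    order = sorted(clusters, key=lambda c: l2(query, c["centroid"]))
    heap = []    # max-heap via negated distances, size <= k
    pruned = 0
    for cluster in order:
        for vec in cluster["vectors"]:
            threshold = -heap[0][0] if len(heap) == k else float("inf")
            d = l2(query, vec)
            if d >= threshold:
                pruned += 1   # a pruning kernel would have stopped scanning
                continue      # this vector's dimensions early
            heapq.heappush(heap, (-d,))
            if len(heap) > k:
                heapq.heappop(heap)
    return sorted(-nd for (nd,) in heap), pruned

clusters = [
    {"centroid": [10.0, 10.0], "vectors": [[10.0, 10.0], [9.0, 9.0]]},
    {"centroid": [0.0, 0.0], "vectors": [[0.0, 1.0], [1.0, 1.0]]},
]
top, pruned = exhaustive_ordered([0.0, 0.0], clusters, k=2)
# top == [1.0, 2.0]; both vectors of the far cluster are pruned
```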
By creating random clusters with the PDX layout, you can still accelerate exhaustive search without an index. Unlike ADSampling, with BOND (our pruning algorithm), you can use the raw vectors. Gains vary depending on the distribution of the data.
Even without pruning, PDX distance kernels can be faster than traditional SIMD kernels on most CPU microarchitectures. For detailed information, check Figure 3 of our publication. You can also try it yourself in our playground here.
PDX is a transposed layout (a.k.a. columnar, or decomposed layout), which means that the same dimensions of different vectors are stored sequentially. This decomposition occurs within a block (e.g., a cluster in an IVF index).
We have evolved our layout from the one presented in our publication to reduce random access, and adapted it to work with `8-bit` and (in the future) `1-bit` vectors.
For `float32`, the first 25% of the dimensions are fully decomposed; we refer to this as the "vertical block." The remaining 75% are decomposed into sub-vectors of 64 dimensions; we refer to this as the "horizontal block." The vertical block is used for efficient pruning, and the horizontal block is accessed only for the candidates that were not pruned. Since the horizontal block is still decomposed every 64 dimensions, we keep a chance to prune the few remaining candidates every 64 dimensions.
The following image shows this layout. Storage is sequential from left to right, and from top to bottom.
Smaller data types are not friendly to PDX, as we must accumulate distances in wider types, resulting in asymmetry. We work around this by changing the PDX layout. For `8-bit` data, the vertical block is decomposed every 4 dimensions. This allows us to use dot-product instructions (`VPDPBUSD` on x86 and `UDOT`/`SDOT` on NEON) to compute L2 or IP kernels while still benefiting from PDX. The horizontal block remains decomposed every 64 dimensions.
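A scalar sketch of this grouping (hypothetical function names; the inner 4-wide multiply-accumulate loop stands in for one `VPDPBUSD`/`UDOT` step, which consumes exactly 4 byte pairs per vector per group):

```python
def u8_vertical_layout(vectors, group=4):
    """8-bit vertical block: decomposed every `group` dimensions, so each
    vector contributes 4 contiguous bytes per group."""
    d = len(vectors[0])
    out = []
    for start in range(0, d, group):
        for vec in vectors:
            out.extend(vec[start:start + group])
    return out

def ip_from_layout(query, layout, n, d, group=4):
    """Accumulate inner products of the query with all n vectors, reading
    4 contiguous bytes per vector per group (one dot-product step each)."""
    acc = [0] * n
    pos = 0
    for start in range(0, d, group):
        q = query[start:start + group]
        for v in range(n):
            chunk = layout[pos:pos + group]
            acc[v] += sum(x * y for x, y in zip(q, chunk))
            pos += group
    return acc

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
layout = u8_vertical_layout([a, b])
ips = ip_from_layout([1, 0, 1, 0, 1, 0, 1, 0], layout, n=2, d=8)
# ips == [16, 20], matching the naive dot products
```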
For Hamming/Jaccard kernels, we use a layout decomposed every 8 dimensions (naturally grouped into bytes), so the population counts can be accumulated in bytes. If `d > 256`, we flush the popcounts into a wider type every 32 words (corresponding to 256 dimensions). This has not been implemented in this repository yet, but you can find some promising benchmarks here.
- Add PDX to the VIBE benchmark.
- Benchmark filtering capabilities in PDX.
- Add unit tests.
- Implement multi-threading capabilities.
- Adaptive quantization on 8-bit and 4-bit.
- Write documentation.
Important
PDX is an ongoing research project. In its current state, it is not production-ready code.
To run our benchmark suite in C++, refer to BENCHMARKING.md.
If you use PDX for your research, consider citing us:
@article{kuffo2025pdx,
title={PDX: A Data Layout for Vector Similarity Search},
author={Kuffo, Leonardo and Krippner, Elena and Boncz, Peter},
journal={Proceedings of the ACM on Management of Data},
volume={3},
number={3},
pages={1--26},
year={2025},
publisher={ACM New York, NY, USA}
}
The code used for the experiments presented at SIGMOD'25 can be found in the `sigmod` branch.