📜 Paper · 💻 Usage · 📚 Related Projects
TL;DR: We replace the classical self‑attention in GPT‑1 with a quantum‑inspired attention mechanism, achieving logarithmic parameter compression of the attention value layer, approximately eight times lower cross-entropy loss compared to standard self-attention, and only ~2.1× longer inference time.
> **Tip:** For a concise overview, see the Methods section (pp. X–Y) and Fig. 1 in the paper PDF.
## Using QSA in Practice

To integrate Quantum Self‑Attention (QSA) into your own GPT‑1 training or inference pipeline, you only need:
- QSA.py: the core QSA layer implementation, including both training (slow) and inference (fast) branches.
- main.py: end‑to‑end example for training and evaluation with Hydra-based configuration.
- conf/config.py: default hyperparameters and setup.
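A minimal integration sketch is shown below. The `QSA(...)` constructor arguments and the surrounding block structure are assumptions for illustration, not the repository's exact API; check `QSA.py` for the real signature.

```python
# Hypothetical sketch of dropping QSA into a GPT-1 transformer block.
# The QSA(...) arguments are assumed for illustration; see QSA.py
# for the real constructor signature.
import torch.nn as nn
from QSA import QSA

class Block(nn.Module):
    def __init__(self, embed_dim: int, n_heads: int, context_len: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(embed_dim)
        # Replace the classical multi-head self-attention with QSA.
        self.attn = QSA(embed_dim, n_heads, context_len)
        self.ln2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # quantum-inspired attention
        x = x + self.mlp(self.ln2(x))   # standard feed-forward
        return x
```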
## Table of Contents

- Overview
- Using QSA in Practice
- Speed Comparison
- Performance
- Set Up Environment
- Running the Code
## Overview

This repository provides:
- Quantum-Inspired Self‑Attention (QISA): v1, v2, and v3 implementations leveraging amplitude encoding and Pauli measurements for queries, keys, and values.
- GPT-1 Integration: a drop‑in replacement for the multi‑head self‑attention heads in the GPT‑1 architecture.
- Two Execution Modes:
- Slow (Training): Step‑by‑step quantum simulation with TorchQuantum.
- Fast (Inference): Precomputed total unitary for rapid matrix‑vector application.
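A torch-only toy sketch of the idea behind the two branches (the repository's actual implementation uses TorchQuantum; the single-qubit gate and shapes here are illustrative assumptions):

```python
# Toy illustration of the slow/fast execution split in plain PyTorch.
import torch

def ry(theta):
    # Single-qubit RY(theta) rotation as a 2x2 (real, orthogonal) matrix.
    c, s = torch.cos(theta / 2), torch.sin(theta / 2)
    return torch.stack([torch.stack([c, -s]),
                        torch.stack([s, c])])

thetas = torch.nn.Parameter(torch.randn(4))  # trainable circuit angles
state = torch.tensor([1.0, 0.0])             # |0> basis state

# Slow branch (training): apply each gate in sequence so autograd
# tracks every rotation angle individually.
out = state
for t in thetas:
    out = ry(t) @ out

# Fast branch (inference): fold the whole circuit into one total
# unitary once, then reuse it as a single matrix-vector product.
with torch.no_grad():
    U = torch.eye(2)
    for t in thetas:
        U = ry(t) @ U
out_fast = U @ state  # same result, one matmul per call
```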
| Module | Description |
|---|---|
| `QSA.py` | Defines `QSA`, `ValueLayerSlow`, `ValueLayerFast` |
| `main.py` | Training and evaluation launcher via Hydra |
| `conf/config.py` | Experiment configurations (model, data, training) |
| `model.py` | GPT-1 model wrapper integrating QSA layers |
| `dataset.py` | Shakespeare dataset loader and tokenizer |
| `utils/` | Logging, metrics, and helper functions |
## Speed Comparison

| Metric | CSA | QISA v2 | QISA v3 | QISA v3 (w/ precomp) |
|---|---|---|---|---|
| Params per head | 3×d×dₕ | O(log d) | 2×d×dₕ + O(log d) | 2×d×dₕ + O(log d) |
| Inference time (T4 GPU) | 1× | 46.7× | 8.9× | 2.1× (22.3× faster than QISA v2) |
| Cross-entropy loss | baseline | lower | lower | lower |
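To make the parameter column concrete under the Figure 2 setup (embedding size $d = 128$, a single head so $d_h = d$, and 7 qubits since $\log_2 128 = 7$): CSA uses $3 \times d \times d_h = 3 \times 128 \times 128 = 49{,}152$ projection parameters per head, while the $O(\log d)$ term in QISA corresponds to rotation angles whose count scales with the 7-qubit register rather than with $d$. The exact per-head angle count depends on the circuit ansatz, so treat this as an order-of-magnitude comparison.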
Figure 1. Time per batch (batch size = 1024) on a single NVIDIA T4 GPU for CSA and different versions of QISA with embedding sizes 4 and 16. The fastest inference variant, QISA v3 with precomputed unitaries and observables, achieves a 22.3× speed-up over standard QISA v2 inference.
## Performance

Figure 2. Training cross-entropy loss: CSA vs. QISA. Setup: 1 epoch, batch size = 128, 1 head, context length = 16, embedding size = 128, 7 qubits.
## Set Up Environment

Tested on Ubuntu 22.04 with Python 3.10+:
```bash
git clone https://github.com/Nikait/QISA.git
cd QISA
pip3 install -r requirements.txt
```
Note: For GPU acceleration, install a CUDA‑enabled PyTorch build.
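To verify that the CUDA build is actually being picked up:

```python
# Prints the installed PyTorch version and whether CUDA is visible.
import torch
print(torch.__version__, torch.cuda.is_available())
```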
## Running the Code

- Download the Shakespeare text dataset from Kaggle:
```bash
mkdir -p data && cd data
wget https://www.kaggle.com/datasets/adarshpathak/shakespeare-text/download -O shakespeare.txt
```
- Ensure conf/config.py points to data/shakespeare.txt and uses the char_level tokenizer; a sketch of the relevant config fields follows.
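For orientation, the data-related part of conf/config.py might look roughly like the sketch below; the field names are illustrative assumptions, so match them against the actual dataclasses in the repository.

```python
# Illustrative sketch only: these field names are assumptions,
# not the repository's actual config schema.
from dataclasses import dataclass

@dataclass
class DataConfig:
    path: str = "data/shakespeare.txt"  # downloaded Kaggle dataset
    tokenizer: str = "char_level"       # character-level tokenization
```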
Run a single training experiment with your config:
```bash
python3 main.py
```
Outputs (loss values) are saved to logs.txt. You can also add a checkpoint saver by editing conf/config.py; a sketch follows below.
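A minimal checkpoint-saving pattern you could wire into the training loop (hypothetical; the repository does not ship this by default, so adapt the names to main.py):

```python
# Hypothetical checkpoint saver; not part of the repository by default.
import os
import torch

def save_checkpoint(model, optimizer, step, path="checkpoints/qisa.pt"):
    # Create the target directory if it does not exist yet.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, path)
```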