📜 Paper · 💻 Usage · 📚 Related Projects
TL;DR: We replace the classical self‑attention in GPT‑1 with a quantum‑inspired attention mechanism, achieving logarithmic parameter compression of the attention value layer, approximately eight times lower cross-entropy loss compared to standard self-attention, and only ~2.1× longer inference time.
> **Tip:** For a concise overview, see the Methods section (pp. X–Y) and Fig. 1 in the paper PDF.
## Using QSA in Practice

To integrate Quantum Self‑Attention (QSA) into your own GPT‑1 training or inference pipeline, you only need:
- QSA.py: the core QSA layer implementation, including both training (slow) and inference (fast) branches.
- main.py: end‑to‑end example for training and evaluation with Hydra-based configuration.
- conf/config.py: default hyperparameters and setup.
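A minimal integration sketch is shown below. The `QSA(...)` constructor arguments and the surrounding block structure are assumptions for illustration, not the repository's exact API; check `QSA.py` for the real signature.

```python
# Hypothetical sketch of dropping QSA into a GPT-1 transformer block.
# The QSA(...) arguments are assumed for illustration; see QSA.py
# for the real constructor signature.
import torch.nn as nn
from QSA import QSA

class Block(nn.Module):
    def __init__(self, embed_dim: int, n_heads: int, context_len: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(embed_dim)
        # Replace the classical multi-head self-attention with QSA.
        self.attn = QSA(embed_dim, n_heads, context_len)
        self.ln2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # quantum-inspired attention
        x = x + self.mlp(self.ln2(x))   # standard feed-forward
        return x
```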
## Table of Contents

- Overview
- Using QSA in Practice
- Speed Comparison
- Performance
- Set Up Environment
- Running the Code
## Overview

This repository provides:
- Quantum-Inspired Self‑Attention (QISA): v1, v2, and v3 implementations leveraging amplitude encoding and Pauli measurements for queries, keys, and values.
- GPT-1 Integration: a drop‑in replacement for the multi‑head self‑attention heads in the GPT‑1 architecture.
- Two Execution Modes:
- Slow (Training): Step‑by‑step quantum simulation with TorchQuantum.
- Fast (Inference): Precomputed total unitary for rapid matrix‑vector application.
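A torch-only toy sketch of the idea behind the two branches (the repository's actual implementation uses TorchQuantum; the single-qubit gate and shapes here are illustrative assumptions):

```python
# Toy illustration of the slow/fast execution split in plain PyTorch.
import torch

def ry(theta):
    # Single-qubit RY(theta) rotation as a 2x2 (real, orthogonal) matrix.
    c, s = torch.cos(theta / 2), torch.sin(theta / 2)
    return torch.stack([torch.stack([c, -s]),
                        torch.stack([s, c])])

thetas = torch.nn.Parameter(torch.randn(4))  # trainable circuit angles
state = torch.tensor([1.0, 0.0])             # |0> basis state

# Slow branch (training): apply each gate in sequence so autograd
# tracks every rotation angle individually.
out = state
for t in thetas:
    out = ry(t) @ out

# Fast branch (inference): fold the whole circuit into one total
# unitary once, then reuse it as a single matrix-vector product.
with torch.no_grad():
    U = torch.eye(2)
    for t in thetas:
        U = ry(t) @ U
out_fast = U @ state  # same result, one matmul per call
```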
| Module | Description |
|---|---|
| `QSA.py` | Defines `QSA`, `ValueLayerSlow`, `ValueLayerFast` |
| `main.py` | Training and evaluation launcher via Hydra |
| `conf/config.py` | Experiment configurations (model, data, training) |
| `model.py` | GPT-1 model wrapper integrating QSA layers |
| `dataset.py` | Shakespeare dataset loader and tokenizer |
| `utils/` | Logging, metrics, and helper functions |
## Speed Comparison

| Metric | CSA | QISA v2 | QISA v3 | QISA v3 (w/ precomp) |
|---|---|---|---|---|
| Params per head | 3×d×dₕ | O(log d) | 2×d×dₕ + O(log d) | 2×d×dₕ + O(log d) |
| Inference time (T4 GPU) | 1× | 46.7× | 8.9× | 2.1× (22.3× faster than QISA v2) |
| Cross-entropy loss | baseline | lower | lower | lower |
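To make the parameter column concrete under the Figure 2 setup (embedding size $d = 128$, a single head so $d_h = d$, and 7 qubits since $\log_2 128 = 7$): CSA uses $3 \times d \times d_h = 3 \times 128 \times 128 = 49{,}152$ projection parameters per head, while the $O(\log d)$ term in QISA corresponds to rotation angles whose count scales with the 7-qubit register rather than with $d$. The exact per-head angle count depends on the circuit ansatz, so treat this as an order-of-magnitude comparison.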
Figure 1. Time per batch (batch size = 1024) on a single NVIDIA T4 GPU for CSA and different versions of QISA with embedding sizes 4 and 16. The fastest inference variant, QISA v3 with precomputed unitaries and observables, achieves a 22.3× speed-up over standard QISA v2 inference.
## Performance

Figure 2. Training cross-entropy loss: CSA vs. QISA. Setup: 1 epoch, batch size = 128, 1 head, context length = 16, embedding size = 128, 7 qubits.
## Set Up Environment

Tested on Ubuntu 22.04 with Python 3.10+:
```bash
git clone https://github.com/Nikait/QISA.git
cd QISA
pip3 install -r requirements.txt
```
Note: For GPU acceleration, install a CUDA‑enabled PyTorch build.
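To verify that the CUDA build is actually being picked up:

```python
# Prints the installed PyTorch version and whether CUDA is visible.
import torch
print(torch.__version__, torch.cuda.is_available())
```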
## Running the Code

- Download the Shakespeare text dataset from Kaggle:
```bash
mkdir -p data && cd data
wget https://www.kaggle.com/datasets/adarshpathak/shakespeare-text/download -O shakespeare.txt
```
- Ensure conf/config.py points to data/shakespeare.txt and uses the char_level tokenizer; a sketch of the relevant config fields follows.
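For orientation, the data-related part of conf/config.py might look roughly like the sketch below; the field names are illustrative assumptions, so match them against the actual dataclasses in the repository.

```python
# Illustrative sketch only: these field names are assumptions,
# not the repository's actual config schema.
from dataclasses import dataclass

@dataclass
class DataConfig:
    path: str = "data/shakespeare.txt"  # downloaded Kaggle dataset
    tokenizer: str = "char_level"       # character-level tokenization
```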
Run a single training experiment with your config:
```bash
python3 main.py
```
Outputs (loss values) are saved to logs.txt. You can also add a checkpoint saver by editing conf/config.py; a sketch follows below.
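A minimal checkpoint-saving pattern you could wire into the training loop (hypothetical; the repository does not ship this by default, so adapt the names to main.py):

```python
# Hypothetical checkpoint saver; not part of the repository by default.
import os
import torch

def save_checkpoint(model, optimizer, step, path="checkpoints/qisa.pt"):
    # Create the target directory if it does not exist yet.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, path)
```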