GenAIRR

Adaptive Immune Receptor Repertoire sequence simulator
Generate realistic BCR & TCR repertoires in a single line of Python.

📑 Table of Contents

Why GenAIRR?
Key Features
Installation
Quick Start
Examples
Mutation Models
Roadmap
Contributing
Citing GenAIRR
License
Acknowledgements

🧐 Why GenAIRR?

Benchmarking modern aligners, exploring somatic hypermutation, or stress-testing novel ML pipelines requires large, perfectly annotated repertoires, not snippets of real data peppered with sequencing error.
GenAIRR fills that gap with a plug-and-play, extensible simulation engine that produces realistic sequences together with complete ground-truth labels.

✨ Key Features

Category	Highlights
Realistic Simulation	Context-aware S5F mutations, indels, allele-specific trimming, NP-region modelling
Composable Pipelines	Chain together built-in & custom `AugmentationStep`s into simulation pipelines
Multi-Chain Support	Heavy & light BCRs plus TCR-β out of the box
Research-ready Output	JSON / pandas export, built-in plotting stubs, deterministic seeds
Docs & Tutorials	Rich API docs, Jupyter notebooks, step-by-step guides

⚡ Installation

# Python ≥ 3.9
pip install GenAIRR
# or the bleeding edge
pip install git+https://github.com/MuteJester/GenAIRR.git

🚀 Quick Start

Below is a 60-second tour. See /examples for notebooks and CLI usage.

from GenAIRR.pipeline import AugmentationPipeline
from GenAIRR.steps import SimulateSequence, FixVPositionAfterTrimmingIndexAmbiguity
from GenAIRR.mutation import S5F
from GenAIRR.data import HUMAN_IGH_OGRDB
from GenAIRR.steps.StepBase import AugmentationStep

# 1️⃣  Configure built-in germline data
AugmentationStep.set_dataconfig(HUMAN_IGH_OGRDB)

# 2️⃣  Build a minimal pipeline
pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), True),
    FixVPositionAfterTrimmingIndexAmbiguity()
])

# 3️⃣  Simulate!
sim = pipeline.execute()
print(sim.get_dict())
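
Repertoire-scale data comes from calling `pipeline.execute()` repeatedly. Below is a minimal sketch of that loop, assuming only what the snippet above shows (`execute()` returns an object with a `get_dict()` method). The helper names `simulate_many` and `write_jsonl` are illustrative and not part of GenAIRR. The simulator is passed in as a callable so the helpers stay independent of any particular pipeline.

```python
import json
from typing import Callable, Dict, Iterable, List


def simulate_many(run_once: Callable[[], Dict], n: int) -> List[Dict]:
    """Call `run_once` n times and collect the returned records."""
    return [run_once() for _ in range(n)]


def write_jsonl(records: Iterable[Dict], path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            # default=str stringifies any value json cannot serialise natively
            handle.write(json.dumps(record, default=str) + "\n")
            count += 1
    return count
```

With the Quick Start pipeline in scope, this could be used as `records = simulate_many(lambda: pipeline.execute().get_dict(), 1000)` followed by `write_jsonl(records, "repertoire.jsonl")`.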

🧑‍💻 Examples

1. Full Heavy-Chain Pipeline

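# Names reused from the Quick Start; the data config must already be set via
# AugmentationStep.set_dataconfig(HUMAN_IGH_OGRDB)
from GenAIRR.pipeline import AugmentationPipeline
from GenAIRR.mutation import S5F
from GenAIRR.steps import SimulateSequence, FixVPositionAfterTrimmingIndexAmbiguity
from GenAIRR.steps import (
    FixDPositionAfterTrimmingIndexAmbiguity, FixJPositionAfterTrimmingIndexAmbiguity,
    CorrectForVEndCut, CorrectForDTrims, CorruptSequenceBeginning,
    InsertNs, InsertIndels, ShortDValidation, DistillMutationRate
)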

pipeline = AugmentationPipeline([
    SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), True),
    FixVPositionAfterTrimmingIndexAmbiguity(),
    FixDPositionAfterTrimmingIndexAmbiguity(),
    FixJPositionAfterTrimmingIndexAmbiguity(),
    CorrectForVEndCut(),
    CorrectForDTrims(),
    CorruptSequenceBeginning(0.7, [0.4, 0.4, 0.2], 576, 210, 310, 50),
    InsertNs(0.02, 0.5),
    ShortDValidation(),
    InsertIndels(0.5, 5, 0.5, 0.5),
    DistillMutationRate()
])
result = pipeline.execute()
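
Conceptually, a pipeline like the one above passes a single record through each step in list order, so later steps see the changes made by earlier ones. The toy sketch below illustrates that pattern only. `ToyStep`, `ToyPipeline`, the example steps, and the `apply` method name are assumptions made for illustration and do not reproduce GenAIRR's `AugmentationStep` interface.

```python
from typing import Dict, List


class ToyStep:
    """Illustrative base class: each step edits the record and returns it."""

    def apply(self, record: Dict) -> Dict:
        raise NotImplementedError


class AppendSuffix(ToyStep):
    """Append a fixed string to the end of the sequence."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def apply(self, record: Dict) -> Dict:
        record["sequence"] += self.suffix
        return record


class TrimEnd(ToyStep):
    """Remove up to `n` characters from the end of the sequence."""

    def __init__(self, n: int) -> None:
        self.n = n

    def apply(self, record: Dict) -> Dict:
        seq = record["sequence"]
        record["sequence"] = seq[: max(len(seq) - self.n, 0)]
        return record


class ToyPipeline:
    """Run the steps in order, feeding each one the previous step's output."""

    def __init__(self, steps: List[ToyStep]) -> None:
        self.steps = steps

    def execute(self, sequence: str) -> Dict:
        record = {"sequence": sequence}
        for step in self.steps:
            record = step.apply(record)
        return record


forward = ToyPipeline([AppendSuffix("TT"), TrimEnd(3)]).execute("ACGT")
reverse = ToyPipeline([TrimEnd(3), AppendSuffix("TT")]).execute("ACGT")
print(forward["sequence"])  # ACG
print(reverse["sequence"])  # ATT
```

The two runs differ only in step order, which is why the position of each entry in a pipeline list matters.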

2. Naïve Sequence (no SHM)

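# Assumes the Quick Start imports and data config are already in place
from GenAIRR.mutation import Uniform
# Both rate arguments are 0, so no mutations are introduced
naive_step = SimulateSequence(Uniform(0, 0), True)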
pipeline = AugmentationPipeline([naive_step])
naive_seq = pipeline.execute()
print(naive_seq.sequence)

3. Custom Allele Combination

custom_step = SimulateSequence(
    S5F(0.003, 0.25),
    True,
    specific_v=HUMAN_IGH_OGRDB.v_alleles['IGHV1-2*02'][0],   # specific V allele
    specific_d=HUMAN_IGH_OGRDB.d_alleles['IGHD3-10*01'][0],  # specific D allele
    specific_j=HUMAN_IGH_OGRDB.j_alleles['IGHJ4*02'][0]      # specific J allele
)
pipeline = AugmentationPipeline([custom_step])
print(pipeline.execute().get_dict())

🔬 Mutation Models

Model	Description	When to use
`S5F`	Context-specific somatic hyper-mutation	Antibody maturation studies
`Uniform`	Evenly random mutations	Baselines / ablation
Your Model ➕	Implement `BaseMutationModel`	Custom evolutionary scenarios

from GenAIRR.mutation import S5F
# Mutation rate is bounded between 1% and 5%
s5f = S5F(min_mutation_rate=0.01, max_mutation_rate=0.05)
# `naive_seq` is the unmutated result from Example 2
mut_seq, muts, rate = s5f.apply_mutation(naive_seq)
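
To make the `Uniform` row concrete, the sketch below applies evenly distributed point mutations to a plain string and returns the same three values unpacked above (mutated sequence, mutation record, realised rate). It is a standalone illustration, not GenAIRR's implementation: the class name `UniformSketch`, the position-to-change dictionary, the `seed` argument, and the plain-string input are all assumptions, and a real custom model would subclass `BaseMutationModel` instead.

```python
import random
from typing import Dict, Optional, Tuple

NUCLEOTIDES = "ACGT"


class UniformSketch:
    """Toy uniform mutation model (hypothetical, for illustration only)."""

    def __init__(self, min_mutation_rate: float, max_mutation_rate: float,
                 seed: Optional[int] = None) -> None:
        if not 0 <= min_mutation_rate <= max_mutation_rate <= 1:
            raise ValueError("need 0 <= min_mutation_rate <= max_mutation_rate <= 1")
        self.min_mutation_rate = min_mutation_rate
        self.max_mutation_rate = max_mutation_rate
        self.rng = random.Random(seed)  # private RNG so runs are reproducible

    def apply_mutation(self, sequence: str) -> Tuple[str, Dict[int, str], float]:
        # Draw a target rate, then convert it to a whole number of substitutions.
        target_rate = self.rng.uniform(self.min_mutation_rate, self.max_mutation_rate)
        n_mutations = int(target_rate * len(sequence))
        # Every position is equally likely; sampling without replacement
        # ensures each position is mutated at most once.
        positions = self.rng.sample(range(len(sequence)), n_mutations)
        bases = list(sequence)
        mutations: Dict[int, str] = {}
        for pos in sorted(positions):
            original = bases[pos]
            # Always pick a base different from the original.
            replacement = self.rng.choice([b for b in NUCLEOTIDES if b != original])
            bases[pos] = replacement
            mutations[pos] = f"{original}>{replacement}"
        realised_rate = len(mutations) / len(sequence) if sequence else 0.0
        return "".join(bases), mutations, realised_rate
```

`UniformSketch(0, 0)` leaves the input untouched, mirroring the `Uniform(0, 0)` call in the naïve example. Unlike this sketch, `S5F` weights positions by their sequence context rather than treating them all alike.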

🗺️ Roadmap

🚧 More Complex Mutation Model (With Selection)
🚧 More Built-in Data Configs (e.g., TCR, custom germlines)
🚧 More Built-in Steps (e.g., more mutation models, more data augmentation)
🚧 Deeper Docs (e.g., more examples, more tutorials)

See the open issues. Feel something’s missing? Open a feature request.

🤝 Contributing

Contributions are welcome! 💙 Please read our contributing guide and check the good first issue label.

✏️ Citing GenAIRR

If GenAIRR helps your research, please cite:

Konstantinovsky T, Peres A, Polak P, Yaari G.  
An unbiased comparison of immunoglobulin sequence aligners.
Briefings in Bioinformatics. 2024 Sep 23; 25(6): bbae556.  
https://doi.org/10.1093/bib/bbae556  
PMID: 39489605 | PMCID: PMC11531861

📜 License

Distributed under the GPL-3.0 license. See LICENSE for details.

🙏 Acknowledgements

GenAIRR is inspired by and builds upon amazing work from the immunoinformatics community, especially AIRRship.
