Title: Antibody Engineering Using Immunized Rabbit Spleen RNA Sequences
Objective: Identify unique antibody sequences, predict their structures, and evaluate their binding potential through docking studies.
Background: This project aims to discover novel antibody sequences from immunized rabbits, leveraging computational tools for extraction, filtering, clustering, and analysis of immunogenic sequences.
- Input Data: RNA sequences from immunized rabbit spleens and control samples.
- File Formats: FASTQ/FASTA.
- Source: [Specify source, e.g., sequencing platform].
Tool/Software | Version | Purpose |
---|---|---|
TRUST4 | [version] | Antibody sequence extraction |
MMseqs2 | [version] | Clustering and filtering sequences |
IgFold | [version] | Antibody structure prediction |
HADDOCK3 | [version] | Docking studies |
Python | [version] | Scripting for automation |
- Steps to clean and convert RNA sequences for analysis.
- Commands or scripts used for preprocessing.
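A minimal preprocessing sketch, assuming Phred+33-encoded FASTQ input and hypothetical cutoffs (mean quality 20, minimum length 50). It drops low-quality or short reads and writes the rest as FASTA. The function names are illustrative, not part of any existing tool, and a dedicated read-trimming tool is usually preferable for production runs.

```python
"""Minimal FASTQ -> FASTA preprocessing sketch (stdlib only).

Assumptions (hypothetical, tune for your data): Phred+33 quality encoding,
MIN_MEAN_QUAL = 20, MIN_LENGTH = 50. Reads failing either check are dropped.
"""
import sys
from typing import Iterator, TextIO, Tuple

MIN_MEAN_QUAL = 20  # hypothetical cutoff
MIN_LENGTH = 50     # hypothetical cutoff


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:].split()[0], seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def passes(seq: str, qual: str) -> bool:
    """True if the read is long enough and its mean quality meets the cutoff."""
    return len(seq) >= MIN_LENGTH and mean_quality(qual) >= MIN_MEAN_QUAL


def fastq_to_fasta(src: TextIO, dst: TextIO) -> int:
    """Write passing reads as FASTA; return the number kept."""
    kept = 0
    for read_id, seq, qual in read_fastq(src):
        if passes(seq, qual):
            dst.write(f">{read_id}\n{seq}\n")
            kept += 1
    return kept


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        print(f"kept {fastq_to_fasta(src, dst)} reads", file=sys.stderr)
```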
- Parameters for running TRUST4 (e.g., alignment thresholds).
- Key outputs: CDR1, CDR2, and CDR3 regions annotated on the assembled contigs (reported in TRUST4's `_annot.fa` file).
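A sketch of how the TRUST4 invocation could be assembled from Python, assuming the `run-trust4` options `-f`, `--ref`, `-1`/`-2`, `-u`, `-t`, `-o`, and `--od` as described in the TRUST4 README; check them against your installed version. The rabbit reference file names, output prefix, and thread count are hypothetical placeholders, and a rabbit-specific reference must be prepared separately (see the TRUST4 documentation). Alignment-threshold options are not shown because they depend on the TRUST4 version.

```python
"""Build and run a TRUST4 command line (sketch).

Options used here (-f, --ref, -1/-2, -u, -t, -o, --od) follow the TRUST4
README; verify them for your version. File names are hypothetical.
"""
import shlex
import subprocess
from typing import List, Optional


def trust4_command(
    coord_fa: str,
    imgt_ref: str,
    read1: str,
    read2: Optional[str] = None,
    prefix: str = "rabbit_spleen",   # hypothetical output prefix
    outdir: str = "trust4_out",      # hypothetical output directory
    threads: int = 8,
) -> List[str]:
    """Return the run-trust4 argument list for single- or paired-end reads."""
    cmd = ["run-trust4", "-f", coord_fa, "--ref", imgt_ref]
    if read2 is None:
        cmd += ["-u", read1]                # single-end reads
    else:
        cmd += ["-1", read1, "-2", read2]   # paired-end reads
    cmd += ["-t", str(threads), "-o", prefix, "--od", outdir]
    return cmd


def run_trust4(cmd: List[str]) -> None:
    """Echo the command, then run it; raises CalledProcessError on failure."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the command easy to log and to reuse in a workflow manager.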
- Python script to parse and concatenate CDR regions.
- Example command:
python parse_cdrs.py input.fasta output.fasta
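One possible shape for `parse_cdrs.py`, assuming the input FASTA headers carry CDR fields of the form `CDRn(start-end):score=SEQUENCE`, as in TRUST4's `_annot.fa`; verify this against your own TRUST4 output. Contigs missing any of the three CDRs (or reporting `null`) are skipped. The CDRs in `_annot.fa` are nucleotide sequences, so translation to amino acids would be a separate step.

```python
"""parse_cdrs.py (sketch): concatenate CDR1+CDR2+CDR3 per contig.

Assumes headers annotated like TRUST4's _annot.fa, where each CDR appears
as 'CDRn(start-end):score=SEQUENCE'. Contigs missing any CDR are skipped.
Usage: python parse_cdrs.py input.fasta output.fasta
"""
import re
import sys
from typing import Dict, Iterator, Optional, TextIO, Tuple

CDR_FIELD = re.compile(r"(CDR[123])\(\d+-\d+\):[\d.]+=([A-Za-z]+)")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; header excludes the leading '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def concat_cdrs(header: str) -> Optional[str]:
    """Return CDR1+CDR2+CDR3 parsed from a header, or None if any is missing."""
    cdrs: Dict[str, str] = {}
    for name, seq in CDR_FIELD.findall(header):
        if seq.lower() != "null":
            cdrs[name] = seq.upper()
    if all(k in cdrs for k in ("CDR1", "CDR2", "CDR3")):
        return cdrs["CDR1"] + cdrs["CDR2"] + cdrs["CDR3"]
    return None


def main(src: TextIO, dst: TextIO) -> int:
    """Write one FASTA record per fully annotated contig; return the count."""
    written = 0
    for header, _ in read_fasta(src):
        joined = concat_cdrs(header)
        if joined:
            dst.write(f">{header.split()[0]}\n{joined}\n")
            written += 1
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        main(src, dst)
```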
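- Clustering (file names illustrative):
mmseqs createdb cdrs.fasta inputDB
mmseqs cluster inputDB outputDB tmpDir --min-seq-id 0.9
- Filtering (extract one representative sequence per cluster):
mmseqs createsubdb outputDB inputDB filteredDB
mmseqs convert2fasta filteredDB filtered.fasta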
- Explanation of the unique sequence selection process.
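The selection criterion is not fixed by the steps above. As one hypothetical option, a sequence could count as immunization-specific if it occurs in the immunized samples but never in the controls. The sketch below applies that rule with exact string matching (case-insensitive) to CDR strings or cluster representatives; near-identical variants are left to clustering, and the `min_count` threshold is an assumption.

```python
"""Select candidate sequences unique to immunized samples (sketch).

Hypothetical criterion: keep a sequence if it occurs in the immunized set
but never in the control set. Exact, case-insensitive string matching.
"""
from collections import Counter
from typing import Dict, Iterable


def unique_to_immunized(
    immunized: Iterable[str], control: Iterable[str], min_count: int = 1
) -> Dict[str, int]:
    """Return {sequence: count} for immunized sequences absent from controls,
    ordered from most to least frequent."""
    control_set = {s.upper() for s in control}
    counts = Counter(s.upper() for s in immunized)
    return {
        seq: n
        for seq, n in counts.most_common()
        if seq not in control_set and n >= min_count
    }
```

Keeping the counts lets frequently observed (possibly clonally expanded) sequences be prioritized downstream.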
- Parameters and methods for IgFold.
- Input and output examples.
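A sketch of an IgFold call for one heavy/light pair, based on the `IgFoldRunner().fold(pdb_file, sequences=..., do_refine=..., do_renum=...)` usage shown in the IgFold README. The chain IDs, output path, and the choice to skip refinement and renumbering are assumptions. Inputs must be amino-acid sequences, so nucleotide sequences need translating first. IgFold is imported inside the function so the validation helper works without it installed.

```python
"""IgFold structure prediction for one candidate (sketch).

Uses IgFoldRunner.fold() as in the IgFold README. Chain IDs "H"/"L",
the output path, and do_refine/do_renum=False are assumptions.
"""
from typing import Dict

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_inputs(heavy: str, light: str) -> Dict[str, str]:
    """Validate and package heavy/light chains in the dict IgFold expects."""
    chains = {"H": heavy.strip().upper(), "L": light.strip().upper()}
    for chain_id, seq in chains.items():
        bad = set(seq) - AMINO_ACIDS
        if not seq or bad:
            raise ValueError(f"chain {chain_id}: invalid residues {sorted(bad)}")
    return chains


def predict_structure(heavy: str, light: str, pdb_out: str) -> None:
    """Predict a structure and write it to pdb_out (requires `pip install igfold`)."""
    from igfold import IgFoldRunner  # lazy import: heavy dependency

    runner = IgFoldRunner()
    runner.fold(pdb_out, sequences=build_inputs(heavy, light),
                do_refine=False, do_renum=False)
```

Example: `predict_structure(heavy_seq, light_seq, "candidate_01.pdb")`, where both sequences are placeholders for a translated candidate.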
- Docking software setup and scoring metrics.
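A sketch that writes a minimal HADDOCK3 workflow file. The module names and keys (`run_dir`, `ncores`, `molecules`, `[topoaa]`, `[rigidbody]`, `[seletop]`, `[flexref]`, `[emref]`, `[caprieval]`, `ambig_fname`, `sampling`, `select`) are modeled on HADDOCK3's example configurations and should be checked against the documentation for your installed release. The PDB names, restraint file (e.g., ambiguous restraints between CDR and epitope residues), and numeric values are hypothetical.

```python
"""Write a minimal HADDOCK3 workflow config (sketch).

Keys and module names follow HADDOCK3 example configs; verify them for your
release. File names and numbers are hypothetical.
"""
from pathlib import Path

CONFIG_TEMPLATE = """\
run_dir = "{run_dir}"
ncores = {ncores}
molecules = ["{antibody}", "{antigen}"]

[topoaa]

[rigidbody]
ambig_fname = "{restraints}"
sampling = {sampling}

[seletop]
select = {select}

[flexref]
ambig_fname = "{restraints}"

[emref]
ambig_fname = "{restraints}"

[caprieval]
"""


def write_config(path: str, antibody: str, antigen: str, restraints: str,
                 run_dir: str = "run_ab_ag", ncores: int = 8,
                 sampling: int = 1000, select: int = 200) -> str:
    """Fill the template, write it to `path`, and return the text."""
    text = CONFIG_TEMPLATE.format(run_dir=run_dir, ncores=ncores,
                                  antibody=antibody, antigen=antigen,
                                  restraints=restraints, sampling=sampling,
                                  select=select)
    Path(path).write_text(text)
    return text
```

The resulting file would then be passed to the HADDOCK3 command-line entry point, e.g. `haddock3 docking.cfg`.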
- Initial sequences: [number].
- Filtered sequences: [number].
- Unique sequences: [number].
- Number of clusters formed.
- Size distribution of clusters.
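Both cluster statistics can be derived from MMseqs2's tab-separated cluster file (for example, `mmseqs createtsv inputDB inputDB outputDB clusters.tsv`), which lists one representative/member pair per line. A minimal sketch, assuming that two-column layout:

```python
"""Cluster counts and size distribution from an MMseqs2 cluster TSV (sketch).

Expects two tab-separated columns per line: representative ID, member ID.
"""
from collections import Counter
from typing import Dict, Iterable


def cluster_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Map each representative ID to its number of members."""
    sizes: Counter = Counter()
    for line in lines:
        if line.strip():
            rep, _member = line.rstrip("\n").split("\t")[:2]
            sizes[rep] += 1
    return dict(sizes)


def size_distribution(sizes: Dict[str, int]) -> Dict[int, int]:
    """Map cluster size -> number of clusters of that size."""
    return dict(sorted(Counter(sizes.values()).items()))
```

Usage: `sizes = cluster_sizes(open("clusters.tsv"))`; then `len(sizes)` is the number of clusters and `size_distribution(sizes)` gives the histogram.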
- Visualizations of antibody structures.
- Binding scores of top candidates.
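To rank candidates, each docked model can be reduced to a single weighted score and the best model per candidate kept. The sketch below uses a HADDOCK-style linear combination of van der Waals, electrostatic, desolvation, and restraint energies. The default weights (1.0, 0.2, 1.0, 0.1) are assumed refinement-stage values and should be matched to the protocol actually used; lower (more negative) scores are treated as more favorable.

```python
"""Rank docked antibody candidates by a HADDOCK-style score (sketch).

score = w_vdw*E_vdw + w_elec*E_elec + w_desolv*E_desolv + w_air*E_air
Default weights are assumptions; lower scores rank higher.
"""
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DockedModel:
    candidate: str
    e_vdw: float
    e_elec: float
    e_desolv: float
    e_air: float


def haddock_score(m: DockedModel, w_vdw: float = 1.0, w_elec: float = 0.2,
                  w_desolv: float = 1.0, w_air: float = 0.1) -> float:
    """Weighted sum of energy terms for one docked model."""
    return (w_vdw * m.e_vdw + w_elec * m.e_elec
            + w_desolv * m.e_desolv + w_air * m.e_air)


def top_candidates(models: List[DockedModel], n: int = 3) -> List[str]:
    """Return IDs of the n best candidates, using each candidate's best model."""
    best: Dict[str, float] = {}
    for m in models:
        s = haddock_score(m)
        if m.candidate not in best or s < best[m.candidate]:
            best[m.candidate] = s
    return sorted(best, key=best.get)[:n]
```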
Challenge | Solution |
---|---|
Computational bottlenecks during clustering | Use `mmseqs linclust` for very large sequence sets, raise `--threads`, or move to HPC/cloud resources |
Ambiguities in CDR extraction | Improved regex patterns and manual validation |
- Total unique sequences: [number].
- Top-ranked candidates based on docking scores.
- Repository: [GitHub Link]
- Example README file:
## Usage
1. Preprocess sequences: `script1.py`.
2. Run TRUST4: `run-trust4 [parameters]`.
3. Cluster sequences: `mmseqs cluster ...`.
- Automated pipeline using [Nextflow].
- Include diagram of workflow.
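- Dependencies (TRUST4 and MMseqs2 from Bioconda, IgFold from PyPI; the environment name is illustrative):
conda create -n antibody -c conda-forge -c bioconda trust4 mmseqs2
conda activate antibody
pip install igfold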
- Experimental validation of top candidates.
- Extend study to other species or antibody libraries.
- Incorporate alternative docking algorithms and compare their predictions with HADDOCK3 to improve confidence in the top-ranked candidates.
- TRUST4: [cite].
- MMseqs2: [cite].
- IgFold: [cite].