RNA-Seq Data Annotator

A Python tool for annotating RNA sequencing data with terms from standardized ontologies and classification systems. Annotation is automated, and the annotations follow established scientific ontologies such as the Gene Ontology (GO) and the Sequence Ontology (SO).

Features

Support for multiple ontology formats (primarily OBO)
Automated annotation using Gene Ontology (GO) and Sequence Ontology (SO)
Comprehensive validation framework
Detailed logging system
Type-safe implementation
CSV output support
Extensible architecture for custom annotation rules
Docker support for containerized deployment
MongoDB integration for data persistence
Redis caching for improved performance

Prerequisites

Local Installation

Python 3.8+
pandas
typing and logging (both part of the Python standard library; no separate installation needed)

Docker Deployment

Docker 20.10+
Docker Compose 2.0+
At least 8GB RAM
20GB free disk space

Installation

Local Installation

Clone the repository:

git clone https://github.com/imrobintomar/rna-seq-annotator.git
cd rna-seq-annotator

Install required packages:

pip install -r requirements.txt

Docker Deployment

Clone the repository:

git clone https://github.com/imrobintomar/rna-seq-annotator.git
cd rna-seq-annotator

Create necessary directories:

mkdir -p data ontologies output logs

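Start the services (with the Docker Compose v2 plugin, the command here and below is docker compose rather than docker-compose):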

docker-compose up -d

Verify services are running:

docker-compose ps

Usage

Local Usage

import pandas as pd

from rna_seq_annotator import RNASeqAnnotator

# Define ontology files
ontology_files = {
    'GO': 'path/to/gene_ontology.obo',
    'SO': 'path/to/sequence_ontology.obo'
}

# Initialize annotator
annotator = RNASeqAnnotator(ontology_files)

# Load and annotate sequences
sequence_data = pd.read_csv('your_rna_seq_data.csv')
annotated_data = annotator.annotate_sequence(
    sequence_data,
    required_ontologies=['GO', 'SO'],
    output_file='annotated_sequences.csv'
)

Docker Usage

Place your input files:

cp your_rna_seq_data.csv data/
cp gene_ontology.obo ontologies/
cp sequence_ontology.obo ontologies/

Run annotation:

docker-compose exec rna-seq-annotator python annotate.py \
    --input /app/data/your_rna_seq_data.csv \
    --output /app/output/annotated_sequences.csv

Access results:

ls output/

Docker Services

The Docker deployment includes three main services:

RNA-Seq Annotator

Main application container
Resource limits: 2 CPUs, 4GB RAM
Mounted volumes for data, ontologies, output, and logs

MongoDB

Persistent data storage
Secure authentication enabled
Port: 27017
Resource limits: 1 CPU, 2GB RAM

Redis

Caching layer
Password protected
Port: 6379
Resource limits: 0.5 CPU, 1GB RAM

Environment Variables

RNA-Seq Annotator

LOG_LEVEL: Logging level (default: INFO)
MAX_WORKERS: Number of worker processes (default: 4)
ONTOLOGY_DIR: Directory for ontology files
OUTPUT_DIR: Directory for output files
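
As an illustration of how these variables might be consumed, here is a minimal sketch that reads them with the documented defaults. The AnnotatorSettings class and load_settings function are hypothetical names, not part of the tool. The fallback paths for ONTOLOGY_DIR and OUTPUT_DIR are assumptions, since no defaults are documented for them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatorSettings:
    log_level: str
    max_workers: int
    ontology_dir: str
    output_dir: str


def load_settings(env=None):
    """Read the annotator's environment variables, applying defaults."""
    env = os.environ if env is None else env
    max_workers = int(env.get("MAX_WORKERS", "4"))  # documented default: 4
    if max_workers < 1:
        raise ValueError("MAX_WORKERS must be a positive integer")
    return AnnotatorSettings(
        log_level=env.get("LOG_LEVEL", "INFO").upper(),  # documented default: INFO
        max_workers=max_workers,
        # No defaults are documented for the two directories; these
        # fallbacks are assumptions modelled on the /app paths used in
        # the Docker examples.
        ontology_dir=env.get("ONTOLOGY_DIR", "/app/ontologies"),
        output_dir=env.get("OUTPUT_DIR", "/app/output"),
    )


settings = load_settings()
print(settings.log_level, settings.max_workers)
```

Passing a plain dictionary instead of os.environ makes the function easy to exercise without touching the real environment.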

MongoDB

MONGO_INITDB_ROOT_USERNAME: MongoDB admin username
MONGO_INITDB_ROOT_PASSWORD: MongoDB admin password
MONGO_INITDB_DATABASE: Default database name

Redis

REDIS_PASSWORD: Redis authentication password

Input Data Format

The tool expects RNA-seq data in CSV format with the following columns:

sequence_id (required): Unique identifier for each sequence
sequence (required): The RNA sequence data
Additional metadata columns (optional)

Example:

sequence_id,sequence,tissue_type,condition
seq001,AUGCAUGCAUGC,liver,control
seq002,GCAUGCAUGCAU,liver,treated
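
Before running the annotator, it can help to confirm that a file matches this layout. The following is a rough sketch using only the standard library. The check_input_csv function is a hypothetical helper, not part of the tool, and the duplicate-ID check simply follows from sequence_id being described as unique.

```python
import csv
import io

REQUIRED_COLUMNS = ("sequence_id", "sequence")


def check_input_csv(handle):
    """Return a list of problems found in an input CSV (empty if none)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]

    problems = []
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        seq_id = (row["sequence_id"] or "").strip()
        sequence = (row["sequence"] or "").strip()
        if not seq_id:
            problems.append(f"line {line_no}: empty sequence_id")
        elif seq_id in seen_ids:
            problems.append(f"line {line_no}: duplicate sequence_id {seq_id!r}")
        seen_ids.add(seq_id)
        if not sequence:
            problems.append(f"line {line_no}: empty sequence")
    return problems


sample = io.StringIO(
    "sequence_id,sequence,tissue_type,condition\n"
    "seq001,AUGCAUGCAUGC,liver,control\n"
    "seq002,GCAUGCAUGCAU,liver,treated\n"
)
print(check_input_csv(sample))  # prints []
```

For a file on disk, open it with newline="" as the csv module recommends and pass the handle to the same function.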

Output Format

The tool generates annotated data in CSV format with additional columns for each ontology:

Original columns
GO_annotation: Gene Ontology annotations
SO_annotation: Sequence Ontology annotations
validation_status: Validation results (if validation is performed)
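
One quick way to inspect a result file is to count how many rows received an annotation in each ontology column. The sketch below assumes that unannotated rows leave the cell empty, which the tool does not document. The annotation_coverage function is a hypothetical helper, not part of the tool.

```python
import csv


def annotation_coverage(handle, columns=("GO_annotation", "SO_annotation")):
    """Count, per annotation column, the rows that carry a non-empty value."""
    reader = csv.DictReader(handle)
    counts = {column: 0 for column in columns}
    total = 0
    for row in reader:
        total += 1
        for column in columns:
            # Treat a missing column or blank cell as "not annotated".
            if (row.get(column) or "").strip():
                counts[column] += 1
    return total, counts
```

Calling it on an open handle to annotated_sequences.csv returns the row count and a dictionary of per-column counts.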

Customization

Adding New Ontologies

Prepare your ontology file in OBO format
Add the ontology to the ontology_files dictionary:

ontology_files = {
    'GO': 'gene_ontology.obo',
    'SO': 'sequence_ontology.obo',
    'YOUR_ONTOLOGY': 'your_ontology.obo'
}
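
To sanity-check a custom ontology file before adding it, a rough reader for OBO [Term] stanzas can be sketched as follows. This is not the parser the tool uses. The read_obo_terms function is a hypothetical helper that only collects id and name tags and ignores everything else in the format, such as trailing comments, escapes, and other tags.

```python
def read_obo_terms(lines):
    """Collect id -> name for every [Term] stanza in an OBO document."""
    terms = {}
    in_term = False
    current_id = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
            terms[current_id] = ""
        elif in_term and current_id and line.startswith("name:"):
            terms[current_id] = line[len("name:"):].strip()
    return terms


example = """\
format-version: 1.2

[Term]
id: GO:0008150
name: biological_process

[Typedef]
id: part_of
name: part of
"""
print(read_obo_terms(example.splitlines()))
# prints {'GO:0008150': 'biological_process'}
```

An empty result for a file that should contain terms is a hint that the file is not in OBO format.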

Docker Customization

Modify docker-compose.yml to adjust:

Resource limits
Volume mounts
Environment variables
Network configuration

Logging

The tool provides comprehensive logging with different levels:

INFO: General operation information
WARNING: Non-critical issues
ERROR: Critical issues that need attention
DEBUG: Detailed information for debugging

Log file locations:

Local deployment: rna_seq_annotator.log
Docker deployment: /app/logs/rna_seq_annotator.log
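
A file-based setup along these lines can be sketched with the standard logging module. The configure_logging function, the logger name, and the message format are assumptions for illustration and may differ from what the tool does internally. The level is taken from LOG_LEVEL when no explicit level is passed.

```python
import logging
import os


def configure_logging(log_file="rna_seq_annotator.log", level_name=None):
    """Attach a file handler to a named logger at the requested level."""
    level_name = (level_name or os.environ.get("LOG_LEVEL", "INFO")).upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {level_name}")

    logger = logging.getLogger("rna_seq_annotator")  # logger name is an assumption
    logger.setLevel(level)
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calling the function more than once adds a second handler and duplicates each message, so it is meant to run once at startup.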

Troubleshooting

Docker Issues

Check service status:

docker-compose ps

View logs:

docker-compose logs [service-name]

Restart services:

docker-compose restart [service-name]
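
If a service is listed as running but the annotator still cannot reach it, a plain TCP check can help narrow things down. This is a minimal sketch: port_is_open is a hypothetical helper, and using localhost assumes the MongoDB and Redis ports are published to the host in docker-compose.yml.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports are taken from the Docker Services section.
for name, port in (("MongoDB", 27017), ("Redis", 6379)):
    state = "reachable" if port_is_open("localhost", port) else "unreachable"
    print(f"{name} on port {port}: {state}")
```

A reachable port only shows that something is listening. It does not confirm that authentication will succeed.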

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License.

Citation

If you use this tool in your research, please cite:

@software{rna_seq_annotator,
  author = {Robin Tomar},
  title = {RNA-Seq Data Annotator},
  year = {2024},
  url = {https://github.com/imrobintomar/rna-seq-annotator}
}

Support

For support:

Open an issue in the GitHub repository
Contact itsrobintomar@gmail.com
Check the troubleshooting guide

Acknowledgments

Gene Ontology Consortium
Sequence Ontology Project
Contributors and maintainers of dependent packages
Docker and container ecosystem contributors
