Skip to content

givemeone1astkiss/RNA-Factory

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

20 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

RNA-Factory

A comprehensive web platform for RNA analysis, structure prediction, and interaction prediction, featuring an AI-powered design assistant with multimodal RAG capabilities.

Overview

RNA-Factory is an integrated platform that combines multiple state-of-the-art RNA analysis models with an intelligent AI assistant to provide comprehensive RNA research capabilities. The platform supports RNA secondary structure prediction, RNA-ligand interaction prediction, and offers an AI-powered design assistant with retrieval-augmented generation (RAG) for multimodal document processing.

Features

πŸ€– AI Design Assistant

The platform includes a sophisticated AI assistant powered by LangGraph that provides:

  • Intelligent RNA Design Guidance: Expert assistance for RNA sequence design, structure optimization, and functional analysis
  • Multimodal RAG System: Advanced retrieval-augmented generation supporting both text and image documents
  • Document Processing: Automatic processing of PDFs, images, and text documents with OCR capabilities
  • Contextual Knowledge: Access to extensive RNA research literature and databases
  • Interactive Design Workflows: Step-by-step guidance for complex RNA design tasks

🧬 Supported Models

Structure Prediction Models

BPFold

  • Deep learning model for RNA secondary structure prediction via base pair motif energy
  • Supports canonical and non-canonical base pairs
  • Provides confidence scoring and multiple output formats (CSV, BPSEQ, CT, DBN)
  • GitHub | Paper

UFold

  • Deep learning-based method using image-like sequence representation and Fully Convolutional Networks
  • Fast inference (~160ms per sequence)
  • Supports sequences up to 1600bp
  • GitHub | Paper

MXFold2

  • Deep learning-based method with thermodynamic integration
  • High accuracy and fast prediction
  • Supports long sequences
  • GitHub | Paper

RNAformer

  • Simple yet effective deep learning model using two-dimensional latent space
  • Features axial attention mechanism and recycling in latent space
  • High accuracy on benchmarks with single model approach
  • GitHub | Paper

Interaction Prediction Models

RNAmigos2

  • Virtual screening tool for RNA-ligand interaction prediction using deep graph learning
  • Ranks chemical compounds based on binding potential to RNA targets
  • Fast inference (~10 seconds) with high enrichment factors
  • GitHub | Paper

Reformer

  • Deep learning model for predicting protein-RNA binding affinity at single-base resolution
  • Uses transformer architecture with cDNA sequences for high-accuracy prediction
  • Supports 150+ RBP types and multiple cell lines for comprehensive analysis
  • Provides binding site identification and confidence scoring
  • GitHub | Paper

CoPRA

  • State-of-the-art predictor of protein-RNA binding affinity based on protein language model and RNA language model
  • Uses ESM2 protein language model and RiNALMo RNA language model with complex structure as input
  • Pre-trained on PRI30k dataset and fine-tuned on PRA310 for high accuracy
  • Provides binding affinity prediction in kcal/mol with confidence scoring
  • Supports protein and RNA sequence input for comprehensive interaction analysis
  • GitHub | Paper

De Novo Design Models

Mol2Aptamer

  • Deep learning model for generating RNA aptamers from small molecule SMILES
  • Uses transformer-based architecture with BPE tokenization
  • Generates high-quality RNA sequences with thermodynamic validation
  • Supports customizable generation parameters (temperature, top-k, top-p)

RNAFlow

  • Flow matching model for protein-conditioned RNA sequence-structure design
  • Integrates RNA inverse folding model and RoseTTAFold2NA
  • Generates RNA sequences and structures conditioned on protein targets
  • Supports customizable RNA length and sample generation
  • GitHub | Paper

RNA-FrameFlow

  • Flow matching model for de novo 3D RNA backbone design using SE(3) flow matching
  • Generates high-quality 3D RNA backbone structures without sequence information
  • Supports customizable structure length, sampling parameters, and generation settings
  • Provides confidence scoring and trajectory files for analysis
  • GitHub | Paper

RiboDiffusion

  • Diffusion-based model for RNA inverse folding from protein structures
  • Generates RNA sequences that can fold into target protein-bound conformations
  • Uses conditional diffusion with protein structure conditioning
  • Supports customizable sampling parameters and generation settings
  • Provides recovery rate scoring and multiple sequence generation
  • GitHub | Paper

πŸ”§ Platform Capabilities

  • Multi-format Input Support: FASTA files, text input, mmCIF structures, PDB structures, SMILES strings, protein sequences, cDNA sequences, RNA sequences
  • Unified Interface: Consistent user experience across all models with standardized input areas
  • Real-time Processing: Fast analysis with progress tracking
  • Multiple Output Formats: CT, BPSEQ, dot-bracket notation, CSV, PDB, and more
  • Batch Processing: Support for multiple sequences, ligands, and protein targets
  • Download Options: Individual files, ZIP archives, or CSV exports for batch results
  • Smart File Upload: Intelligent file handling with content validation and format detection
  • Adaptive UI: Dynamic input areas that adjust to content and file uploads
  • Dark Mode Support: Complete dark theme with consistent styling across all components
  • Responsive Design: Works on desktop and mobile devices

Code Structure

RNA-Factory/
β”œβ”€β”€ app/                          # Main application package
β”‚   β”œβ”€β”€ __init__.py              # Flask app factory and model configuration
β”‚   β”œβ”€β”€ api/                     # API routes and endpoints
β”‚   β”‚   β”œβ”€β”€ bpfold_routes.py     # BPFold API endpoints
β”‚   β”‚   β”œβ”€β”€ ufold_routes.py      # UFold API endpoints
β”‚   β”‚   β”œβ”€β”€ mxfold2_routes.py    # MXFold2 API endpoints
β”‚   β”‚   β”œβ”€β”€ rnaformer_routes.py  # RNAformer API endpoints
β”‚   β”‚   β”œβ”€β”€ rnamigos2_routes.py  # RNAmigos2 API endpoints
β”‚   β”‚   β”œβ”€β”€ reformer_routes.py   # Reformer API endpoints
β”‚   β”‚   β”œβ”€β”€ copra_routes.py      # CoPRA API endpoints
β”‚   β”‚   β”œβ”€β”€ mol2aptamer_routes.py # Mol2Aptamer API endpoints
β”‚   β”‚   β”œβ”€β”€ rnaflow_routes.py    # RNAFlow API endpoints
β”‚   β”‚   β”œβ”€β”€ rnaframeflow_routes.py # RNA-FrameFlow API endpoints
β”‚   β”‚   β”œβ”€β”€ ribodiffusion_routes.py # RiboDiffusion API endpoints
β”‚   β”‚   β”œβ”€β”€ copilot_routes.py    # AI assistant API endpoints
β”‚   β”‚   └── model_config_routes.py # Model configuration endpoints
β”‚   β”œβ”€β”€ copilot/                 # AI assistant and RAG system
β”‚   β”‚   β”œβ”€β”€ copilot.py           # LangGraph-based AI assistant
β”‚   β”‚   β”œβ”€β”€ rag.py               # Multimodal RAG system
β”‚   β”‚   └── prompts.py           # AI prompts and templates
β”‚   β”œβ”€β”€ static/                  # Frontend assets
β”‚   β”‚   β”œβ”€β”€ index.html           # Main web interface
β”‚   β”‚   β”œβ”€β”€ css/                 # Stylesheets
β”‚   β”‚   └── js/                  # JavaScript functionality
β”‚   └── utils/                   # Utility modules
β”‚       β”œβ”€β”€ wrappers/            # Model wrapper classes
β”‚       β”œβ”€β”€ input.py             # Input validation and processing
β”‚       └── output.py            # Output formatting and file generation
β”œβ”€β”€ models/                      # Model directories and weights
β”œβ”€β”€ data/                        # Sample data and documents
β”œβ”€β”€ config.py                    # Application configuration
β”œβ”€β”€ run.py                       # Application entry point
└── pyproject.toml              # Python dependencies

Key Components

AI Assistant (app/copilot/)

The AI assistant is built using LangGraph and provides:

  • Query Classification: Automatically categorizes user queries (RNA design, general bioinformatics, off-topic)
  • Tool Integration: Seamless integration with platform models and external tools
  • Context Management: Maintains conversation context and user preferences
  • Response Generation: Generates structured, actionable responses

RAG System (app/copilot/rag.py)

The multimodal RAG system features:

  • Document Processing: Supports PDF, image, and text documents
  • OCR Capabilities: Extracts text from images and PDFs using Tesseract
  • Vector Storage: Uses ChromaDB for efficient document retrieval
  • Multimodal Embeddings: CLIP-based embeddings for image-text understanding
  • Semantic Search: Advanced retrieval based on semantic similarity

Model Wrappers (app/utils/wrappers/)

Each model has a dedicated wrapper that:

  • Environment Management: Handles virtual environment setup and activation
  • Input Processing: Validates and preprocesses input data
  • Model Execution: Runs model inference with proper error handling
  • Output Parsing: Converts model outputs to standardized formats

API Layer (app/api/)

RESTful API endpoints for:

  • Model Predictions: Individual endpoints for each model
  • File Processing: Upload and processing of various file formats
  • Result Download: CT file generation and batch download
  • AI Assistant: Chat interface and document processing

Getting Started

  1. Clone the repository
   git clone https://github.com/your-username/RNA-Factory.git
   cd RNA-Factory
  1. Install dependencies
    pip install -e .

3. **Set up model environments**
```bash
   # Each model requires its own virtual environment
   # The platform will automatically set up environments on first use
  1. Run the application
   python run.py
  1. Access the platform Open your browser and navigate to http://localhost:5000

Usage

Structure Prediction

  1. Select a structure prediction model (BPFold, UFold, MXFold2, or RNAformer)
  2. Input RNA sequences via text or upload FASTA files
  3. Run analysis and view results
  4. Download results in various formats (CT, BPSEQ, dot-bracket)

Interaction Prediction

RNAmigos2

  1. Select RNAmigos2 for RNA-ligand interaction prediction
  2. Upload mmCIF structure file
  3. Specify binding site residues
  4. Input SMILES strings of ligands
  5. Run analysis to get interaction scores

Reformer

  1. Select Reformer for protein-RNA binding affinity prediction
  2. Input cDNA sequence (ATCGN characters only)
  3. Select RBP (RNA-binding protein) type from 150+ options
  4. Choose cell line (HepG2, K562, or MCF-7)
  5. Run analysis to get single-base resolution binding scores

CoPRA

  1. Select CoPRA for protein-RNA binding affinity prediction
  2. Input protein sequence (single letter amino acid codes)
  3. Input RNA sequence (A, U, G, C only)
  4. Configure confidence threshold (Low/Medium/High/Very High)
  5. Run analysis to get binding affinity prediction in kcal/mol with confidence score

De Novo Design

Mol2Aptamer

  1. Select Mol2Aptamer for aptamer generation
  2. Input small molecule SMILES string (via text or file upload)
  3. Configure generation parameters (number of sequences, max length, temperature, etc.)
  4. Run analysis to generate RNA aptamers
  5. View results with thermodynamic validation and download CSV files

RNAFlow

  1. Select RNAFlow for protein-conditioned RNA design
  2. Input protein sequence (via text or file upload)
  3. Specify desired RNA length and number of samples
  4. Run analysis to generate RNA sequences and structures
  5. View results with confidence scores and download PDB structures

RNA-FrameFlow

  1. Select RNA-FrameFlow for de novo 3D RNA backbone design
  2. Configure structure parameters (length, number of structures, temperature, random seed)
  3. Set advanced sampling parameters (timesteps, minimum time, exponential rate, self-conditioning)
  4. Run analysis to generate 3D RNA backbone structures
  5. View results with confidence scores and download PDB files with trajectory data

RiboDiffusion

  1. Select RiboDiffusion for RNA inverse folding from protein structures
  2. Input PDB structure file (via text input or file upload)
  3. Configure generation parameters (number of sequences, sampling steps, conditional scale)
  4. Run analysis to generate RNA sequences that can fold into target conformations
  5. View results with recovery rate scores and download CSV files with generated sequences

AI Assistant

  1. Access the AI assistant from the main interface
  2. Ask questions about RNA design, structure analysis, or general bioinformatics
  3. Upload documents for multimodal analysis
  4. Get expert guidance and recommendations

Contributing

We welcome contributions to RNA-Factory! Please feel free to submit issues, feature requests, or pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Author: Huaizhi Wang
Email: realwiseking@outlook.com

Acknowledgments

We thank the developers of the integrated models and the open-source community for their valuable contributions to RNA research and machine learning.

About

Software for PekingHSC 2025.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published