A comprehensive web platform for RNA analysis, including structure prediction, interaction prediction, and design, featuring an AI-powered design assistant with multimodal RAG capabilities.
RNA-Factory is an integrated platform that combines multiple state-of-the-art RNA analysis models with an intelligent AI assistant. The platform supports RNA secondary structure prediction, RNA-ligand and protein-RNA interaction prediction, and generative RNA design (aptamer generation, protein-conditioned design, 3D backbone design, and inverse folding). It also offers an AI-powered design assistant with retrieval-augmented generation (RAG) for multimodal document processing.
The platform includes a sophisticated AI assistant powered by LangGraph that provides:
- Intelligent RNA Design Guidance: Expert assistance for RNA sequence design, structure optimization, and functional analysis
- Multimodal RAG System: Advanced retrieval-augmented generation supporting both text and image documents
- Document Processing: Automatic processing of PDFs, images, and text documents with OCR capabilities
- Contextual Knowledge: Access to extensive RNA research literature and databases
- Interactive Design Workflows: Step-by-step guidance for complex RNA design tasks
BPFold
- Deep learning model for RNA secondary structure prediction via base pair motif energy
- Supports canonical and non-canonical base pairs
- Provides confidence scoring and multiple output formats (CSV, BPSEQ, CT, DBN)
- GitHub | Paper
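BPSEQ, CT, and DBN all encode the same list of base pairs in different layouts. As a minimal sketch of how a dot-bracket (DBN) string maps onto BPSEQ lines (index, nucleotide, partner index, with 0 for unpaired bases), the hypothetical helpers below match brackets with one stack per bracket type. They are an illustration, not BPFold's or the platform's own converter.

```python
def dotbracket_to_pairs(structure: str) -> dict[int, int]:
    """Map each paired 1-based position to its partner position."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks: dict[str, list[int]] = {open_: [] for open_ in openers}
    pairs: dict[int, int] = {}
    for pos, ch in enumerate(structure, start=1):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]  # separate stacks allow pseudoknot brackets
            if not stack:
                raise ValueError(f"unmatched '{ch}' at position {pos}")
            partner = stack.pop()
            pairs[partner] = pos
            pairs[pos] = partner
        elif ch != ".":
            raise ValueError(f"unexpected character '{ch}' at position {pos}")
    leftovers = sorted(p for stack in stacks.values() for p in stack)
    if leftovers:
        raise ValueError(f"unmatched opening bracket(s) at {leftovers}")
    return pairs


def to_bpseq(sequence: str, structure: str) -> str:
    """Render BPSEQ text: one line per base, partner 0 when unpaired."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    pairs = dotbracket_to_pairs(structure)
    return "\n".join(
        f"{i} {nt} {pairs.get(i, 0)}" for i, nt in enumerate(sequence, start=1)
    )


print(to_bpseq("GGGAAACCC", "(((...)))"))
```

A CT file carries the same partner column plus previous/next-neighbour indices, so it can be generated from the same `pairs` dictionary.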
UFold
- Deep learning-based method using an image-like sequence representation and fully convolutional networks
- Fast inference (~160 ms per sequence)
- Supports sequences up to 1,600 nt
- GitHub | Paper
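UFold's "image-like" representation can be illustrated by giving every position pair (i, j) one channel per nucleotide combination, yielding 16 binary L×L maps. This pure-Python sketch builds only those 16 channels; as we understand the UFold paper, the published model also adds a channel derived from base-pairing rules, which is omitted here, and the real implementation works on tensors. The nucleotide order `AUCG` and the function names are arbitrary choices for the example.

```python
NUCLEOTIDES = "AUCG"  # channel ordering chosen for this sketch only


def one_hot(sequence: str) -> list[list[int]]:
    """One-hot encode an RNA sequence; unknown bases (e.g. N) become all-zero rows."""
    sequence = sequence.upper().replace("T", "U")
    return [[1 if nt == base else 0 for base in NUCLEOTIDES] for nt in sequence]


def pair_channels(sequence: str) -> list[list[list[int]]]:
    """Build 16 L x L maps; channel 4*a + b is 1 where base i is a and base j is b."""
    x = one_hot(sequence)
    length = len(x)
    channels = []
    for a in range(4):
        for b in range(4):
            channels.append(
                [[x[i][a] * x[j][b] for j in range(length)] for i in range(length)]
            )
    return channels


maps = pair_channels("GGAC")
# The (G, C) channel: row index is the G position, column index is the C position.
gc = maps[4 * NUCLEOTIDES.index("G") + NUCLEOTIDES.index("C")]
print(gc[0][3], gc[3][0])  # 1 0
```

Because every cell of every channel depends only on bases i and j, the convolutional layers see candidate pairings as local 2D patterns, which is what makes image-segmentation architectures applicable.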
MXFold2
- Deep learning-based method with thermodynamic integration
- High accuracy and fast prediction
- Supports long sequences
- GitHub | Paper
RNAformer
- Simple yet effective deep learning model using two-dimensional latent space
- Features axial attention mechanism and recycling in latent space
- High accuracy on benchmarks with a single-model approach
- GitHub | Paper
RNAmigos2
- Virtual screening tool for RNA-ligand interaction prediction using deep graph learning
- Ranks chemical compounds based on binding potential to RNA targets
- Fast inference (~10 seconds) with high enrichment factors
- GitHub | Paper
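Enrichment factor, the headline metric for virtual screening tools such as RNAmigos2, is commonly defined as the hit rate among the top-ranked fraction of a library divided by the hit rate of the whole library. A minimal sketch of that definition, assuming higher scores mean stronger predicted binding and that the top fraction is rounded to a whole number of compounds; RNAmigos2's own evaluation code may differ in such details, and the function name is hypothetical.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """EF = (active rate among the top `fraction` of the ranking) / (active rate overall)."""
    if not scores or len(scores) != len(is_active):
        raise ValueError("scores and labels must be non-empty and the same length")
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("enrichment is undefined when there are no actives")
    # Rank compounds from best to worst predicted binder.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(fraction * len(ranked)))
    actives_top = sum(active for _, active in ranked[:n_top])
    return (actives_top / n_top) / (total_actives / len(ranked))


# 100 compounds, 5 actives, 3 of which land in the top 10 by score.
scores = list(range(100, 0, -1))
labels = [1 if i in {0, 4, 9, 50, 80} else 0 for i in range(100)]
ef10 = enrichment_factor(scores, labels, fraction=0.10)
print(round(ef10, 2))  # 6.0
```

An EF of 1 means the ranking is no better than random; here the top 10% is six times richer in actives than the library as a whole.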
Reformer
- Deep learning model for predicting protein-RNA binding affinity at single-base resolution
- Uses a transformer architecture with cDNA sequences as input for high-accuracy prediction
- Supports 150+ RBP types and multiple cell lines for comprehensive analysis
- Provides binding site identification and confidence scoring
- GitHub | Paper
CoPRA
- State-of-the-art predictor of protein-RNA binding affinity based on a protein language model and an RNA language model
- Uses the ESM2 protein language model and the RiNALMo RNA language model, with the complex structure as input
- Pre-trained on the PRI30k dataset and fine-tuned on PRA310 for high accuracy
- Provides binding affinity prediction in kcal/mol with confidence scoring
- Supports protein and RNA sequence input for comprehensive interaction analysis
- GitHub | Paper
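CoPRA reports affinity as a binding free energy in kcal/mol. To compare such predictions with dissociation constants from the literature, the standard thermodynamic relation ΔG = RT ln(Kd / 1 M) applies. The sketch below assumes 25 °C and a 1 M standard state; `kd_to_dg` and `dg_to_kd` are hypothetical helpers, not part of CoPRA.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy dG = RT ln(Kd / 1 M) in kcal/mol; more negative = tighter."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def dg_to_kd(dg_kcal: float, temperature_k: float = 298.15) -> float:
    """Inverse of kd_to_dg: dissociation constant in molar from dG in kcal/mol."""
    return math.exp(dg_kcal / (R_KCAL * temperature_k))


print(round(kd_to_dg(1e-9), 2))  # a 1 nM binder: -12.28
```

At 25 °C, each 10-fold change in Kd corresponds to roughly 1.36 kcal/mol, a useful scale when judging the size of prediction errors.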
Mol2Aptamer
- Deep learning model for generating RNA aptamers from small molecule SMILES
- Uses a transformer-based architecture with BPE tokenization
- Generates high-quality RNA sequences with thermodynamic validation
- Supports customizable generation parameters (temperature, top-k, top-p)
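Temperature, top-k, and top-p are the standard decoding controls for autoregressive transformers. A minimal sketch of how they interact when choosing the next token; it is a generic illustration, not Mol2Aptamer's decoding code, and the single-nucleotide vocabulary in the demo is a toy stand-in for the model's BPE tokens.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample a token index from logits with temperature, top-k and top-p (nucleus) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature rescales logits: <1 sharpens the distribution, >1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
        mass = sum(p for p, _ in ranked)
        ranked = [(p / mass, i) for p, i in ranked]  # renormalise before top-p
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for p, i in ranked:
            kept.append((p, i))
            cumulative += p
            if cumulative >= top_p:
                break  # smallest high-probability prefix whose mass reaches top_p
        ranked = kept
    # random.choices normalises the surviving weights itself.
    return rng.choices([i for _, i in ranked], weights=[p for p, _ in ranked], k=1)[0]


rng = random.Random(42)
vocab = ["A", "C", "G", "U"]
logits = [2.0, 0.5, 1.5, -1.0]
print("".join(vocab[sample_next_token(logits, 0.8, top_k=3, top_p=0.9, rng=rng)] for _ in range(10)))
```

Lower temperature and tighter top-k/top-p favour conservative, high-likelihood sequences; looser settings increase the diversity of generated aptamers.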
RNAFlow
- Flow matching model for protein-conditioned RNA sequence-structure design
- Integrates an RNA inverse folding model with RoseTTAFold2NA
- Generates RNA sequences and structures conditioned on protein targets
- Supports customizable RNA length and number of samples
- GitHub | Paper
RNA-FrameFlow
- Model for de novo 3D RNA backbone design using SE(3) flow matching
- Generates high-quality 3D RNA backbone structures without sequence information
- Supports customizable structure length, sampling parameters, and generation settings
- Provides confidence scoring and trajectory files for analysis
- GitHub | Paper
RiboDiffusion
- Diffusion-based model for RNA inverse folding from 3D RNA backbone structures
- Generates RNA sequences predicted to fold into the target tertiary structure
- Uses conditional diffusion with structure conditioning
- Supports customizable sampling parameters and generation settings
- Provides recovery rate scoring and multiple sequence generation
- GitHub | Paper
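Recovery rate is conventionally the fraction of positions at which a designed sequence reproduces the native sequence of the input structure. Assuming RiboDiffusion's score follows that convention, it reduces to a position-wise comparison; `recovery_rate` is a hypothetical name.

```python
def recovery_rate(designed: str, native: str) -> float:
    """Fraction of positions where the designed nucleotide equals the native one."""
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed.upper(), native.upper()))
    return matches / len(native)


print(recovery_rate("GGACUUCC", "GGAAUUCC"))  # 0.875
```

Recovery rewards similarity to one known sequence, so a design with modest recovery can still fold correctly; it is best read alongside a structure-based check.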
- Multi-format Input Support: FASTA files, text input, mmCIF structures, PDB structures, SMILES strings, protein sequences, cDNA sequences, RNA sequences
- Unified Interface: Consistent user experience across all models with standardized input areas
- Real-time Processing: Fast analysis with progress tracking
- Multiple Output Formats: CT, BPSEQ, dot-bracket notation, CSV, PDB, and more
- Batch Processing: Support for multiple sequences, ligands, and protein targets
- Download Options: Individual files, ZIP archives, or CSV exports for batch results
- Smart File Upload: Intelligent file handling with content validation and format detection
- Adaptive UI: Dynamic input areas that adjust to content and file uploads
- Dark Mode Support: Complete dark theme with consistent styling across all components
- Responsive Design: Works on desktop and mobile devices
RNA-Factory/
├── app/                              # Main application package
│   ├── __init__.py                   # Flask app factory and model configuration
│   ├── api/                          # API routes and endpoints
│   │   ├── bpfold_routes.py          # BPFold API endpoints
│   │   ├── ufold_routes.py           # UFold API endpoints
│   │   ├── mxfold2_routes.py         # MXFold2 API endpoints
│   │   ├── rnaformer_routes.py       # RNAformer API endpoints
│   │   ├── rnamigos2_routes.py       # RNAmigos2 API endpoints
│   │   ├── reformer_routes.py        # Reformer API endpoints
│   │   ├── copra_routes.py           # CoPRA API endpoints
│   │   ├── mol2aptamer_routes.py     # Mol2Aptamer API endpoints
│   │   ├── rnaflow_routes.py         # RNAFlow API endpoints
│   │   ├── rnaframeflow_routes.py    # RNA-FrameFlow API endpoints
│   │   ├── ribodiffusion_routes.py   # RiboDiffusion API endpoints
│   │   ├── copilot_routes.py         # AI assistant API endpoints
│   │   └── model_config_routes.py    # Model configuration endpoints
│   ├── copilot/                      # AI assistant and RAG system
│   │   ├── copilot.py                # LangGraph-based AI assistant
│   │   ├── rag.py                    # Multimodal RAG system
│   │   └── prompts.py                # AI prompts and templates
│   ├── static/                       # Frontend assets
│   │   ├── index.html                # Main web interface
│   │   ├── css/                      # Stylesheets
│   │   └── js/                       # JavaScript functionality
│   └── utils/                        # Utility modules
│       ├── wrappers/                 # Model wrapper classes
│       ├── input.py                  # Input validation and processing
│       └── output.py                 # Output formatting and file generation
├── models/                           # Model directories and weights
├── data/                             # Sample data and documents
├── config.py                         # Application configuration
├── run.py                            # Application entry point
└── pyproject.toml                    # Python dependencies
The AI assistant is built using LangGraph and provides:
- Query Classification: Automatically categorizes user queries (RNA design, general bioinformatics, off-topic)
- Tool Integration: Seamless integration with platform models and external tools
- Context Management: Maintains conversation context and user preferences
- Response Generation: Generates structured, actionable responses
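The classify-then-route pattern described above can be sketched without LangGraph: a classifier assigns one of the three categories, and a routing table dispatches the query to a handler. This is a deliberately simple toy under stated assumptions: the keyword sets, category labels, and handlers are hypothetical, and the platform's LangGraph assistant does not necessarily classify queries this way.

```python
import re

# Hypothetical keyword lists, for illustration only.
RNA_DESIGN_TERMS = {"aptamer", "design", "inverse folding", "sirna", "riboswitch", "secondary structure"}
BIOINFO_TERMS = {"fasta", "alignment", "blast", "genome", "protein", "sequence"}


def classify_query(query: str) -> str:
    """Return 'rna_design', 'bioinformatics' or 'off_topic' for a user query."""
    text = query.lower()

    def mentions(terms):
        # Whole-word matching so e.g. "design" does not fire inside "redesigned".
        return any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms)

    if mentions(RNA_DESIGN_TERMS):
        return "rna_design"
    if mentions(BIOINFO_TERMS):
        return "bioinformatics"
    return "off_topic"


# Routing table: one handler per category (placeholders for real graph nodes).
ROUTES = {
    "rna_design": lambda q: f"[design workflow] {q}",
    "bioinformatics": lambda q: f"[general workflow] {q}",
    "off_topic": lambda q: "I can only help with RNA and bioinformatics questions.",
}


def answer(query: str) -> str:
    return ROUTES[classify_query(query)](query)


print(answer("How do I design an aptamer for theophylline?"))
```

In a graph framework the same idea becomes a classifier node followed by conditional edges, one per category.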
The multimodal RAG system features:
- Document Processing: Supports PDF, image, and text documents
- OCR Capabilities: Extracts text from images and PDFs using Tesseract
- Vector Storage: Uses ChromaDB for efficient document retrieval
- Multimodal Embeddings: CLIP-based embeddings for image-text understanding
- Semantic Search: Advanced retrieval based on semantic similarity
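Semantic search in a vector store such as ChromaDB ranks stored embeddings by similarity to the query embedding. Stripped of the database, the core step is a nearest-neighbour search; the sketch below uses cosine similarity and a brute-force scan over toy 3-dimensional vectors (the document names and vectors are invented). As we understand it, Chroma makes the distance metric configurable per collection and uses an approximate index rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=2):
    """Return the ids of the k documents whose embeddings are most similar to the query."""
    scored = sorted(docs.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Toy embeddings; real CLIP vectors have hundreds of dimensions.
docs = {
    "riboswitch_review.pdf": [0.9, 0.1, 0.0],
    "gel_image.png": [0.1, 0.9, 0.1],
    "protocol.txt": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.2, 0.0], docs))  # ['riboswitch_review.pdf', 'protocol.txt']
```

With a shared text-image embedding space such as CLIP's, text chunks and images are ranked by the same function, which is what lets one query retrieve both.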
Each model has a dedicated wrapper responsible for:
- Environment Management: Handles virtual environment setup and activation
- Input Processing: Validates and preprocesses input data
- Model Execution: Runs model inference with proper error handling
- Output Parsing: Converts model outputs to standardized formats
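A minimal sketch of how such a wrapper might be structured, assuming each model is invoked as a subprocess from its own virtual environment. The class and method names (`ModelWrapper`, `validate`, `build_command`, `parse`, `predict`) and the `envs/` path are hypothetical, not the names used in `app/utils/wrappers/`; the toy `EchoWrapper` stands in for a real model so the example runs anywhere.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from pathlib import Path


class ModelWrapper(ABC):
    """Hypothetical base class covering the four wrapper duties listed above."""

    def __init__(self, env_dir: Path):
        self.env_dir = Path(env_dir)

    def python_executable(self) -> Path:
        # Environment management: the interpreter inside this model's own venv.
        subpath = Path("Scripts", "python.exe") if sys.platform == "win32" else Path("bin", "python")
        return self.env_dir / subpath

    @abstractmethod
    def validate(self, raw: str) -> str:
        """Input processing: reject or normalise user input."""

    @abstractmethod
    def build_command(self, clean: str) -> list[str]:
        """Command line that runs inference for one validated input."""

    @abstractmethod
    def parse(self, stdout: str) -> dict:
        """Output parsing: convert raw model output into a standard dict."""

    def predict(self, raw: str, timeout: float = 600) -> dict:
        clean = self.validate(raw)
        # Model execution in a separate process; failures surface as exceptions.
        result = subprocess.run(
            self.build_command(clean), capture_output=True, text=True, timeout=timeout
        )
        if result.returncode != 0:
            raise RuntimeError(f"inference failed: {result.stderr.strip()}")
        return self.parse(result.stdout)


class EchoWrapper(ModelWrapper):
    """Toy subclass: 'inference' just echoes the sequence via the current interpreter."""

    def validate(self, raw: str) -> str:
        seq = raw.strip().upper()
        if not seq or set(seq) - set("ACGU"):
            raise ValueError("expected a non-empty RNA sequence (A, C, G, U)")
        return seq

    def build_command(self, clean: str) -> list[str]:
        # A real wrapper would use str(self.python_executable()) and the model's script.
        return [sys.executable, "-c", f"print({clean!r})"]

    def parse(self, stdout: str) -> dict:
        return {"sequence": stdout.strip()}


print(EchoWrapper(Path("envs/echo")).predict("acgu\n"))  # {'sequence': 'ACGU'}
```

Running each model out of process keeps conflicting dependency stacks (different PyTorch or CUDA versions, for instance) isolated from the Flask application.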
RESTful API endpoints for:
- Model Predictions: Individual endpoints for each model
- File Processing: Upload and processing of various file formats
- Result Download: CT file generation and batch download
- AI Assistant: Chat interface and document processing
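A sketch of preparing a prediction request with only the standard library. The route path `/api/<model>/predict` and the `sequences` field are assumptions made for illustration; the actual routes and payload fields are defined in the files under `app/api/` and may differ. The request is built but not sent, so the example runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"


def build_prediction_request(model: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to a hypothetical /api/<model>/predict route."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/{model}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_prediction_request("bpfold", {"sequences": [">seq1\nGGGAAACCC"]})
print(req.get_method(), req.full_url)
# With the server running, urllib.request.urlopen(req) would send it.
```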
- Clone the repository
git clone https://github.com/your-username/RNA-Factory.git
cd RNA-Factory
- Install dependencies
pip install -e .
- Set up model environments
```bash
# Each model requires its own virtual environment
# The platform will automatically set up environments on first use
```
- Run the application
python run.py
- Access the platform: open your browser and navigate to http://localhost:5000
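Creating an environment on first use can be sketched with Python's standard `venv` module. The `ensure_env` name and the per-model directory layout are hypothetical, and the platform may rely on a different tool (conda, for example) to build its model environments.

```python
import os
import tempfile
import venv
from pathlib import Path


def ensure_env(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir on first use and return its scripts folder."""
    env_dir = Path(env_dir)
    if not (env_dir / "pyvenv.cfg").exists():  # pyvenv.cfg marks an existing venv
        # with_pip=True would also bootstrap pip for installing model requirements.
        venv.create(env_dir, with_pip=False)
    return env_dir / ("Scripts" if os.name == "nt" else "bin")


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_env(Path(tmp) / "bpfold")   # creates the environment
    second = ensure_env(Path(tmp) / "bpfold")  # reuses it
    print(first == second, first.is_dir())     # True True
```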
- Select a structure prediction model (BPFold, UFold, MXFold2, or RNAformer)
- Input RNA sequences via text or upload FASTA files
- Run analysis and view results
- Download results in various formats (CT, BPSEQ, dot-bracket)
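The structure predictors accept multi-sequence FASTA input. A minimal parser, assuming the usual convention of a `>` header line followed by one or more sequence lines; `parse_fasta` is a hypothetical name, and the platform's own input handling in `app/utils/input.py` is not reproduced here.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; sequences may span several lines."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and old-style comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


fasta = """>hairpin
GGGAAA
CCC
>tRNA_fragment
GCGGAUUUAGCUCAG
"""
for name, seq in parse_fasta(fasta):
    print(name, len(seq))
```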
- Select RNAmigos2 for RNA-ligand interaction prediction
- Upload mmCIF structure file
- Specify binding site residues
- Input SMILES strings of ligands
- Run analysis to get interaction scores
- Select Reformer for protein-RNA binding affinity prediction
- Input cDNA sequence (ATCGN characters only)
- Select RBP (RNA-binding protein) type from 150+ options
- Choose cell line (HepG2, K562, or MCF-7)
- Run analysis to get single-base resolution binding scores
- Select CoPRA for protein-RNA binding affinity prediction
- Input protein sequence (single-letter amino acid codes)
- Input RNA sequence (A, U, G, C only)
- Configure confidence threshold (Low/Medium/High/Very High)
- Run analysis to get a binding affinity prediction in kcal/mol with a confidence score
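The input rules for Reformer (cDNA, ATCGN only) and CoPRA (single-letter amino acids; RNA limited to A, U, G, C) amount to alphabet checks. A sketch, assuming whitespace and lowercase input should be tolerated. Restricting proteins to the 20 standard amino acids is an assumption, and `rna_to_cdna` is a hypothetical convenience helper, not a platform feature.

```python
CDNA_ALPHABET = set("ATCGN")                      # Reformer: cDNA
RNA_ALPHABET = set("AUGC")                        # CoPRA: RNA
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")    # CoPRA: 20 standard amino acids (assumption)


def validate(seq: str, alphabet: set[str], kind: str) -> str:
    """Strip whitespace, uppercase, and reject characters outside `alphabet`."""
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        raise ValueError(f"empty {kind} sequence")
    unexpected = sorted(set(cleaned) - alphabet)
    if unexpected:
        raise ValueError(f"invalid {kind} sequence: unexpected {', '.join(unexpected)}")
    return cleaned


def rna_to_cdna(rna: str) -> str:
    """Hypothetical helper: turn an RNA sequence into Reformer-style cDNA (U -> T)."""
    return validate(rna.upper().replace("U", "T"), CDNA_ALPHABET, "cDNA")


print(validate("mktay ivk", PROTEIN_ALPHABET, "protein"))  # MKTAYIVK
print(rna_to_cdna("gga ucc"))                              # GGATCC
```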
- Select Mol2Aptamer for aptamer generation
- Input small molecule SMILES string (via text or file upload)
- Configure generation parameters (number of sequences, max length, temperature, etc.)
- Run analysis to generate RNA aptamers
- View results with thermodynamic validation and download CSV files
- Select RNAFlow for protein-conditioned RNA design
- Input protein sequence (via text or file upload)
- Specify desired RNA length and number of samples
- Run analysis to generate RNA sequences and structures
- View results with confidence scores and download PDB structures
- Select RNA-FrameFlow for de novo 3D RNA backbone design
- Configure structure parameters (length, number of structures, temperature, random seed)
- Set advanced sampling parameters (timesteps, minimum time, exponential rate, self-conditioning)
- Run analysis to generate 3D RNA backbone structures
- View results with confidence scores and download PDB files with trajectory data
- Select RiboDiffusion for RNA inverse folding from 3D RNA structures
- Input PDB structure file (via text input or file upload)
- Configure generation parameters (number of sequences, sampling steps, conditional scale)
- Run analysis to generate RNA sequences predicted to fold into the target conformation
- View results with recovery rate scores and download CSV files with generated sequences
- Access the AI assistant from the main interface
- Ask questions about RNA design, structure analysis, or general bioinformatics
- Upload documents for multimodal analysis
- Get expert guidance and recommendations
We welcome contributions to RNA-Factory! Please feel free to submit issues, feature requests, or pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
Author: Huaizhi Wang
Email: realwiseking@outlook.com
We thank the developers of the integrated models and the open-source community for their valuable contributions to RNA research and machine learning.