Awesome-AI-Meets-Biology is a collection of state-of-the-art, novel, and exciting LLMs and agents in the biomedical and bioinformatics fields.

Webioinfo01/Awesome-AI-Meets-Biology

Awesome AI Meets Biology

Visit the official website at: https://webioinfo01.github.io/Awesome-AI-Meets-Biology/

The list covers AI for biology, biomedical, and bioinformatics research, organized primarily into "Foundation models" and "AI Agents".

AI Agents

Year Title Team Team Website Affiliation Domain Venue Paper/ Source Code/Product
2025.07 PromptBio: A Multi-Agent AI Platform for Bioinformatics Data Analysis PromptBio(Xiao Yang) Link Multi-agent system for bioinformatics bioRxiv Link Link
2025.07 GPTBioInsightor Single Cell annotation Link GitHub Stars
2025.06 NVIDIA Biomedical AI-Q Research Agent Developer Blueprint NVIDIA Link Drug Discovery (Target Identification) NVIDIA's blog Link Link GitHub Stars
2025.06 OriGene: A Self-Evolving Virtual Disease Biologist Automating Therapeutic Target Discovery General Biomedical Research (therapeutic targets) bioRxiv Link Link GitHub Stars
2025.06 Agent Laboratory: Using LLM Agents as Research Assistants General Scientific Research arXiv Link Link GitHub Stars
2025.06 scExtract: leveraging large language models for fully automated single-cell RNA-seq data annotation and prior-informed multi-dataset integration scRNA downstream tasks Genome Biology Link Link GitHub Stars
2025.06 CellVoyager: AI CompBio Agent Generates New Insights by Autonomously Analyzing Biological Data James Zou Link single cell RNA bioRxiv Link Link GitHub Stars
2025.05 Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution Mengdi Wang Link General Scientific Research (self-evolution and MCP creation) bioRxiv Link Link GitHub Stars
2025.05 BioOmni: A General-Purpose AI Agent for Automated Biomedical Research Marinka Zitnik Link General Biomedical Research (Multi-modal) bioRxiv Link Link GitHub Stars
2025.05 CellTypeAgent: Trustworthy cell type annotation with Large Language Models Yunjian Li LLM agent for cell type annotation in single-cell data ArXiv Link
2025.05 ChatMolData: A Multimodal Agent for Automatic Molecular Data Processing Xiaohui Yu Multimodal LLM-agent for automatic molecular data processing Advanced Intelligent Systems Link
2025.05 PlantGPT: An Arabidopsis-Based Intelligent Agent that Answers Questions about Plant Functional Genomics Qinlong Zhu PlantGPT: LLM agent for plant functional genomics question answering Advanced Science Link
2025.05 DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery Wenbin Hu LLM-based agent for parameterized reasoning in drug discovery ArXiv Link
2025.05 Automatic biomarker discovery and enrichment with BRAD I. Rajapakse LLM agent for automatic biomarker discovery and enrichment (BRAD) Bioinformatics Link
2025.04 The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search Sakana AI(David Ha) Link General Scientific Research ICLR 2025 Link Link GitHub Stars
2025.04 SpatialAgent: An autonomous AI agent for spatial biology Genentech(Aviv Regev) Link Spatial scRNA bioRxiv Link Link GitHub Stars
2025.04 Knowledge Graph and Large Language Model for Metabolomics Yuxing Lu Knowledge Graphs and LLMs in Metabolomics Link
2025.04 Spatial transcriptomics AI agent charts hPSC-pancreas maturation in vivo Jia Liu Harvard University AI Agent for Spatial Transcriptomics Analysis bioRxiv Link
2025.04 scAgent: Universal Single-Cell Annotation via a LLM Agent Yunjun Gao Universal Single-Cell Annotation with LLM Agent ArXiv Link
2025.04 Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data Single Cell annotation bioRxiv Link Link GitHub Stars
2025.04 SCassist: An AI Based Workflow Assistant for Single-Cell Analysis Rachel R Caspi National Institutes of Health Single Cell tasks(cluster, annotation) Bioinformatics Link Link GitHub Stars
2025.04 Fleming: An AI Agent for Antibiotic Discovery in Mycobacterium tuberculosis M. Farhat Harvard Medical School Tuberculosis antibiotic discovery AI agent bioRxiv Link
2025.04 OLAF: An Open Life Science Analysis Framework for Conversational Bioinformatics Powered by Large Language Models Diego Gonzalez Lopez Conversational bioinformatics pipeline ArXiv Link
2025.03 TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools Marinka Zitnik Link Personalized Treatment Recommendations arXiv Link Link GitHub Stars
2025.03 DrBioRight 2.0: an LLM-powered bioinformatics chatbot for large-scale cancer functional proteomics analysis proteomics Nature Communications Link Link
2025.03 DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration Drug Discovery arXiv Link Link
2025.03 PharmAgents: Building a Virtual Pharma with Large Language Model Agents Yanyan Lan Drug Discovery ArXiv Link
2025.03 AstroAgents: A Multi-Agent AI for Hypothesis Generation from Mass Spectrometry Data Amirali Aghazadeh Astrobiology & Mass Spectrometry AI ArXiv Link
2025.03 DORA AI Scientist: Multi-agent Virtual Research Team for Scientific Exploration Discovery and Automated Report Generation Alex Zhavoronkov Insilico Medicine Scientific Research Automation bioRxiv Link
2025.03 Design and Analysis of an Extreme-Scale, High-Performance, and Modular Agent-Based Simulation Platform Lukas Breitwieser Agent-Based Simulation ArXiv Link
2025.03 Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization Haishuai Wang Molecular Optimization ArXiv Link
2025.03 IAN: An Intelligent System for Omics Data Analysis and Discovery Rachel R. Caspi Laboratory of Immunology, National Eye Institute, NIH, Bethesda 20892, USA Omics Data Integration bioRxiv Link Link GitHub Stars
2025.02 LIDDIA: Language-based Intelligent Drug Discovery Agent Drug Discovery arXiv Link
2025.02 Towards an AI co-scientist General Scientific Research arXiv Link
2025.02 Knowledge Synthesis of Photosynthesis Research Using a Large Language Model Tae In Ahn Photosynthesis research assistant using LLM ArXiv Link
2025.02 Spike sorting AI agent Jia Liu John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, USA Spike sorting AI pipeline agent bioRxiv Link
2025.02 RAPID: Reliable and efficient Automatic generation of submission rePorting checklists with Large language moDels Lu Zhang Hong Kong Baptist University Automated medical reporting checklist generation bioRxiv Link
2025.01 BioMaster: Multi-agent System for Automated Bioinformatics Analysis Workflow Multi-omics Pipelines bioRxiv Link Link GitHub Stars
2025.01 InstructCell: A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following single cell RNA arXiv Link Link GitHub Stars
2025.01 BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems Microsoft Research(Venkat S. Malladi) Link Multi-omics Pipelines arXiv Link
2025.01 Extracting Knowledge from Scientific Texts on Patient-Derived Cancer Models Using Large Language Models: Algorithm Development and Validation G. Savova Boston Children's Hospital, Harvard Medical School LLM-based entity extraction for patient-derived cancer models bioRxiv Link
2024.12 CASSIA: a multi-agent large language model for reference free, interpretable, and automated cell annotation of single-cell RNA-sequencing data Single Cell annotation bioRxiv Link Link GitHub Stars
2024.12 ProtChat: An AI Multi-Agent for Automated Protein Analysis Leveraging GPT-4 and Protein Language Model Yunpeng Cai Automated protein analysis AI agent Journal of Chemical Information and Modeling Link
2024.12 SCREADER: Prompting Large Language Models to Interpret scRNA-seq Data Meng Xiao scRNA-seq interpretation via LLM prompting 2024 IEEE International Conference on Data Mining Workshops (ICDMW) Link
2024.12 SciAgents: Automating Scientific Discovery Through Bioinspired Multi‐Agent Intelligent Graph Reasoning Markus J. Buehler Multi-agent AI for automated materials discovery Advanced Materials Link
2024.12 Development and Application of an In Vitro Drug Screening Assay for Schistosoma mansoni Schistosomula Using YOLOv5 Antonio Muro AI-powered drug screening assay for schistosomula Biomedicines Link
2024.11 BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments Experimental Design (Genetic Perturbation) arXiv Link Link GitHub Stars
2024.11 The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation James Zou Link General Scientific Research bioRxiv Link Link GitHub Stars
2024.10 An AI Agent for Fully Automated Multi-Omic Analyses Multi-omics Pipelines Advanced Science Link Link GitHub Stars
2024.09 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Sakana AI(David Ha) Link General Scientific Research arXiv Link Link GitHub Stars
2024.07 CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis single cell RNA arXiv Link Link
2024.05 A Data-Intelligence-Intensive Bioinformatics Copilot System for Large-scale Omics Researches and Scientific Insights Multi-omics/single cell RNA bioRxiv Link Link GitHub Stars
2024.04 CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments Mengdi Wang, Le Cong Link Experimental Design (CRISPR Genetic Editing) bioRxiv Link Link GitHub Stars
2024.01 ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning Protein design and discovery arXiv Link Link GitHub Stars

Foundation models

Year Title Team Team Website Affiliation Domain Venue Paper/ Source Code/Product
2025.07 ODFormer: a Virtual Organoid for Predicting Personalized Therapeutic Responses in Pancreatic Cancer Organoid drug response prediction bioRxiv Link Link GitHub Stars
2025.07 Scalable emulation of protein equilibrium ensembles with generative deep learning Microsoft Research(Frank Noé) Link Protein Science Link Link GitHub Stars
2025.07 Spatia: Multimodal Model for Prediction and Generation of Spatial Cell Phenotypes Marinka Zitnik Link spatial single cell RNA ArXiv Link
2025.07 RiNALMo: general-purpose RNA language models can generalize well on structure prediction tasks RNA Nature Communications Link Link GitHub Stars
2025.06 World Models as Simulators of Patient Biology: Oncology Counterfactual Therapeutics Oracle (OCTO) Noetik Link spatial single cell RNA perturbation Noetik's report Link
2025.06 Generalized biological foundation model with unified nucleic acid and protein language Alibaba(Zhaorong Li) Link DNA/RNA/Protein Nature Machine Intelligence Link Link GitHub Stars
2025.06 Predicting cellular responses to perturbation across diverse contexts with State Arc Institute(Yusuf H. Roohani) Link genetic, signaling, and chemical perturbation scRNA bioRxiv Link Link GitHub Stars
2025.06 AlphaGenome: Advancing regulatory variant effect prediction with a unified DNA sequence model Google DeepMind(Pushmeet Kohli) Link DNA/RNA bioRxiv Link Link GitHub Stars
2025.06 SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model DNA ICML 2025 Link Link GitHub Stars
2025.06 UniCure: A Foundation Model for Predicting Personalized Cancer Therapy Response drug perturbation scRNA bioRxiv Link Link GitHub Stars
2025.06 A multimodal conversational agent for DNA, RNA and protein tasks InstaDeep(Thomas Pierrot) Link Diverse omics including DNA/RNA/Protein Nature Machine Intelligence Link Link
2025.05 A visual–omics foundation model to bridge histopathology with spatial transcriptomics Guangyu Wang Link histopathology and spatial single cell RNA Nature Methods Link Link GitHub Stars
2025.05 GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance Mengdi Wang Link Biosafety for DNA arXiv Link Link GitHub Stars
2025.05 BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model Bo Wang Link DNA arXiv Link Link GitHub Stars
2025.05 CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells single cell RNA Nature Communications Link Link GitHub Stars
2025.05 sciLaMA: A Single-Cell Representation Learning Framework to Leverage Prior Knowledge from Large Language Models G. Quon University of California, Davis Single-cell representation learning leveraging LLM prior knowledge bioRxiv Link
2025.05 LlamaAffinity: A Predictive Antibody–Antigen Binding Model Integrating Antibody Sequences with Llama3 Backbone Architecture J. Chen University of Alabama at Birmingham Antibody–antigen binding affinity prediction using LLM-based models bioRxiv Link
2025.05 GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype Stan Z. Li Graph representation learning for genetic perturbation integrating LLM and DNA features ArXiv Link
2025.04 Scaling Large Language Models for Next-Generation Single-Cell Analysis David van Dijk Link single cell RNA bioRxiv Link Link GitHub Stars
2025.04 CellFlow enables generative single-cell phenotype modeling with flow matching Fabian J. Theis Link Perturbation single cell RNA bioRxiv Link Link GitHub Stars
2025.04 Abstract 5084: Evaluation of single-cell foundation models for cancer outcome predictions Eliezer M Van Allen Single-cell Foundation Models for Cancer Outcome Prediction Cancer Research Link
2025.04 Abstract 6316: Predictive performance comparison of foundational and CNN models for single-cell immune profiling Mai Chan Lau Foundational vs. CNN Models for Immune Profiling in Histology Cancer Research Link
2025.04 Abstract 5059: Self-supervised representation learning of somatic mutational data Etai Jacob Self-supervised Representation Learning for Somatic Mutation Data Cancer Research Link
2025.04 Generalized Biological Foundation Model with Unified Nucleic Acid and Protein Language Zhaorong Li Alibaba Cloud Intelligence, Alibaba Group, Hangzhou, China Unified DNA/RNA/Protein foundation model bioRxiv Link
2025.04 Towards multimodal foundation models in molecular cell biology. Bo Wang Multimodal molecular biology foundation models Nature Link
2025.04 Abstract 3762: DEL-AI: Proteome-wide in silico screening of multi-billion compound libraries using machine learning foundation models Paul Novick Proteome-wide in silico DEL screening Cancer Research Link
2025.03 Multimodal AI predicts clinical outcomes of drug combinations from preclinical data Marinka Zitnik Clinical Pharmacology Modeling ArXiv Link
2025.02 Genome modeling and design across all domains of life with Evo 2 Arc Institute(Brian L. Hie) Link DNA bioRxiv Link Link GitHub Stars
2025.02 GENERator: A Long-Context Generative Genomic Foundation Model DNA arXiv Link Link GitHub Stars
2025.02 scGPT-spatial: Continual Pretraining of Single-Cell Foundation Model for Spatial Transcriptomics Bo Wang Link spatial single cell RNA bioRxiv Link Link GitHub Stars
2025.02 Large Cognition Model: Towards Pretrained EEG Foundation Model Aidan Hung-Wen Tsai EEG foundation model ArXiv Link
2025.02 Omni-DNA: A Unified Genomic Foundation Model for Cross-Modal and Multi-Task Learning Caihua Shan Cross-modal genomic foundation model ArXiv Link
2025.02 GENERator: A Long-Context Generative Genomic Foundation Model Zheng Wang Generative genomic foundation model ArXiv Link
2025.02 Top-DTI: Integrating Topological Deep Learning and Large Language Models for Drug Target Interaction Prediction S. Bozdag University of North Texas Drug–target interaction prediction framework bioRxiv Link
2025.02 AI-enabled alkaline-resistant evolution of protein to apply in mass production Liang Hong LLM-driven protein evolution for alkaline resistance eLife Link
2025.01 A foundation model of transcription across human cell types transcriptional regulation(ATAC-seq) Nature Link Link GitHub Stars
2025.01 GENA-LM: a new DNA language model for long sequences DNA Nucleic Acids Research Link Link GitHub Stars
2025.01 Function-Guided Conditional Generation Using Protein Language Models with Adapters Salesforce Research(Ali Madani) Link Protein arXiv Link Link GitHub Stars
2025.01 Simulating 500 million years of evolution with a language model EvolutionaryScale(Alexander Rives) Link Protein Science Link Link GitHub Stars
2025.01 Predicting cell morphological responses to perturbations using generative modeling Fabian J. Theis, Mohammad Lotfollahi Link phenotype morphological responses to perturbation Nature Communications Link Link GitHub Stars
2025.01 Improving functional protein generation via foundation model-derived latent space likelihood optimization César de la Fuente-Nunez University of Pennsylvania Generative protein design via PLM latent space optimization bioRxiv Link
2025.01 Unveiling the Evolution of Antimicrobial Peptides in Gut Microbes via Foundation Model-Powered Framework Jinfang Zheng Zhejiang Lab Antimicrobial peptide discovery from gut microbes bioRxiv Link
2025.01 Knowledge Hierarchy Guided Biological-Medical Dataset Distillation for Domain LLM Training Meng Xiao Biomedical dataset distillation for LLM training ArXiv Link
2024.12 L2G: Repurposing Language Models for Genomics Tasks Ameet Talwalkar Carnegie Mellon University Repurposing LLMs for genomics tasks bioRxiv Link
2024.12 Fine-Tuned Deep Transfer Learning Models for Large Screenings of Safer Drugs Targeting Class A GPCRs M. Filizola Icahn School of Medicine at Mount Sinai Deep transfer learning for GPCR drug screening bioRxiv Link
2024.12 Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs) G. Pollastri Protein secondary structure prediction with PLMs International Journal of Molecular Sciences Link
2024.12 ProtDAT: A Unified Framework for Protein Sequence Design from Any Protein Text Description Hong-Bin Shen Multi-modal protein sequence design from text ArXiv Link
2024.11 Nucleotide Transformer: building and evaluating robust foundation models for human genomics DNA Nature Methods Link Link GitHub Stars
2024.10 Orthrus: Towards Evolutionary and Functional RNA Foundation Models RNA bioRxiv Link Link GitHub Stars
2024.09 stFormer: a foundation model for spatial transcriptomics DNA bioRxiv Link Link GitHub Stars
2024.08 BioRAG: A RAG-LLM Framework for Biological Question Reasoning Life science question answering (RAG) arXiv Link
2024.07 DNA language model GROVER learns sequence context in the human genome DNA Nature Machine Intelligence Link Link
2024.06 scFoundation: Large-scale foundation model on single-cell transcriptomics Biomap research(Le Song, Xuegong Zhang) Link single cell RNA Nature Communications Link Link GitHub Stars
2024.06 Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models Biomedical question answering (RAG) ISMB 24 Link Link GitHub Stars
2024.06 Multi-modal Transfer Learning between Biological Foundation Models InstaDeep(Guillaume Richard, Thomas Pierrot) Link Diverse omics including DNA/RNA/Protein arXiv Link Link
2024.05 Accurate structure prediction of biomolecular interactions with AlphaFold 3 Google DeepMind(John M. Jumper) Link Protein Nature Link Link GitHub Stars
2024.04 Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology Recursion(Berton Earnshaw) Link microscopy data CVPR 2024 Highlight Link Link
2024.03 Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling DNA arXiv Link Link GitHub Stars
2024.03 Nicheformer: a foundation model for single-cell and spatial omics Fabian J. Theis Link single-cell and spatial omics bioRxiv Link Link GitHub Stars
2024.02 scGPT: toward building a foundation model for single-cell multi-omics using generative AI Bo Wang Link single cell RNA Nature Methods Link Link GitHub Stars
2024.01 OmniNA: A foundation model for nucleotide sequences DNA/RNA bioRxiv Link
2024.01 xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein Biomap research(Le Song) Link Protein arXiv Link Link GitHub Stars
2024.01 Geneverse: A Collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research DNA/Protein EMNLP 2024 Link Link GitHub Stars
2023.11 HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution DNA NeurIPS 2023 Link Link GitHub Stars
2023.10 ProGen2: Exploring the boundaries of protein language models Salesforce Research(Ali Madani) Link Protein Cell Systems Link Link GitHub Stars
2023.06 DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome DNA ICLR 2024 Link Link GitHub Stars
2023.05 Transfer learning enables predictions in network biology General scRNA tasks, discovery of key network regulators and candidate therapeutic targets Nature Link Link
2021.07 DNABERT: pre-trained Bidirectional Encoder Representations from Transformers for DNA-language DNA Bioinformatics Link Link GitHub Stars

AI Tools

Year Title Team Team Website Affiliation Domain Venue Paper/ Source Code/Product
2025.07 GPTBioInsightor Single Cell annotation Link GitHub Stars
2025.06 scExtract: leveraging large language models for fully automated single-cell RNA-seq data annotation and prior-informed multi-dataset integration scRNA downstream tasks Genome Biology Link Link GitHub Stars
2025.04 Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data Single Cell annotation bioRxiv Link Link GitHub Stars
2025.04 SCassist: An AI Based Workflow Assistant for Single-Cell Analysis Single Cell tasks(cluster, annotation) bioRxiv Link Link GitHub Stars
2024.12 CASSIA: a multi-agent large language model for reference free, interpretable, and automated cell annotation of single-cell RNA-sequencing data Single Cell annotation bioRxiv Link Link GitHub Stars

Databases/Simulation

Year Title Team Team Website Affiliation Domain Venue Paper/ Source Code/Product
2025.08 Drug Response Omics association MAp (DROMA, 卓玛) Biomarker discovery, drug prediction (18 projects with 2,600+ samples and 56,000+ drugs, with full ecosystem) Link GitHub Stars
2025.06 X-Atlas/Orion: Genome-wide Perturb-seq Datasets via a Scalable Fix-Cryopreserve Platform for Training Dose-Dependent Biological Foundation Models Genome-wide Perturb-seq Datasets bioRxiv Link Link
2025.05 scCompass: An Integrated Multi‐Species scRNA‐seq Database for AI‐Ready Yuanchun Zhou Multi-species scRNA-seq database for AI-ready applications Advanced Science Link
2025.05 AlphaLasso—a web server to identify loop and lasso motifs in 3D structure of biopolymers Joanna I. Sulkowska Web server for lasso motifs in biopolymer 3D structures (AlphaLasso) Nucleic Acids Research Link
2025.04 scMultiSim: simulation of single-cell multi-omics and spatial data guided by gene regulatory networks and cell–cell interactions Simulation of single-cell multi-omics and spatial data Nature Methods Link Link GitHub Stars
2025.04 uHAF: a unified hierarchical annotation framework for cell type standardization and harmonization Xuegong Zhang Hierarchical Framework for Cell Type Annotation and Harmonization Bioinformatics Link
2025.04 OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling Fuhai Li Text-omic signaling graph dataset ArXiv Link
2025.04 Abstract 1087: The evolving landscape of cancer transcriptomics data Akpéli V. Nordor Cancer transcriptomics data mapping Cancer Research Link
2025.03 RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy Recursion(Imran S. Haque) Link microscopy perturbation data ICLR 2025 Link Link
2025.02 scBaseCount: an AI agent-curated, uniformly processed, and continually expanding single cell data repository Arc Institute(Yusuf H. Roohani) Link preprocess scRNA data bioRxiv Link Link GitHub Stars
2025.02 Tahoe-100M: A Giga-Scale Single-Cell Perturbation Atlas for Context-Dependent Gene Function and Cellular Modeling tahoebio (hosted in Arc Institute's repository) Link Drug perturbation scRNA bioRxiv Link Link GitHub Stars
2025.02 Literature-scaled immunological gene set annotation using AI-powered immune cell knowledge graph (ICKG) Ken Chen MD Anderson Cancer Center Immune cell knowledge graph for gene set annotation bioRxiv Link
2024.12 M3-20M: A Large-Scale Multi-Modal Molecule Dataset for AI-driven Drug Design and Discovery Shuigeng Zhou Multi-modal molecule dataset for drug design Journal of Bioinformatics and Computational Biology Link
2024.12 BioMedGraphica: An All-in-One Platform for Biomedical Prior Knowledge and Omic Signaling Graph Generation Fuhai Li Washington University in St. Louis Biomedical knowledge graph platform bioRxiv Link
2024.12 Basic Science and Pathogenesis. Li-San Wang AI-enhanced search for Alzheimer's genomic database Alzheimer's & Dementia: The Journal of the Alzheimer's Association Link
2024.06 Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics Marinka Zitnik Link Drug Discovery (Therapeutic science, 66 datasets) bioRxiv Link Link GitHub Stars
2024.04 Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations (Cell Painting Gallery) Broad Institute(Anne E. Carpenter) Link phenotype morphological responses to perturbation Nature Methods Link Link GitHub Stars
2024.01 scPerturb: harmonized single-cell perturbation data scRNA perturbation(44 public datasets, drugs and genes) Nature Methods Link Link GitHub Stars

Benchmarks

Year Title Team Team Website Affiliation Domain Venue Paper/ Source Code/Product
2025.06 Fundamental Limitations of Foundation Models in Single-Cell Transcriptomics evaluate cell-type classification bioRxiv Link
2025.06 Cell-Eval(Predicting cellular responses to perturbation across diverse contexts with State) Arc Institute(Yusuf H. Roohani) Link Perturbation scRNA bioRxiv Link Link GitHub Stars
2025.05 The influence of prompt engineering on large language models for protein–protein interaction identification in biomedical literature Yi-Hsuan Lin Prompt engineering of LLMs for protein-protein interaction extraction Scientific Reports Link
2025.05 CellVerse: Do Large Language Models Really Understand Cell Biology? P. Heng LLMs for language-driven single-cell multi-omics analysis (CellVerse) ArXiv Link
2025.05 Large Language Models Can Extract Metadata for Annotation of Human Neuroimaging Publications Jessica A Turner Ohio State University Wexner Medical Center, Columbus, OH, USA LLM extraction and annotation of neuroimaging metadata bioRxiv Link
2025.05 scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction Qianqian Song Benchmarking foundation models for single-cell drug response prediction ArXiv Link
2025.04 Zero-shot evaluation reveals limitations of single-cell foundation models Microsoft Research(Alex X. Lu) Link cell type clustering and batch integration for scGPT and Geneformer Genome Biology Link Link GitHub Stars
2025.04 OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling Fuhai Li Cell Text-Omic Signaling Graph Benchmark ArXiv Link
2025.04 Abstract 6316: Predictive performance comparison of foundational and CNN models for single-cell immune profiling Mai Chan Lau Single-cell immune profiling benchmark Cancer Research Link
2025.03 Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment Zhenguang G. Cai Cognitive Neuroscience & Audio-Language Modeling ArXiv Link
2025.02 BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning Zhi-Hong Deng Biological pathway reasoning benchmark ArXiv Link
2025.02 The Skin Game: Revolutionizing Standards for AI Dermatology Model Comparison Dariusz Jemielniak Dermatology image classification evaluation framework ArXiv Link
2025.02 Consequences of training data composition for deep learning models in single-cell biology Lorin Crawford Harvard Medical School Training data composition effects in single-cell models bioRxiv Link
2025.01 CARDBiomedBench: A Benchmark for Evaluating Large Language Model Performance in Biomedical Research F. Faghri Center for Alzheimer's and Related Dementias, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892, USA; DataTecnica, Washington, LLM evaluation in neurodegenerative disease research bioRxiv Link
2025.01 Large Language Models Think Too Fast To Explore Effectively Robert C. Wilson Exploration capabilities of LLMs in open-ended tasks ArXiv Link
2025.01 Sequence Modeling Is Not Evolutionary Reasoning M. Zitnik Harvard Medical School Evolutionary reasoning benchmark for protein LLMs bioRxiv Link
2024.12 PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis Perturbation scRNA NeurIPS 2024 Link Link GitHub Stars
2024.12 Does your model understand genes? A benchmark of gene properties for biological and text models Y. Shimoni Benchmark of gene property prediction models ArXiv Link

Reviews

Year Title Team Team Website Affiliation Domain Venue Paper/ Source Code/Product
2025.07 Human interpretable grammar encodes multicellular systems biology models to democratize virtual cell laboratories Paul Macklin Indiana University Virtual cell Cell Link Link GitHub Stars
2025.07 The generative era of medical AI Pranav Rajpurkar Harvard Medical School Medical AI Cell Link
2025.05 Empowering Biomedical Research with Foundation Models in Computational Microscopy: A Systematic Review Rong Luo Foundation models in computational microscopy Advanced Intelligent Systems Link
2025.04 Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions MCP for agents arXiv Link
2025.04 Towards multimodal foundation models in molecular cell biology. Bo Wang Multimodal Foundation Models in Molecular Cell Biology Nature Link
2025.04 A Survey of the Model Context Protocol (MCP): Standardizing Context to Enhance Large Language Models (LLMs) MCP for agents Preprints Link
2025.04 Decoding cancer prognosis with deep learning: the ASD-cancer framework for tumor microenvironment analysis J. Haran Deep Learning for Tumor Microenvironment Prognosis mSystems Link
2025.03 Large language model for knowledge synthesis and AI-enhanced biomanufacturing. Yinjie J. Tang Biomanufacturing Trends in Biotechnology Link
2025.03 Biological Sequence with Language Model Prompting: A Survey Yu Li Sequence Analysis ArXiv Link
2025.03 AI-Empowered Genome Decoding: Applications of Large Language Models in Genomics Yu Zhou Genomics Frontiers of Digital Education Link
2025.03 Large Language Models in Bioinformatics: A Survey Yu Li Bioinformatics ArXiv Link
2025.03 A Conceptual Framework for Human-AI Collaborative Genome Annotation Radoslaw Suchecki Genome Annotation ArXiv Link
2025.02 Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective Yu Takagi LLM learning dynamics from a neuroscience perspective ArXiv Link
2025.02 Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents Mariya Toneva Episodic memory framework for long-term LLM agents ArXiv Link
2025.02 Microbial Ecology to Ocean Carbon Cycling: From Genomes to Numerical Models E. Zakem Integration of microbial ecology and numerical models for ocean carbon cycling Annual Review of Earth and Planetary Sciences Link
2025.01 Survey and Improvement Strategies for Gene Prioritization with Large Language Models Xia Hu LLM-based gene prioritization ArXiv Link
2025.01 Computational Protein Science in the Era of Large Language Models (LLMs) Qing Li Computational protein science with LLMs ArXiv Link
2025.01 Foundation models in bioinformatics Jianxin Wang Bioinformatics foundation models overview National Science Review Link
2025.01 AI Methods for Antimicrobial Peptides: Progress and Challenges César de la Fuente-Nunez AI methods for antimicrobial peptide design Microbial Biotechnology Link
2025.01 Artificial intelligence driven innovations in biochemistry: A review of emerging research frontiers M. A. Lateef Junaid AI innovations in biochemistry Biomolecules and Biomedicine Link
2025.01 Artificial Intelligence Tools Addressing Challenges of Cancer Progression Due to Antimicrobial Resistance in Pathogenic Biofilm Systems Abhijit G. Banerjee AI tools for antimicrobial resistance in cancer biofilms Artificial Intelligence Evolution Link
2025.01 Learning the language of life with AI. E. Topol Multiomic foundation models for biomolecule prediction Science Link
2024.12 Large language models facilitating modern molecular biology and novel drug development Fei Liu Review of LLMs in molecular biology and drug development Frontiers in Pharmacology Link
2024.12 From multi-omics to predictive biomarker: AI in tumor microenvironment Yingli Sun Review of AI in tumor microenvironment multi-omics Frontiers in Immunology Link
2024.12 Advancements and Applications of Protein Structure Prediction Algorithms Ye Chen Review of protein structure prediction methods Theoretical and Natural Science Link
2024.10 Empowering biomedical discovery with AI agents Biomedical Agents Cell Link
2024.08 Transformers in single-cell omics: a review and new perspectives Single cell foundation models Nature Methods Link
2024.08 Language models for biological research: a primer biological foundation models Nature Methods Link
2024.07 Foundation models for bioinformatics Bioinformatics foundation models Quantitative Biology Link

Other Awesome Projects

Title Project
Awesome-LLMs-meet-genomes Link GitHub Stars
Awesome-Virtual-Cell Link GitHub Stars
Awesome-Phenotypic-Drug-Discovery Link GitHub Stars
Awesome-LLM-Scientific-Discovery Link GitHub Stars
Awesome bioagent papers Link GitHub Stars
Awesome LLM Agents for Scientific Discovery Link GitHub Stars
Awesome Papers on Agents for Science Link GitHub Stars
awesome-single-cell Link GitHub Stars
Awesome-Bioinformatics Link GitHub Stars
awesome Link GitHub Stars

Others

Ref
