Awesome AI Meets Biology

Visit the official website at: https://webioinfo01.github.io/Awesome-AI-Meets-Biology/

AI for biology, biomedical, and bioinformatics research, organized mainly into "Foundation models" and "AI Agents".

📋 Table of Contents

🌟 AI Agents
🎯 Foundation models
🛠️ AI Tools
💾 Databases/Simulation
📊 Benchmarks
📚 Reviews
🔗 Other Awesome Projects

AI Agents

Year	Title	Team	Team Website	Affiliation	Domain	Venue	Paper/ Source	Code/Product
2025.07	PromptBio: A Multi-Agent AI Platform for Bioinformatics Data Analysis	PromptBio(Xiao Yang)	Link		Multi-agent system for bioinformatics	bioRxiv	Link	Link
2025.07	GPTBioInsightor				Single Cell annotation			Link
2025.06	NVIDIA Biomedical AI-Q Research Agent Developer Blueprint	NVIDIA	Link		Drug Discovery (Target Identification)	NVIDIA blog	Link	Link
2025.06	OriGene: A Self-Evolving Virtual Disease Biologist Automating Therapeutic Target Discovery				General Biomedical Research (therapeutic targets)	bioRxiv	Link	Link
2025.06	Agent Laboratory: Using LLM Agents as Research Assistants				General Scientific Research	arXiv	Link	Link
2025.06	scExtract: leveraging large language models for fully automated single-cell RNA-seq data annotation and prior-informed multi-dataset integration				scRNA downstream tasks	Genome Biology	Link	Link
2025.06	CellVoyager: AI CompBio Agent Generates New Insights by Autonomously Analyzing Biological Data	James Zou	Link		single cell RNA	bioRxiv	Link	Link
2025.05	Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution	Mengdi Wang	Link		General Scientific Research (self-evolution and MCP creation)	arXiv	Link	Link
2025.05	BioOmni: A General-Purpose AI Agent for Automated Biomedical Research	Marinka Zitnik	Link		General Biomedical Research (Multi-modal)	bioRxiv	Link	Link
2025.05	CellTypeAgent: Trustworthy cell type annotation with Large Language Models	Yunjian Li			LLM agent for cell type annotation in single-cell data	ArXiv	Link
2025.05	ChatMolData: A Multimodal Agent for Automatic Molecular Data Processing	Xiaohui Yu			Multimodal LLM-agent for automatic molecular data processing	Advanced Intelligent Systems	Link
2025.05	PlantGPT: An Arabidopsis-Based Intelligent Agent that Answers Questions about Plant Functional Genomics	Qinlong Zhu			LLM agent for plant functional genomics question answering	Advanced Science	Link
2025.05	DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery	Wenbin Hu			LLM-based agent for parameterized reasoning in drug discovery	ArXiv	Link
2025.05	Automatic biomarker discovery and enrichment with BRAD	I. Rajapakse			LLM agent for automatic biomarker discovery and enrichment (BRAD)	Bioinformatics	Link
2025.04	The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search	Sakana AI(David Ha)	Link		General Scientific Research	ICLR 2025	Link	Link
2025.04	SpatialAgent: An autonomous AI agent for spatial biology	Genentech(Aviv Regev)	Link		Spatial scRNA	bioRxiv	Link	Link
2025.04	Knowledge Graph and Large Language Model for Metabolomics	Yuxing Lu			Knowledge Graphs and LLMs in Metabolomics		Link
2025.04	Spatial transcriptomics AI agent charts hPSC-pancreas maturation in vivo	Jia Liu		Harvard University	AI Agent for Spatial Transcriptomics Analysis	bioRxiv	Link
2025.04	scAgent: Universal Single-Cell Annotation via a LLM Agent	Yunjun Gao			Universal Single-Cell Annotation with LLM Agent	ArXiv	Link
2025.04	Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data				Single Cell annotation	bioRxiv	Link	Link
2025.04	SCassist: An AI Based Workflow Assistant for Single-Cell Analysis	Rachel R Caspi		National Institutes of Health	Single Cell tasks(cluster, annotation)	Bioinformatics	Link	Link
2025.04	Fleming: An AI Agent for Antibiotic Discovery in Mycobacterium tuberculosis	M. Farhat		Harvard Medical School	Tuberculosis antibiotic discovery AI agent	bioRxiv	Link
2025.04	OLAF: An Open Life Science Analysis Framework for Conversational Bioinformatics Powered by Large Language Models	Diego Gonzalez Lopez			Conversational bioinformatics pipeline	ArXiv	Link
2025.03	TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools	Marinka Zitnik	Link		Personalized Treatment Recommendations	arXiv	Link	Link
2025.03	DrBioRight 2.0: an LLM-powered bioinformatics chatbot for large-scale cancer functional proteomics analysis				proteomics	Nature Communications	Link	Link
2025.03	DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration				Drug Discovery	arXiv	Link	Link
2025.03	PharmAgents: Building a Virtual Pharma with Large Language Model Agents	Yanyan Lan			Drug Discovery	ArXiv	Link
2025.03	AstroAgents: A Multi-Agent AI for Hypothesis Generation from Mass Spectrometry Data	Amirali Aghazadeh			Astrobiology & Mass Spectrometry AI	ArXiv	Link
2025.03	DORA AI Scientist: Multi-agent Virtual Research Team for Scientific Exploration Discovery and Automated Report Generation	Alex Zhavoronkov		Insilico Medicine	Scientific Research Automation	bioRxiv	Link
2025.03	Design and Analysis of an Extreme-Scale, High-Performance, and Modular Agent-Based Simulation Platform	Lukas Breitwieser			Agent-Based Simulation	ArXiv	Link
2025.03	Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization	Haishuai Wang			Molecular Optimization	ArXiv	Link
2025.03	IAN: An Intelligent System for Omics Data Analysis and Discovery	Rachel R. Caspi		Laboratory of Immunology, National Eye Institute, NIH, Bethesda 20892, USA	Omics Data Integration	bioRxiv	Link	Link
2025.02	LIDDIA: Language-based Intelligent Drug Discovery Agent				Drug Discovery	arXiv	Link
2025.02	Towards an AI co-scientist				General Scientific Research	arXiv	Link
2025.02	Knowledge Synthesis of Photosynthesis Research Using a Large Language Model	Tae In Ahn			Photosynthesis research assistant using LLM	ArXiv	Link
2025.02	Spike sorting AI agent	Jia Liu		John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, USA	Spike sorting AI pipeline agent	bioRxiv	Link
2025.02	RAPID: Reliable and efficient Automatic generation of submission rePorting checklists with Large language moDels	Lu Zhang		Hong Kong Baptist University	Automated medical reporting checklist generation	bioRxiv	Link
2025.01	BioMaster: Multi-agent System for Automated Bioinformatics Analysis Workflow				Multi-omics Pipelines	bioRxiv	Link	Link
2025.01	InstructCell: A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following				single cell RNA	arXiv	Link	Link
2025.01	BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems	Microsoft Research(Venkat S. Malladi)	Link		Multi-omics Pipelines	arXiv	Link
2025.01	Extracting Knowledge from Scientific Texts on Patient-Derived Cancer Models Using Large Language Models: Algorithm Development and Validation	G. Savova		Boston Children's Hospital, Harvard Medical School	LLM-based entity extraction for patient-derived cancer models	bioRxiv	Link
2024.12	CASSIA: a multi-agent large language model for reference free, interpretable, and automated cell annotation of single-cell RNA-sequencing data				Single Cell annotation	bioRxiv	Link	Link
2024.12	ProtChat: An AI Multi-Agent for Automated Protein Analysis Leveraging GPT-4 and Protein Language Model	Yunpeng Cai			Automated protein analysis AI agent	Journal of Chemical Information and Modeling	Link
2024.12	SCREADER: Prompting Large Language Models to Interpret scRNA-seq Data	Meng Xiao			scRNA-seq interpretation via LLM prompting	2024 IEEE International Conference on Data Mining Workshops (ICDMW)	Link
2024.12	SciAgents: Automating Scientific Discovery Through Bioinspired Multi‐Agent Intelligent Graph Reasoning	Markus J. Buehler			Multi-agent AI for automated materials discovery	Advanced Materials	Link
2024.12	Development and Application of an In Vitro Drug Screening Assay for Schistosoma mansoni Schistosomula Using YOLOv5	Antonio Muro			AI-powered drug screening assay for schistosomula	Biomedicines	Link
2024.11	BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments				Experimental Design (Genetic Perturbation)	arXiv	Link	Link
2024.11	The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation	James Zou	Link		General Scientific Research	bioRxiv	Link	Link
2024.10	An AI Agent for Fully Automated Multi-Omic Analyses				Multi-omics Pipelines	Advanced Science	Link	Link
2024.09	The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery	Sakana AI(David Ha)	Link		General Scientific Research	arXiv	Link	Link
2024.07	CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis				single cell RNA	arXiv	Link	Link
2024.05	A Data-Intelligence-Intensive Bioinformatics Copilot System for Large-scale Omics Researches and Scientific Insights				Multi-omics/single cell RNA	bioRxiv	Link	Link
2024.04	CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments	Mengdi Wang, Le Cong	Link		Experimental Design (CRISPR Genetic Editing)	bioRxiv	Link	Link
2024.01	ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning				Protein design and discovery	arXiv	Link	Link

Foundation models

Year	Title	Team	Team Website	Affiliation	Domain	Venue	Paper/ Source	Code/Product
2025.07	ODFormer: a Virtual Organoid for Predicting Personalized Therapeutic Responses in Pancreatic Cancer				Organoid drug response prediction	bioRxiv	Link	Link
2025.07	Scalable emulation of protein equilibrium ensembles with generative deep learning	Microsoft Research(Frank Noé)	Link		Protein	Science	Link	Link
2025.07	Spatia: Multimodal Model for Prediction and Generation of Spatial Cell Phenotypes	Marinka Zitnik	Link		spatial single cell RNA	ArXiv	Link
2025.07	RiNALMo: general-purpose RNA language models can generalize well on structure prediction tasks				RNA	Nature Communications	Link	Link
2025.06	World Models as Simulators of Patient Biology: Oncology Counterfactual Therapeutics Oracle (OCTO)	Noetik	Link		spatial single cell RNA perturbation	Noetik report	Link
2025.06	Generalized biological foundation model with unified nucleic acid and protein language	Alibaba(Zhaorong Li)	Link		DNA/RNA/Protein	Nature Machine Intelligence	Link	Link
2025.06	Predicting cellular responses to perturbation across diverse contexts with State	Arc Institute(Yusuf H. Roohani)	Link		genetic, signaling, and chemical perturbation scRNA	bioRxiv	Link	Link
2025.06	AlphaGenome: Advancing regulatory variant effect prediction with a unified DNA sequence model	Google DeepMind(Pushmeet Kohli)	Link		DNA/RNA	bioRxiv	Link	Link
2025.06	SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model				DNA	ICML 2025	Link	Link
2025.06	UniCure: A Foundation Model for Predicting Personalized Cancer Therapy Response				drug perturbation scRNA	bioRxiv	Link	Link
2025.06	A multimodal conversational agent for DNA, RNA and protein tasks	InstaDeep(Thomas Pierrot)	Link		Diverse omics including DNA/RNA/Protein	Nature Machine Intelligence	Link	Link
2025.05	A visual–omics foundation model to bridge histopathology with spatial transcriptomics	Guangyu Wang	Link		histopathology and spatial single cell RNA	Nature Methods	Link	Link
2025.05	GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance	Mengdi Wang	Link		Biosafety for DNA	arXiv	Link	Link
2025.05	BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model	Bo Wang	Link		DNA	arXiv	Link	Link
2025.05	CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells				single cell RNA	Nature Communications	Link	Link
2025.05	sciLaMA: A Single-Cell Representation Learning Framework to Leverage Prior Knowledge from Large Language Models	G. Quon		University of California, Davis	Single-cell representation learning leveraging LLM prior knowledge	bioRxiv	Link
2025.05	LlamaAffinity: A Predictive Antibody–Antigen Binding Model Integrating Antibody Sequences with Llama3 Backbone Architecture	J. Chen		University of Alabama at Birmingham	Antibody–antigen binding affinity prediction using LLM-based models	bioRxiv	Link
2025.05	GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype	Stan Z. Li			Graph representation learning for genetic perturbation integrating LLM and DNA features	ArXiv	Link
2025.04	Scaling Large Language Models for Next-Generation Single-Cell Analysis	David van Dijk	Link		single cell RNA	bioRxiv	Link	Link
2025.04	CellFlow enables generative single-cell phenotype modeling with flow matching	Fabian J. Theis	Link		Perturbation single cell RNA	bioRxiv	Link	Link
2025.04	Abstract 5084: Evaluation of single-cell foundation models for cancer outcome predictions	Eliezer M Van Allen			Single-cell Foundation Models for Cancer Outcome Prediction	Cancer Research	Link
2025.04	Abstract 6316: Predictive performance comparison of foundational and CNN models for single-cell immune profiling	Mai Chan Lau			Foundational vs. CNN Models for Immune Profiling in Histology	Cancer Research	Link
2025.04	Abstract 5059: Self-supervised representation learning of somatic mutational data	Etai Jacob			Self-supervised Representation Learning for Somatic Mutation Data	Cancer Research	Link
2025.04	Generalized Biological Foundation Model with Unified Nucleic Acid and Protein Language	Zhaorong Li		Alibaba Cloud Intelligence, Alibaba Group, Hangzhou, China	Unified DNA/RNA/Protein foundation model	bioRxiv	Link
2025.04	Towards multimodal foundation models in molecular cell biology	Bo Wang			Multimodal molecular biology foundation models	Nature	Link
2025.04	Abstract 3762: DEL-AI: Proteome-wide in silico screening of multi-billion compound libraries using machine learning foundation models	Paul Novick			Proteome-wide in silico DEL screening	Cancer Research	Link
2025.03	Multimodal AI predicts clinical outcomes of drug combinations from preclinical data	Marinka Zitnik			Clinical Pharmacology Modeling	ArXiv	Link
2025.02	Genome modeling and design across all domains of life with Evo 2	Arc Institute(Brian L. Hie)	Link		DNA	bioRxiv	Link	Link
2025.02	GENERator: A Long-Context Generative Genomic Foundation Model				DNA	arXiv	Link	Link
2025.02	scGPT-spatial: Continual Pretraining of Single-Cell Foundation Model for Spatial Transcriptomics	Bo Wang	Link		spatial single cell RNA	bioRxiv	Link	Link
2025.02	Large Cognition Model: Towards Pretrained EEG Foundation Model	Aidan Hung-Wen Tsai			EEG foundation model	ArXiv	Link
2025.02	Omni-DNA: A Unified Genomic Foundation Model for Cross-Modal and Multi-Task Learning	Caihua Shan			Cross-modal genomic foundation model	ArXiv	Link
2025.02	GENERator: A Long-Context Generative Genomic Foundation Model	Zheng Wang			Generative genomic foundation model	ArXiv	Link
2025.02	Top-DTI: Integrating Topological Deep Learning and Large Language Models for Drug Target Interaction Prediction	S. Bozdag		University of North Texas	Drug–target interaction prediction framework	bioRxiv	Link
2025.02	AI-enabled alkaline-resistant evolution of protein to apply in mass production	Liang Hong			LLM-driven protein evolution for alkaline resistance	eLife	Link
2025.01	A foundation model of transcription across human cell types				transcriptional regulation(ATAC-seq)	Nature	Link	Link
2025.01	GENA-LM: a new DNA language model for long sequences				DNA	Nucleic Acids Research	Link	Link
2025.01	Function-Guided Conditional Generation Using Protein Language Models with Adapters	Salesforce Research(Ali Madani)	Link		Protein	arXiv	Link	Link
2025.01	Simulating 500 million years of evolution with a language model	EvolutionaryScale(Alexander Rives)	Link		Protein	Science	Link	Link
2025.01	Predicting cell morphological responses to perturbations using generative modeling	Fabian J. Theis, Mohammad Lotfollahi	Link		phenotype morphological responses to perturbation	Nature Communications	Link	Link
2025.01	Improving functional protein generation via foundation model-derived latent space likelihood optimization	César de la Fuente-Nunez		University of Pennsylvania	Generative protein design via PLM latent space optimization	bioRxiv	Link
2025.01	Unveiling the Evolution of Antimicrobial Peptides in Gut Microbes via Foundation Model-Powered Framework	Jinfang Zheng		Zhejiang Lab	Antimicrobial peptide discovery from gut microbes	bioRxiv	Link
2025.01	Knowledge Hierarchy Guided Biological-Medical Dataset Distillation for Domain LLM Training	Meng Xiao			Biomedical dataset distillation for LLM training	ArXiv	Link
2024.12	L2G: Repurposing Language Models for Genomics Tasks	Ameet Talwalkar		Carnegie Mellon University	Repurposing LLMs for genomics tasks	bioRxiv	Link
2024.12	Fine-Tuned Deep Transfer Learning Models for Large Screenings of Safer Drugs Targeting Class A GPCRs	M. Filizola		Icahn School of Medicine at Mount Sinai	Deep transfer learning for GPCR drug screening	bioRxiv	Link
2024.12	Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs)	G. Pollastri			Protein secondary structure prediction with PLMs	International Journal of Molecular Sciences	Link
2024.12	ProtDAT: A Unified Framework for Protein Sequence Design from Any Protein Text Description	Hong-Bin Shen			Multi-modal protein sequence design from text	ArXiv	Link
2024.11	Nucleotide Transformer: building and evaluating robust foundation models for human genomics				DNA	Nature Methods	Link	Link
2024.10	Orthrus: Towards Evolutionary and Functional RNA Foundation Models				RNA	bioRxiv	Link	Link
2024.09	stFormer: a foundation model for spatial transcriptomics				spatial single cell RNA	bioRxiv	Link	Link
2024.08	BioRAG: A RAG-LLM Framework for Biological Question Reasoning				Life science question answer (RAG)	arXiv	Link
2024.07	DNA language model GROVER learns sequence context in the human genome				DNA	Nature Machine Intelligence	Link	Link
2024.06	scFoundation: Large-scale foundation model on single-cell transcriptomics	BioMap Research(Le Song, Xuegong Zhang)	Link		single cell RNA	Nature Methods	Link	Link
2024.06	Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models				biomedical question answer (RAG)	ISMB 24	Link	Link
2024.06	Multi-modal Transfer Learning between Biological Foundation Models	InstaDeep(Guillaume Richard, Thomas Pierrot)	Link		Diverse omics including DNA/RNA/Protein	arXiv	Link	Link
2024.05	Accurate structure prediction of biomolecular interactions with AlphaFold 3	Google DeepMind(John M. Jumper)	Link		Protein	Nature	Link	Link
2024.04	Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology	Recursion(Berton Earnshaw)	Link		microscopy data	CVPR 2024 Highlight	Link	Link
2024.03	Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling				DNA	arXiv	Link	Link
2024.03	Nicheformer: a foundation model for single-cell and spatial omics	Fabian J. Theis	Link		single-cell and spatial omics	bioRxiv	Link	Link
2024.02	scGPT: toward building a foundation model for single-cell multi-omics using generative AI	Bo Wang	Link		single cell RNA	Nature Methods	Link	Link
2024.01	OmniNA: A foundation model for nucleotide sequences				DNA/RNA	bioRxiv	Link
2024.01	xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein	BioMap Research(Le Song)	Link		Protein	arXiv	Link	Link
2024.01	Geneverse: A Collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research				DNA/Protein	EMNLP 2024	Link	Link
2023.11	HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution				DNA	NeurIPS 2023	Link	Link
2023.10	ProGen2: Exploring the boundaries of protein language models	Salesforce Research(Ali Madani)	Link		Protein	Cell Systems	Link	Link
2023.06	DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome				DNA	ICLR 2024	Link	Link
2023.05	Transfer learning enables predictions in network biology				General scRNA tasks, discovery of key network regulators and candidate therapeutic targets	Nature	Link	Link
2021.07	DNABERT: pre-trained Bidirectional Encoder Representations from Transformers for DNA-language				DNA	Bioinformatics	Link	Link

AI Tools

Year	Title	Domain	Venue	Paper/ Source	Code/Product
2025.07	GPTBioInsightor	Single Cell annotation			Link
2025.06	scExtract: leveraging large language models for fully automated single-cell RNA-seq data annotation and prior-informed multi-dataset integration	scRNA downstream tasks	Genome Biology	Link	Link
2025.04	Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data	Single Cell annotation	bioRxiv	Link	Link
2025.04	SCassist: An AI Based Workflow Assistant for Single-Cell Analysis	Single Cell tasks(cluster, annotation)	Bioinformatics	Link	Link
2024.12	CASSIA: a multi-agent large language model for reference free, interpretable, and automated cell annotation of single-cell RNA-sequencing data	Single Cell annotation	bioRxiv	Link	Link

Databases/Simulation

Year	Title	Team	Team Website	Affiliation	Domain	Venue	Paper/ Source	Code/Product
2025.08	Drug Response Omics association MAp (DROMA, 卓玛)				Biomarker discovery, drug prediction (18 projects with 2,600+ samples and 56,000+ drugs, with full ecosystem)			Link
2025.06	X-Atlas/Orion: Genome-wide Perturb-seq Datasets via a Scalable Fix-Cryopreserve Platform for Training Dose-Dependent Biological Foundation Models				Genome-wide Perturb-seq Datasets	bioRxiv	Link	Link
2025.05	scCompass: An Integrated Multi‐Species scRNA‐seq Database for AI‐Ready	Yuanchun Zhou			Multi-species scRNA-seq database for AI-ready applications	Advanced Science	Link
2025.05	AlphaLasso—a web server to identify loop and lasso motifs in 3D structure of biopolymers	Joanna I. Sulkowska			Web server for lasso motifs in biopolymer 3D structures (AlphaLasso)	Nucleic Acids Research	Link
2025.04	scMultiSim: simulation of single-cell multi-omics and spatial data guided by gene regulatory networks and cell–cell interactions				Simulation of single-cell multi-omics and spatial data	Nature Methods	Link	Link
2025.04	uHAF: a unified hierarchical annotation framework for cell type standardization and harmonization	Xuegong Zhang			Hierarchical Framework for Cell Type Annotation and Harmonization	Bioinformatics	Link
2025.04	OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling	Fuhai Li			Text-omic signaling graph dataset	ArXiv	Link
2025.04	Abstract 1087: The evolving landscape of cancer transcriptomics data	Akpéli V. Nordor			Cancer transcriptomics data mapping	Cancer Research	Link
2025.03	RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy	Recursion(Imran S. Haque)	Link		microscopy perturbation data	ICLR 2025	Link	Link
2025.02	scBaseCount: an AI agent-curated, uniformly processed, and continually expanding single cell data repository	Arc Institute(Yusuf H. Roohani)	Link		preprocess scRNA data	bioRxiv	Link	Link
2025.02	Tahoe-100M: A Giga-Scale Single-Cell Perturbation Atlas for Context-Dependent Gene Function and Cellular Modeling	tahoebio (hosted in Arc Institute's repository)	Link		Drug perturbation scRNA	bioRxiv	Link	Link
2025.02	Literature-scaled immunological gene set annotation using AI-powered immune cell knowledge graph (ICKG)	Ken Chen		MD Anderson Cancer Center	Immune cell knowledge graph for gene set annotation	bioRxiv	Link
2024.12	M3-20M: A Large-Scale Multi-Modal Molecule Dataset for AI-driven Drug Design and Discovery	Shuigeng Zhou			Multi-modal molecule dataset for drug design	Journal of Bioinformatics and Computational Biology	Link
2024.12	BioMedGraphica: An All-in-One Platform for Biomedical Prior Knowledge and Omic Signaling Graph Generation	Fuhai Li		Washington University in St. Louis	Biomedical knowledge graph platform	bioRxiv	Link
2024.12	Basic Science and Pathogenesis.	Li-San Wang			AI-enhanced search for Alzheimer's genomic database	Alzheimer's & Dementia	Link
2024.06	Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics	Marinka Zitnik	Link		Drug Discovery (Therapeutic science, 66 datasets)	bioRxiv	Link	Link
2024.04	Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations (Cell Painting Gallery)	Broad Institute(Anne E. Carpenter)	Link		phenotype morphological responses to perturbation	Nature Methods	Link	Link
2024.01	scPerturb: harmonized single-cell perturbation data				scRNA perturbation(44 public datasets, drugs and genes)	Nature Methods	Link	Link

Benchmarks

Year	Title	Team	Team Website	Affiliation	Domain	Venue	Paper/ Source	Code/Product
2025.06	Fundamental Limitations of Foundation Models in Single-Cell Transcriptomics				evaluate cell-type classification	bioRxiv	Link
2025.06	Cell-Eval(Predicting cellular responses to perturbation across diverse contexts with State)	Arc Institute(Yusuf H. Roohani)	Link		Perturbation scRNA	bioRxiv	Link	Link
2025.05	The influence of prompt engineering on large language models for protein–protein interaction identification in biomedical literature	Yi-Hsuan Lin			Prompt engineering of LLMs for protein-protein interaction extraction	Scientific Reports	Link
2025.05	CellVerse: Do Large Language Models Really Understand Cell Biology?	P. Heng			LLMs for language-driven single-cell multi-omics analysis (CellVerse)	ArXiv	Link
2025.05	Large Language Models Can Extract Metadata for Annotation of Human Neuroimaging Publications	Jessica A Turner		Ohio State University Wexner Medical Center, Columbus, OH, USA	LLM extraction and annotation of neuroimaging metadata	bioRxiv	Link
2025.05	scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction	Qianqian Song			Benchmarking foundation models for single-cell drug response prediction	ArXiv	Link
2025.04	Zero-shot evaluation reveals limitations of single-cell foundation models	Microsoft Research(Alex X. Lu)	Link		cell type clustering and batch integration for scGPT and Geneformer	Genome Biology	Link	Link
2025.04	OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling	Fuhai Li			Cell Text-Omic Signaling Graph Benchmark	ArXiv	Link
2025.04	Abstract 6316: Predictive performance comparison of foundational and CNN models for single-cell immune profiling	Mai Chan Lau			Single-cell immune profiling benchmark	Cancer Research	Link
2025.03	Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment	Zhenguang G. Cai			Cognitive Neuroscience & Audio-Language Modeling	ArXiv	Link
2025.02	BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning	Zhi-Hong Deng			Biological pathway reasoning benchmark	ArXiv	Link
2025.02	The Skin Game: Revolutionizing Standards for AI Dermatology Model Comparison	Dariusz Jemielniak			Dermatology image classification evaluation framework	ArXiv	Link
2025.02	Consequences of training data composition for deep learning models in single-cell biology	Lorin Crawford		Harvard Medical School	Training data composition effects in single-cell models	bioRxiv	Link
2025.01	CARDBiomedBench: A Benchmark for Evaluating Large Language Model Performance in Biomedical Research	F. Faghri		Center for Alzheimer's and Related Dementias, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892, USA; DataTecnica, Washington,	LLM evaluation in neurodegenerative disease research	bioRxiv	Link
2025.01	Large Language Models Think Too Fast To Explore Effectively	Robert C. Wilson			Exploration capabilities of LLMs in open-ended tasks	ArXiv	Link
2025.01	Sequence Modeling Is Not Evolutionary Reasoning	M. Zitnik		Harvard Medical School	Evolutionary reasoning benchmark for protein LLMs	bioRxiv	Link
2024.12	PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis				Perturbation scRNA	NeurIPS 2024	Link	Link
2024.12	Does your model understand genes? A benchmark of gene properties for biological and text models	Y. Shimoni			Benchmark of gene property prediction models	ArXiv	Link

Reviews

Year	Title	Team	Affiliation	Domain	Venue	Paper/ Source	Code/Product
2025.07	Human interpretable grammar encodes multicellular systems biology models to democratize virtual cell laboratories	Paul Macklin	Indiana University	Virtual cell	Cell	Link	Link
2025.07	The generative era of medical AI	Pranav Rajpurkar	Harvard Medical School	Medical AI	Cell	Link
2025.05	Empowering Biomedical Research with Foundation Models in Computational Microscopy: A Systematic Review	Rong Luo		Foundation models in computational microscopy	Advanced Intelligent Systems	Link
2025.04	Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions			MCP for agents	arXiv	Link
2025.04	Towards multimodal foundation models in molecular cell biology	Bo Wang		Multimodal Foundation Models in Molecular Cell Biology	Nature	Link
2025.04	A Survey of the Model Context Protocol (MCP): Standardizing Context to Enhance Large Language Models (LLMs)			MCP for agents	Preprints	Link
2025.04	Decoding cancer prognosis with deep learning: the ASD-cancer framework for tumor microenvironment analysis	J. Haran		Deep Learning for Tumor Microenvironment Prognosis	mSystems	Link
2025.03	Large language model for knowledge synthesis and AI-enhanced biomanufacturing	Yinjie J. Tang		Biomanufacturing	Trends in Biotechnology	Link
2025.03	Biological Sequence with Language Model Prompting: A Survey	Yu Li		Sequence Analysis	ArXiv	Link
2025.03	AI-Empowered Genome Decoding: Applications of Large Language Models in Genomics	Yu Zhou		Genomics	Frontiers of Digital Education	Link
2025.03	Large Language Models in Bioinformatics: A Survey	Yu Li		Bioinformatics	ArXiv	Link
2025.03	A Conceptual Framework for Human-AI Collaborative Genome Annotation	Radoslaw Suchecki		Genome Annotation	ArXiv	Link
2025.02	Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective	Yu Takagi		LLM learning dynamics from a neuroscience perspective	ArXiv	Link
2025.02	Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents	Mariya Toneva		Episodic memory framework for long-term LLM agents	ArXiv	Link
2025.02	Microbial Ecology to Ocean Carbon Cycling: From Genomes to Numerical Models	E. Zakem		Integration of microbial ecology and numerical models for ocean carbon cycling	Annual Review of Earth and Planetary Sciences	Link
2025.01	Survey and Improvement Strategies for Gene Prioritization with Large Language Models	Xia Hu		LLM-based gene prioritization	ArXiv	Link
2025.01	Computational Protein Science in the Era of Large Language Models (LLMs)	Qing Li		Computational protein science with LLMs	ArXiv	Link
2025.01	Foundation models in bioinformatics	Jianxin Wang		Bioinformatics foundation models overview	National Science Review	Link
2025.01	AI Methods for Antimicrobial Peptides: Progress and Challenges	César de la Fuente-Nunez		AI methods for antimicrobial peptide design	Microbial Biotechnology	Link
2025.01	Artificial intelligence driven innovations in biochemistry: A review of emerging research frontiers	M. A. Lateef Junaid		AI innovations in biochemistry	Biomolecules and Biomedicine	Link
2025.01	Artificial Intelligence Tools Addressing Challenges of Cancer Progression Due to Antimicrobial Resistance in Pathogenic Biofilm Systems	Abhijit G. Banerjee		AI tools for antimicrobial resistance in cancer biofilms	Artificial Intelligence Evolution	Link
2025.01	Learning the language of life with AI	E. Topol		Multiomic foundation models for biomolecule prediction	Science	Link
2024.12	Large language models facilitating modern molecular biology and novel drug development	Fei Liu		Review of LLMs in molecular biology and drug development	Frontiers in Pharmacology	Link
2024.12	From multi-omics to predictive biomarker: AI in tumor microenvironment	Yingli Sun		Review of AI in tumor microenvironment multi-omics	Frontiers in Immunology	Link
2024.12	Advancements and Applications of Protein Structure Prediction Algorithms	Ye Chen		Review of protein structure prediction methods	Theoretical and Natural Science	Link
2024.10	Empowering biomedical discovery with AI agents			Biomedical Agents	Cell	Link
2024.08	Transformers in single-cell omics: a review and new perspectives			Single cell foundation models	Nature Methods	Link
2024.08	Language models for biological research: a primer			biological foundation models	Nature Methods	Link
2024.07	Foundation models for bioinformatics			Bioinformatics foundation models	Quantitative Biology	Link

Other Awesome Projects

Title	Project
Awesome-LLMs-meet-genomes	Link
Awesome-Virtual-Cell	Link
Awesome-Phenotypic-Drug-Discovery	Link
Awesome-LLM-Scientific-Discovery	Link
Awesome bioagent papers	Link
Awesome LLM Agents for Scientific Discovery	Link
Awesome Papers on Agents for Science	Link
awesome-single-cell	Link
Awesome-Bioinformatics	Link
awesome	Link