Visit the official website at: https://webioinfo01.github.io/Awesome-AI-Meets-Biology/
AI for biology, biomedical, and bioinformatics research, separated into "AI Agents" and "Foundation models", plus supporting sections:
- 🌟 AI Agents
- 🎯 Foundation models
- 🛠️ AI Tools
- 💾 Databases/Simulation
- 📊 Benchmarks
- 📚 Reviews
- 🔗 Other Awesome Projects

## 🌟 AI Agents

Year | Title | Team | Team Website | Affiliation | Domain | Venue | Paper/Source | Code/Product |
---|---|---|---|---|---|---|---|---|
2025.07 | PromptBio: A Multi-Agent AI Platform for Bioinformatics Data Analysis | PromptBio (Xiao Yang) | Link | | Multi-agent system for bioinformatics | bioRxiv | Link | Link |
2025.07 | GPTBioInsightor | | | | Single Cell annotation | | | Link |
2025.06 | NVIDIA Biomedical AI-Q Research Agent Developer Blueprint | NVIDIA | Link | | Drug Discovery (Target Identification) | NVIDIA blog | Link | Link |
2025.06 | OriGene: A Self-Evolving Virtual Disease Biologist Automating Therapeutic Target Discovery | | | | General Biomedical Research (therapeutic targets) | bioRxiv | Link | Link |
2025.06 | Agent Laboratory: Using LLM Agents as Research Assistants | | | | General Scientific Research | arXiv | Link | Link |
2025.06 | scExtract: leveraging large language models for fully automated single-cell RNA-seq data annotation and prior-informed multi-dataset integration | | | | scRNA downstream tasks | Genome Biology | Link | Link |
2025.06 | CellVoyager: AI CompBio Agent Generates New Insights by Autonomously Analyzing Biological Data | James Zou | Link | | single cell RNA | bioRxiv | Link | Link |
2025.05 | Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution | Mengdi Wang | Link | | General Scientific Research (self-evolution and MCP creation) | arXiv | Link | Link |
2025.05 | BioOmni: A General-Purpose AI Agent for Automated Biomedical Research | Marinka Zitnik | Link | | General Biomedical Research (Multi-modal) | bioRxiv | Link | Link |
2025.05 | CellTypeAgent: Trustworthy cell type annotation with Large Language Models | Yunjian Li | | | LLM agent for cell type annotation in single-cell data | arXiv | Link | |
2025.05 | ChatMolData: A Multimodal Agent for Automatic Molecular Data Processing | Xiaohui Yu | | | Multimodal LLM agent for automatic molecular data processing | Advanced Intelligent Systems | Link | |
2025.05 | PlantGPT: An Arabidopsis-Based Intelligent Agent that Answers Questions about Plant Functional Genomics | Qinlong Zhu | | | LLM agent for plant functional genomics question answering | Advanced Science | Link | |
2025.05 | DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery | Wenbin Hu | | | LLM-based agent for parameterized reasoning in drug discovery | arXiv | Link | |
2025.05 | Automatic biomarker discovery and enrichment with BRAD | I. Rajapakse | | | LLM agent for automatic biomarker discovery and enrichment (BRAD) | Bioinformatics | Link | |
2025.04 | The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search | Sakana AI (David Ha) | Link | | General Scientific Research | ICLR 2025 | Link | Link |
2025.04 | SpatialAgent: An autonomous AI agent for spatial biology | Genentech (Aviv Regev) | Link | | Spatial scRNA | bioRxiv | Link | Link |
2025.04 | Knowledge Graph and Large Language Model for Metabolomics | Yuxing Lu | | | Knowledge Graphs and LLMs in Metabolomics | | Link | |
2025.04 | Spatial transcriptomics AI agent charts hPSC-pancreas maturation in vivo | Jia Liu | | Harvard University | AI Agent for Spatial Transcriptomics Analysis | bioRxiv | Link | |
2025.04 | scAgent: Universal Single-Cell Annotation via a LLM Agent | Yunjun Gao | | | Universal Single-Cell Annotation with LLM Agent | arXiv | Link | |
2025.04 | Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data | | | | Single Cell annotation | bioRxiv | Link | Link |
2025.04 | SCassist: An AI Based Workflow Assistant for Single-Cell Analysis | Rachel R. Caspi | | National Institutes of Health | Single Cell tasks (clustering, annotation) | Bioinformatics | Link | Link |
2025.04 | Fleming: An AI Agent for Antibiotic Discovery in Mycobacterium tuberculosis | M. Farhat | | Harvard Medical School | Tuberculosis antibiotic discovery AI agent | bioRxiv | Link | |
2025.04 | OLAF: An Open Life Science Analysis Framework for Conversational Bioinformatics Powered by Large Language Models | Diego Gonzalez Lopez | | | Conversational bioinformatics pipeline | arXiv | Link | |
2025.03 | TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools | Marinka Zitnik | Link | | Personalized Treatment Recommendations | arXiv | Link | Link |
2025.03 | DrBioRight 2.0: an LLM-powered bioinformatics chatbot for large-scale cancer functional proteomics analysis | | | | proteomics | Nature Communications | Link | Link |
2025.03 | DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration | | | | Drug Discovery | arXiv | Link | Link |
2025.03 | PharmAgents: Building a Virtual Pharma with Large Language Model Agents | Yanyan Lan | | | Drug Discovery | arXiv | Link | |
2025.03 | AstroAgents: A Multi-Agent AI for Hypothesis Generation from Mass Spectrometry Data | Amirali Aghazadeh | | | Astrobiology & Mass Spectrometry AI | arXiv | Link | |
2025.03 | DORA AI Scientist: Multi-agent Virtual Research Team for Scientific Exploration Discovery and Automated Report Generation | Alex Zhavoronkov | | Insilico Medicine | Scientific Research Automation | bioRxiv | Link | |
2025.03 | Design and Analysis of an Extreme-Scale, High-Performance, and Modular Agent-Based Simulation Platform | Lukas Breitwieser | | | Agent-Based Simulation | arXiv | Link | |
2025.03 | Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization | Haishuai Wang | | | Molecular Optimization | arXiv | Link | |
2025.03 | IAN: An Intelligent System for Omics Data Analysis and Discovery | Rachel R. Caspi | | Laboratory of Immunology, National Eye Institute, NIH, Bethesda 20892, USA | Omics Data Integration | bioRxiv | Link | Link |
2025.02 | LIDDIA: Language-based Intelligent Drug Discovery Agent | | | | Drug Discovery | arXiv | Link | |
2025.02 | Towards an AI co-scientist | | | | General Scientific Research | arXiv | Link | |
2025.02 | Knowledge Synthesis of Photosynthesis Research Using a Large Language Model | Tae In Ahn | | | Photosynthesis research assistant using LLM | arXiv | Link | |
2025.02 | Spike sorting AI agent | Jia Liu | | John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA, USA | Spike sorting AI pipeline agent | bioRxiv | Link | |
2025.02 | RAPID: Reliable and efficient Automatic generation of submission rePorting checklists with Large language moDels | Lu Zhang | | Hong Kong Baptist University | Automated medical reporting checklist generation | bioRxiv | Link | |
2025.01 | BioMaster: Multi-agent System for Automated Bioinformatics Analysis Workflow | | | | Multi-omics Pipelines | bioRxiv | Link | Link |
2025.01 | InstructCell: A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following | | | | single cell RNA | arXiv | Link | Link |
2025.01 | BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems | Microsoft Research (Venkat S. Malladi) | Link | | Multi-omics Pipelines | arXiv | Link | |
2025.01 | Extracting Knowledge from Scientific Texts on Patient-Derived Cancer Models Using Large Language Models: Algorithm Development and Validation | G. Savova | | Boston Children's Hospital, Harvard Medical School | LLM-based entity extraction for patient-derived cancer models | bioRxiv | Link | |
2024.12 | CASSIA: a multi-agent large language model for reference free, interpretable, and automated cell annotation of single-cell RNA-sequencing data | | | | Single Cell annotation | bioRxiv | Link | Link |
2024.12 | ProtChat: An AI Multi-Agent for Automated Protein Analysis Leveraging GPT-4 and Protein Language Model | Yunpeng Cai | | | Automated protein analysis AI agent | Journal of Chemical Information and Modeling | Link | |
2024.12 | SCREADER: Prompting Large Language Models to Interpret scRNA-seq Data | Meng Xiao | | | scRNA-seq interpretation via LLM prompting | 2024 IEEE International Conference on Data Mining Workshops (ICDMW) | Link | |
2024.12 | SciAgents: Automating Scientific Discovery Through Bioinspired Multi‐Agent Intelligent Graph Reasoning | Markus J. Buehler | | | Multi-agent AI for automated materials discovery | Advanced Materials | Link | |
2024.12 | Development and Application of an In Vitro Drug Screening Assay for Schistosoma mansoni Schistosomula Using YOLOv5 | Antonio Muro | | | AI-powered drug screening assay for schistosomula | Biomedicines | Link | |
2024.11 | BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments | | | | Experimental Design (Genetic Perturbation) | arXiv | Link | Link |
2024.11 | The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation | James Zou | Link | | General Scientific Research | bioRxiv | Link | Link |
2024.10 | An AI Agent for Fully Automated Multi-Omic Analyses | | | | Multi-omics Pipelines | Advanced Science | Link | Link |
2024.09 | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Sakana AI (David Ha) | Link | | General Scientific Research | arXiv | Link | Link |
2024.07 | CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis | | | | single cell RNA | arXiv | Link | Link |
2024.05 | A Data-Intelligence-Intensive Bioinformatics Copilot System for Large-scale Omics Researches and Scientific Insights | | | | Multi-omics/single cell RNA | bioRxiv | Link | Link |
2024.04 | CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments | Mengdi Wang, Le Cong | Link | | Experimental Design (CRISPR Genetic Editing) | bioRxiv | Link | Link |
2024.01 | ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning | | | | Protein design and discovery | arXiv | Link | Link |

## 🎯 Foundation models

Year | Title | Team | Team Website | Affiliation | Domain | Venue | Paper/Source | Code/Product |
---|---|---|---|---|---|---|---|---|
2025.07 | ODFormer: a Virtual Organoid for Predicting Personalized Therapeutic Responses in Pancreatic Cancer | | | | Organoid drug response prediction | bioRxiv | Link | Link |
2025.07 | Scalable emulation of protein equilibrium ensembles with generative deep learning | Microsoft Research (Frank Noé) | Link | | Protein | Science | Link | Link |
2025.07 | Spatia: Multimodal Model for Prediction and Generation of Spatial Cell Phenotypes | Marinka Zitnik | Link | | spatial single cell RNA | arXiv | Link | |
2025.07 | RiNALMo: general-purpose RNA language models can generalize well on structure prediction tasks | | | | RNA | Nature Communications | Link | Link |
2025.06 | World Models as Simulators of Patient Biology: Oncology Counterfactual Therapeutics Oracle (OCTO) | Noetik | Link | | spatial single cell RNA perturbation | Noetik report | Link | |
2025.06 | Generalized biological foundation model with unified nucleic acid and protein language | Alibaba (Zhaorong Li) | Link | | DNA/RNA/Protein | Nature Machine Intelligence | Link | Link |
2025.06 | Predicting cellular responses to perturbation across diverse contexts with State | Arc Institute (Yusuf H. Roohani) | Link | | genetic, signaling, and chemical perturbation scRNA | bioRxiv | Link | Link |
2025.06 | AlphaGenome: Advancing regulatory variant effect prediction with a unified DNA sequence model | Google DeepMind (Pushmeet Kohli) | Link | | DNA/RNA | bioRxiv | Link | Link |
2025.06 | SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model | | | | DNA | ICML 2025 | Link | Link |
2025.06 | UniCure: A Foundation Model for Predicting Personalized Cancer Therapy Response | | | | drug perturbation scRNA | bioRxiv | Link | Link |
2025.06 | A multimodal conversational agent for DNA, RNA and protein tasks | InstaDeep (Thomas Pierrot) | Link | | Diverse omics, including DNA/RNA/Protein | Nature Machine Intelligence | Link | Link |
2025.05 | A visual–omics foundation model to bridge histopathology with spatial transcriptomics | Guangyu Wang | Link | | histopathology and spatial single cell RNA | Nature Methods | Link | Link |
2025.05 | GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance | Mengdi Wang | Link | | Biosafety for DNA | arXiv | Link | Link |
2025.05 | BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | Bo Wang | Link | | DNA | arXiv | Link | Link |
2025.05 | CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells | | | | single cell RNA | Nature Communications | Link | Link |
2025.05 | sciLaMA: A Single-Cell Representation Learning Framework to Leverage Prior Knowledge from Large Language Models | G. Quon | | University of California, Davis | Single-cell representation learning leveraging LLM prior knowledge | bioRxiv | Link | |
2025.05 | LlamaAffinity: A Predictive Antibody–Antigen Binding Model Integrating Antibody Sequences with Llama3 Backbone Architecture | J. Chen | | University of Alabama at Birmingham | Antibody–antigen binding affinity prediction using LLM-based models | bioRxiv | Link | |
2025.05 | GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype | Stan Z. Li | | | Graph representation learning for genetic perturbation integrating LLM and DNA features | arXiv | Link | |
2025.04 | Scaling Large Language Models for Next-Generation Single-Cell Analysis | David van Dijk | Link | | single cell RNA | bioRxiv | Link | Link |
2025.04 | CellFlow enables generative single-cell phenotype modeling with flow matching | Fabian J. Theis | Link | | Perturbation single cell RNA | bioRxiv | Link | Link |
2025.04 | Abstract 5084: Evaluation of single-cell foundation models for cancer outcome predictions | Eliezer M. Van Allen | | | Single-cell Foundation Models for Cancer Outcome Prediction | Cancer Research | Link | |
2025.04 | Abstract 6316: Predictive performance comparison of foundational and CNN models for single-cell immune profiling | Mai Chan Lau | | | Foundational vs. CNN Models for Immune Profiling in Histology | Cancer Research | Link | |
2025.04 | Abstract 5059: Self-supervised representation learning of somatic mutational data | Etai Jacob | | | Self-supervised Representation Learning for Somatic Mutation Data | Cancer Research | Link | |
2025.04 | Generalized Biological Foundation Model with Unified Nucleic Acid and Protein Language | Zhaorong Li | | Alibaba Cloud Intelligence, Alibaba Group, Hangzhou, China | Unified DNA/RNA/Protein foundation model | bioRxiv | Link | |
2025.04 | Towards multimodal foundation models in molecular cell biology | Bo Wang | | | Multimodal molecular biology foundation models | Nature | Link | |
2025.04 | Abstract 3762: DEL-AI: Proteome-wide in silico screening of multi-billion compound libraries using machine learning foundation models | Paul Novick | | | Proteome-wide in silico DEL screening | Cancer Research | Link | |
2025.03 | Multimodal AI predicts clinical outcomes of drug combinations from preclinical data | Marinka Zitnik | | | Clinical Pharmacology Modeling | arXiv | Link | |
2025.02 | Genome modeling and design across all domains of life with Evo 2 | Arc Institute (Brian L. Hie) | Link | | DNA | bioRxiv | Link | Link |
2025.02 | GENERator: A Long-Context Generative Genomic Foundation Model | Zheng Wang | | | DNA (generative genomic foundation model) | arXiv | Link | Link |
2025.02 | scGPT-spatial: Continual Pretraining of Single-Cell Foundation Model for Spatial Transcriptomics | Bo Wang | Link | | spatial single cell RNA | bioRxiv | Link | Link |
2025.02 | Large Cognition Model: Towards Pretrained EEG Foundation Model | Aidan Hung-Wen Tsai | | | EEG foundation model | arXiv | Link | |
2025.02 | Omni-DNA: A Unified Genomic Foundation Model for Cross-Modal and Multi-Task Learning | Caihua Shan | | | Cross-modal genomic foundation model | arXiv | Link | |
2025.02 | Top-DTI: Integrating Topological Deep Learning and Large Language Models for Drug Target Interaction Prediction | S. Bozdag | | University of North Texas | Drug–target interaction prediction framework | bioRxiv | Link | |
2025.02 | AI-enabled alkaline-resistant evolution of protein to apply in mass production | Liang Hong | | | LLM-driven protein evolution for alkaline resistance | eLife | Link | |
2025.01 | A foundation model of transcription across human cell types | | | | transcriptional regulation (ATAC-seq) | Nature | Link | Link |
2025.01 | GENA-LM: a new DNA language model for long sequences | | | | DNA | Nucleic Acids Research | Link | Link |
2025.01 | Function-Guided Conditional Generation Using Protein Language Models with Adapters | Salesforce Research (Ali Madani) | Link | | Protein | arXiv | Link | Link |
2025.01 | Simulating 500 million years of evolution with a language model | EvolutionaryScale (Alexander Rives) | Link | | Protein | Science | Link | Link |
2025.01 | Predicting cell morphological responses to perturbations using generative modeling | Fabian J. Theis, Mohammad Lotfollahi | Link | | phenotype morphological responses to perturbation | Nature Communications | Link | Link |
2025.01 | Improving functional protein generation via foundation model-derived latent space likelihood optimization | César de la Fuente-Nunez | | University of Pennsylvania | Generative protein design via PLM latent space optimization | bioRxiv | Link | |
2025.01 | Unveiling the Evolution of Antimicrobial Peptides in Gut Microbes via Foundation Model-Powered Framework | Jinfang Zheng | | Zhejiang Lab | Antimicrobial peptide discovery from gut microbes | bioRxiv | Link | |
2025.01 | Knowledge Hierarchy Guided Biological-Medical Dataset Distillation for Domain LLM Training | Meng Xiao | | | Biomedical dataset distillation for LLM training | arXiv | Link | |
2024.12 | L2G: Repurposing Language Models for Genomics Tasks | Ameet Talwalkar | | Carnegie Mellon University | Repurposing LLMs for genomics tasks | bioRxiv | Link | |
2024.12 | Fine-Tuned Deep Transfer Learning Models for Large Screenings of Safer Drugs Targeting Class A GPCRs | M. Filizola | | Icahn School of Medicine at Mount Sinai | Deep transfer learning for GPCR drug screening | bioRxiv | Link | |
2024.12 | Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs) | G. Pollastri | | | Protein secondary structure prediction with PLMs | International Journal of Molecular Sciences | Link | |
2024.12 | ProtDAT: A Unified Framework for Protein Sequence Design from Any Protein Text Description | Hong-Bin Shen | | | Multi-modal protein sequence design from text | arXiv | Link | |
2024.11 | Nucleotide Transformer: building and evaluating robust foundation models for human genomics | | | | DNA | Nature Methods | Link | Link |
2024.10 | Orthrus: Towards Evolutionary and Functional RNA Foundation Models | | | | RNA | bioRxiv | Link | Link |
2024.09 | stFormer: a foundation model for spatial transcriptomics | | | | spatial single cell RNA | bioRxiv | Link | Link |
2024.08 | BioRAG: A RAG-LLM Framework for Biological Question Reasoning | | | | Life science question answering (RAG) | arXiv | Link | |
2024.07 | DNA language model GROVER learns sequence context in the human genome | | | | DNA | Nature Machine Intelligence | Link | Link |
2024.06 | scFoundation: Large-scale foundation model on single-cell transcriptomics | Biomap research (Le Song, Xuegong Zhang) | Link | | single cell RNA | Nature Methods | Link | Link |
2024.06 | Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models | | | | biomedical question answering (RAG) | ISMB 2024 | Link | Link |
2024.06 | Multi-modal Transfer Learning between Biological Foundation Models | InstaDeep (Guillaume Richard, Thomas Pierrot) | Link | | Diverse omics, including DNA/RNA/Protein | arXiv | Link | Link |
2024.05 | Accurate structure prediction of biomolecular interactions with AlphaFold 3 | Google DeepMind (John M. Jumper) | Link | | Protein | Nature | Link | Link |
2024.04 | Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology | Recursion (Berton Earnshaw) | Link | | microscopy data | CVPR 2024 Highlight | Link | Link |
2024.03 | Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling | | | | DNA | arXiv | Link | Link |
2024.03 | Nicheformer: a foundation model for single-cell and spatial omics | Fabian J. Theis | Link | | single-cell and spatial omics | bioRxiv | Link | Link |
2024.02 | scGPT: toward building a foundation model for single-cell multi-omics using generative AI | Bo Wang | Link | | single cell RNA | Nature Methods | Link | Link |
2024.01 | OmniNA: A foundation model for nucleotide sequences | | | | DNA/RNA | bioRxiv | Link | |
2024.01 | xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein | Biomap research (Le Song) | Link | | Protein | arXiv | Link | Link |
2024.01 | Geneverse: A Collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research | | | | DNA/Protein | EMNLP 2024 | Link | Link |
2023.11 | HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution | | | | DNA | NeurIPS 2023 | Link | Link |
2023.10 | ProGen2: Exploring the boundaries of protein language models | Salesforce Research (Ali Madani) | Link | | Protein | Cell Systems | Link | Link |
2023.06 | DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome | | | | DNA | ICLR 2024 | Link | Link |
2023.05 | Transfer learning enables predictions in network biology | | | | General scRNA tasks, discovery of key network regulators and candidate therapeutic targets | Nature | Link | Link |
2021.07 | DNABERT: pre-trained Bidirectional Encoder Representations from Transformers for DNA-language | | | | DNA | Bioinformatics | Link | Link |

## 🛠️ AI Tools

Year | Title | Team | Team Website | Affiliation | Domain | Venue | Paper/Source | Code/Product |
---|---|---|---|---|---|---|---|---|
2025.07 | GPTBioInsightor | | | | Single Cell annotation | | | Link |
2025.06 | scExtract: leveraging large language models for fully automated single-cell RNA-seq data annotation and prior-informed multi-dataset integration | | | | scRNA downstream tasks | Genome Biology | Link | Link |
2025.04 | Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data | | | | Single Cell annotation | bioRxiv | Link | Link |
2025.04 | SCassist: An AI Based Workflow Assistant for Single-Cell Analysis | | | | Single Cell tasks (clustering, annotation) | bioRxiv | Link | Link |
2024.12 | CASSIA: a multi-agent large language model for reference free, interpretable, and automated cell annotation of single-cell RNA-sequencing data | | | | Single Cell annotation | bioRxiv | Link | Link |

## 💾 Databases/Simulation

Year | Title | Team | Team Website | Affiliation | Domain | Venue | Paper/Source | Code/Product |
---|---|---|---|---|---|---|---|---|
2025.08 | Drug Response Omics association MAp (DROMA, 卓玛) | | | | Biomarker discovery, drug prediction (18 projects with 2,600+ samples and 56,000+ drugs, with full ecosystem) | | | Link |
2025.06 | X-Atlas/Orion: Genome-wide Perturb-seq Datasets via a Scalable Fix-Cryopreserve Platform for Training Dose-Dependent Biological Foundation Models | | | | Genome-wide Perturb-seq Datasets | bioRxiv | Link | Link |
2025.05 | scCompass: An Integrated Multi‐Species scRNA‐seq Database for AI‐Ready | Yuanchun Zhou | | | Multi-species scRNA-seq database for AI-ready applications | Advanced Science | Link | |
2025.05 | AlphaLasso—a web server to identify loop and lasso motifs in 3D structure of biopolymers | Joanna I. Sulkowska | | | Web server for lasso motifs in biopolymer 3D structures (AlphaLasso) | Nucleic Acids Research | Link | |
2025.04 | scMultiSim: simulation of single-cell multi-omics and spatial data guided by gene regulatory networks and cell–cell interactions | | | | Simulation of single-cell multi-omics and spatial data | Nature Methods | Link | Link |
2025.04 | uHAF: a unified hierarchical annotation framework for cell type standardization and harmonization | Xuegong Zhang | | | Hierarchical Framework for Cell Type Annotation and Harmonization | Bioinformatics | Link | |
2025.04 | OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling | Fuhai Li | | | Text-omic signaling graph dataset | arXiv | Link | |
2025.04 | Abstract 1087: The evolving landscape of cancer transcriptomics data | Akpéli V. Nordor | | | Cancer transcriptomics data mapping | Cancer Research | Link | |
2025.03 | RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy | Recursion (Imran S. Haque) | Link | | microscopy perturbation data | ICLR 2025 | Link | Link |
2025.02 | scBaseCount: an AI agent-curated, uniformly processed, and continually expanding single cell data repository | Arc Institute (Yusuf H. Roohani) | Link | | preprocessed scRNA data | bioRxiv | Link | Link |
2025.02 | Tahoe-100M: A Giga-Scale Single-Cell Perturbation Atlas for Context-Dependent Gene Function and Cellular Modeling | tahoebio (hosted in Arc Institute's repository) | Link | | Drug perturbation scRNA | bioRxiv | Link | Link |
2025.02 | Literature-scaled immunological gene set annotation using AI-powered immune cell knowledge graph (ICKG) | Ken Chen | | MD Anderson Cancer Center | Immune cell knowledge graph for gene set annotation | bioRxiv | Link | |
2024.12 | M3-20M: A Large-Scale Multi-Modal Molecule Dataset for AI-driven Drug Design and Discovery | Shuigeng Zhou | | | Multi-modal molecule dataset for drug design | Journal of Bioinformatics and Computational Biology | Link | |
2024.12 | BioMedGraphica: An All-in-One Platform for Biomedical Prior Knowledge and Omic Signaling Graph Generation | Fuhai Li | | Washington University in St. Louis | Biomedical knowledge graph platform | bioRxiv | Link | |
2024.12 | Basic Science and Pathogenesis | Li-San Wang | | | AI-enhanced search for Alzheimer's genomic database | Alzheimer's & Dementia | Link | |
2024.06 | Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics | Marinka Zitnik | Link | | Drug Discovery (Therapeutic science, 66 datasets) | bioRxiv | Link | Link |
2024.04 | Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations (Cell Painting Gallery) | Broad Institute (Anne E. Carpenter) | Link | | phenotype morphological responses to perturbation | Nature Methods | Link | Link |
2024.01 | scPerturb: harmonized single-cell perturbation data | | | | scRNA perturbation (44 public datasets, drugs and genes) | Nature Methods | Link | Link |

## 📊 Benchmarks

Year | Title | Team | Team Website | Affiliation | Domain | Venue | Paper/Source | Code/Product |
---|---|---|---|---|---|---|---|---|
2025.06 | Fundamental Limitations of Foundation Models in Single-Cell Transcriptomics | | | | Cell-type classification evaluation | bioRxiv | Link | |
2025.06 | Cell-Eval (Predicting cellular responses to perturbation across diverse contexts with State) | Arc Institute (Yusuf H. Roohani) | Link | | Perturbation scRNA | bioRxiv | Link | Link |
2025.05 | The influence of prompt engineering on large language models for protein–protein interaction identification in biomedical literature | Yi-Hsuan Lin | | | Prompt engineering of LLMs for protein-protein interaction extraction | Scientific Reports | Link | |
2025.05 | CellVerse: Do Large Language Models Really Understand Cell Biology? | P. Heng | | | LLMs for language-driven single-cell multi-omics analysis (CellVerse) | arXiv | Link | |
2025.05 | Large Language Models Can Extract Metadata for Annotation of Human Neuroimaging Publications | Jessica A. Turner | | Ohio State University Wexner Medical Center, Columbus, OH, USA | LLM extraction and annotation of neuroimaging metadata | bioRxiv | Link | |
2025.05 | scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction | Qianqian Song | | | Benchmarking foundation models for single-cell drug response prediction | arXiv | Link | |
2025.04 | Zero-shot evaluation reveals limitations of single-cell foundation models | Microsoft Research (Alex X. Lu) | Link | | cell type clustering and batch integration for scGPT and Geneformer | Genome Biology | Link | Link |
2025.04 | OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling | Fuhai Li | | | Cell Text-Omic Signaling Graph Benchmark | arXiv | Link | |
2025.04 | Abstract 6316: Predictive performance comparison of foundational and CNN models for single-cell immune profiling | Mai Chan Lau | | | Single-cell immune profiling benchmark | Cancer Research | Link | |
2025.03 | Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment | Zhenguang G. Cai | | | Cognitive Neuroscience & Audio-Language Modeling | arXiv | Link | |
2025.02 | BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning | Zhi-Hong Deng | | | Biological pathway reasoning benchmark | arXiv | Link | |
2025.02 | The Skin Game: Revolutionizing Standards for AI Dermatology Model Comparison | Dariusz Jemielniak | | | Dermatology image classification evaluation framework | arXiv | Link | |
2025.02 | Consequences of training data composition for deep learning models in single-cell biology | Lorin Crawford | | Harvard Medical School | Training data composition effects in single-cell models | bioRxiv | Link | |
2025.01 | CARDBiomedBench: A Benchmark for Evaluating Large Language Model Performance in Biomedical Research | F. Faghri | | Center for Alzheimer's and Related Dementias, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892, USA; DataTecnica, Washington, | LLM evaluation in neurodegenerative disease research | bioRxiv | Link | |
2025.01 | Large Language Models Think Too Fast To Explore Effectively | Robert C. Wilson | | | Exploration capabilities of LLMs in open-ended tasks | arXiv | Link | |
2025.01 | Sequence Modeling Is Not Evolutionary Reasoning | M. Zitnik | | Harvard Medical School | Evolutionary reasoning benchmark for protein LLMs | bioRxiv | Link | |
2024.12 | PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis | | | | Perturbation scRNA | NeurIPS 2024 | Link | Link |
2024.12 | Does your model understand genes? A benchmark of gene properties for biological and text models | Y. Shimoni | | | Benchmark of gene property prediction models | arXiv | Link | |

## 📚 Reviews

Year | Title | Team | Team Website | Affiliation | Domain | Venue | Paper/Source | Code/Product |
---|---|---|---|---|---|---|---|---|
2025.07 | Human interpretable grammar encodes multicellular systems biology models to democratize virtual cell laboratories | Paul Macklin | Indiana University | Virtual cell | Cell | Link | Link |
|
2025.07 | The generative era of medical AI | Pranav Rajpurkar | Harvard Medical School | Medical AI | Cell | Link | ||
2025.05 | Empowering Biomedical Research with Foundation Models in Computational Microscopy: A Systematic Review | Rong Luo | Foundation models in computational microscopy | Advanced Intelligent Systems | Link | |||
2025.04 | Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions | MCP for agents | arXiv | Link | ||||
2025.04 | Towards multimodal foundation models in molecular cell biology. | Bo Wang | Multimodal Foundation Models in Molecular Cell Biology | Nature | Link | |||
2025.04 | A Survey of the Model Context Protocol (MCP): Standardizing Context to Enhance Large Language Models (LLMs) | MCP for agents | Preprints | Link | ||||
2025.04 | Decoding cancer prognosis with deep learning: the ASD-cancer framework for tumor microenvironment analysis | J. Haran | Deep Learning for Tumor Microenvironment Prognosis | mSystems | Link | |||
2025.03 | Large language model for knowledge synthesis and AI-enhanced biomanufacturing | Yinjie J. Tang | | | Biomanufacturing | Trends in Biotechnology | Link | |
2025.03 | Biological Sequence with Language Model Prompting: A Survey | Yu Li | | | Sequence Analysis | arXiv | Link | |
2025.03 | AI-Empowered Genome Decoding: Applications of Large Language Models in Genomics | Yu Zhou | | | Genomics | Frontiers of Digital Education | Link | |
2025.03 | Large Language Models in Bioinformatics: A Survey | Yu Li | | | Bioinformatics | arXiv | Link | |
2025.03 | A Conceptual Framework for Human-AI Collaborative Genome Annotation | Radoslaw Suchecki | | | Genome Annotation | arXiv | Link | |
2025.02 | Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective | Yu Takagi | | | LLM learning dynamics from a neuroscience perspective | arXiv | Link | |
2025.02 | Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents | Mariya Toneva | | | Episodic memory framework for long-term LLM agents | arXiv | Link | |
2025.02 | Microbial Ecology to Ocean Carbon Cycling: From Genomes to Numerical Models | E. Zakem | | | Integration of microbial ecology and numerical models for ocean carbon cycling | Annual Review of Earth and Planetary Sciences | Link | |
2025.01 | Survey and Improvement Strategies for Gene Prioritization with Large Language Models | Xia Hu | | | LLM-based gene prioritization | arXiv | Link | |
2025.01 | Computational Protein Science in the Era of Large Language Models (LLMs) | Qing Li | | | Computational protein science with LLMs | arXiv | Link | |
2025.01 | Foundation models in bioinformatics | Jianxin Wang | | | Bioinformatics foundation models overview | National Science Review | Link | |
2025.01 | AI Methods for Antimicrobial Peptides: Progress and Challenges | César de la Fuente-Nunez | | | AI methods for antimicrobial peptide design | Microbial Biotechnology | Link | |
2025.01 | Artificial intelligence driven innovations in biochemistry: A review of emerging research frontiers | M. A. Lateef Junaid | | | AI innovations in biochemistry | Biomolecules and Biomedicine | Link | |
2025.01 | Artificial Intelligence Tools Addressing Challenges of Cancer Progression Due to Antimicrobial Resistance in Pathogenic Biofilm Systems | Abhijit G. Banerjee | | | AI tools for antimicrobial resistance in cancer biofilms | Artificial Intelligence Evolution | Link | |
2025.01 | Learning the language of life with AI | E. Topol | | | Multiomic foundation models for biomolecule prediction | Science | Link | |
2024.12 | Large language models facilitating modern molecular biology and novel drug development | Fei Liu | | | Review of LLMs in molecular biology and drug development | Frontiers in Pharmacology | Link | |
2024.12 | From multi-omics to predictive biomarker: AI in tumor microenvironment | Yingli Sun | | | Review of AI in tumor microenvironment multi-omics | Frontiers in Immunology | Link | |
2024.12 | Advancements and Applications of Protein Structure Prediction Algorithms | Ye Chen | | | Review of protein structure prediction methods | Theoretical and Natural Science | Link | |
2024.10 | Empowering biomedical discovery with AI agents | | | | Biomedical Agents | Cell | Link | |
2024.08 | Transformers in single-cell omics: a review and new perspectives | | | | Single cell foundation models | Nature Methods | Link | |
2024.08 | Language models for biological research: a primer | | | | biological foundation models | Nature Methods | Link | |
2024.07 | Foundation models for bioinformatics | | | | Bioinformatics foundation models | Quantitative Biology | Link | |

Title | Project |
---|---|
Awesome-LLMs-meet-genomes | Link |
Awesome-Virtual-Cell | Link |
Awesome-Phenotypic-Drug-Discovery | Link |
Awesome-LLM-Scientific-Discovery | Link |
Awesome bioagent papers | Link |
Awesome LLM Agents for Scientific Discovery | Link |
Awesome Papers on Agents for Science | Link |
awesome-single-cell | Link |
Awesome-Bioinformatics | Link |
awesome | Link |