| Paper | Published in | Resources |
|---|---|---|
| Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences | PNAS, 2021 | Code |
| Language models enable zero-shot prediction of the effects of mutations on protein function | Advances in Neural Information Processing Systems, 2021 | Code |
| Learning inverse folding from millions of predicted structures | ICML, 2022 | Code |
| Evolutionary-scale prediction of atomic-level protein structure with a language model | Science, 2023 | Code |
| Simulating 500 million years of evolution with a language model | bioRxiv, 2024 | Code |

| Paper | Published in | Resources |
|---|---|---|
| MSA Transformer | ICML, 2021 | Code |
| Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval | ICML, 2022 | Code |
| Leveraging protein language models for accurate multiple sequence alignments | Genome Research, 2023 | Code |
| PoET: A generative model of protein families as sequences-of-sequences | NeurIPS, 2023 | Code |
| Deep transfer learning for inter-chain contact predictions of transmembrane protein complexes | Nature Communications, 2023 | Code |

| Paper | Published in | Resources |
|---|---|---|
| A systematic study of joint representation learning on protein sequences and structures | arXiv preprint, 2023 | Code |
| SaProt: Protein language modeling with structure-aware vocabulary | bioRxiv, 2023 | Code |
| Simple, Efficient and Scalable Structure-aware Adapter Boosts Protein Language Models | arXiv preprint, 2024 | Code |
| Multi-level Protein Structure Pre-training via Prompt Learning | ICLR, 2023 | Code |
| Structure-informed protein language models are robust predictors for variant effects | Human Genetics, 2024 | N/A |
| Integration of pre-trained protein language models into geometric deep learning networks | Communications Biology, 2023 | Code |
| Structure-Informed Protein Language Model | arXiv preprint, 2024 | Code |
| S-PLM: Structure-Aware Protein Language Model via Contrastive Learning Between Sequence and Structure | Advanced Science, 2024 | Code |
| CCPL: Cross-modal Contrastive Protein Learning | Pattern Recognition, 2024 | N/A |

| Paper | Published in | Resources |
|---|---|---|
| OntoProtein: Protein Pretraining With Gene Ontology Embedding | ICLR, 2022 | Code |
| ProteinCLIP: enhancing protein language models with natural language | bioRxiv, 2024 | Code |
| ProteinBERT: a universal deep-learning model of protein sequence and function | Bioinformatics, 2022 | Code |
| Protein Representation Learning via Knowledge Enhanced Primary Structure Reasoning | ICLR, 2023 | Code |
| MolBind: Multimodal Alignment of Language, Molecules, and Proteins | arXiv preprint, 2024 | N/A |

| Paper | Published in | Resources |
|---|---|---|
| Prot2Text: Multimodal protein’s function generation with GNNs and transformers | AAAI, 2024 | Code |
| ProTranslator: zero-shot protein function prediction using textual description | International Conference on Research in Computational Molecular Biology, 2022 | Code |
| Multilingual translation for zero-shot biomedical classification using BioTranslator | Nature Communications, 2023 | Code |
| BioT5: Enriching cross-modal integration in biology with chemical knowledge and natural language associations | EMNLP, 2023 | Code |
| BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning | ACL, 2024 | Code |
| ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts | ICML, 2023 | Code |
| ProtChatGPT: Towards Understanding Proteins with Large Language Models | arXiv, 2024 | N/A |
| ProteinChat: Towards Achieving ChatGPT-Like Functionalities on Protein 3D Structures | TechRxiv, 2023 | N/A |

| Paper | Published in | Resources |
|---|---|---|
| Large language models generate functional protein sequences across diverse families | Nature Biotechnology, 2023 | Code |
| ProtGPT2: Deep Unsupervised Language Model for Protein Design | Nature Communications, 2022 | Code |
| ProGen2: Exploring the Boundaries of Protein Language Models | Cell Systems, 2023 | Code |
| IgLM: Infilling Language Modeling for Antibody Sequence Design | Cell Systems, 2023 | Code |
| PALM-H3: Targeted Antibody Generation for SARS-CoV-2 | Nature Communications, 2024 | Code |
| Integrating protein language models and automatic biofoundry for enhanced protein evolution | Nature Communications, 2025 | Code |

| Paper | Published in | Resources |
|---|---|---|
| ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts | ICML, 2023 | Code |
| ProteinBERT: a universal deep-learning model of protein sequence and function | Bioinformatics, 2022 | Code |
| Bertology meets biology: Interpreting attention in protein language models | arXiv preprint, 2020 | Code |
| ProtTrans: Toward understanding the language of life through self-supervised learning | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021 | Code |
| Modeling protein using large-scale pretrain language model | arXiv preprint, 2021 | Code |

| Paper | Published in | Resources |
|---|---|---|
| ProstT5: Bilingual Modeling of Protein Sequence and Structure | bioRxiv, 2023 | Code |
| Fold2Seq: A Joint Sequence–Fold Embedding-based Generative Model for Protein Design | ICML, 2021 | Code |
| Ankh: Optimized Protein Language Model for Efficient Generation | arXiv, 2023 | Code |

| Paper | Published in | Resources |
|---|---|---|
| ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding | arXiv, 2024 | Code |
| ProteinChat: ChatGPT-like Functionalities on Protein 3D Structures | Authorea Preprints, 2023 | Code |
| ProtChatGPT: Towards Understanding Proteins with Large Language Models | arXiv, 2024 | Code |
| ProteinDT: A Text-guided Protein Design Framework | arXiv, 2023 | Code |

| Paper | Published in | Resources |
|---|---|---|
| Artificial intelligence to solve the X-ray crystallography phase problem: a case study report | bioRxiv, 2021 | N/A |

| Paper | Published in | Resources |
|---|---|---|
| FID-Net: A versatile deep neural network architecture for NMR spectral reconstruction and virtual decoupling | Journal of Biomolecular NMR, 2021 | Code |
| Accelerated Nuclear Magnetic Resonance Spectroscopy with Deep Learning | Angewandte Chemie, 2020 | Code |

| Paper | Published in | Resources |
|---|---|---|
| CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks | Nature Methods, 2021 | Code |
| CryoGAN: A New Reconstruction Paradigm for Single-Particle Cryo-EM Via Deep Adversarial Learning | IEEE Transactions on Computational Imaging, 2021 | Code |
| Deep learning-based mixed-dimensional Gaussian mixture model for characterizing variability in cryo-EM | Nature Methods, 2021 | Code |
| 3DFlex: determining structure and motion of flexible proteins from cryo-EM | Nature Methods, 2023 | Code |
| CryoSTAR: leveraging structural priors and constraints for cryo-EM heterogeneous reconstruction | Nature Methods, 2024 | Code |

| Dataset Name | Description | Resources |
|---|---|---|
| UniProtKB/Swiss-Prot | Manually curated protein database with detailed functional annotations | Link |
| UniProtKB/TrEMBL | Automatically annotated protein database with computational analysis | Link |
| UniRef Clusters | Clustered protein sequences for reduced redundancy and efficient searches | Link |
| Pfam | Database of protein families and domains | Link |
| PDB | Database of 3D structures of biological macromolecules | Link |
| BFD | Large database of clustered protein sequences | Link |
| UniParc | Non-redundant archive of protein sequences from public databases | Link |
| PIR | Comprehensive annotated protein sequence database | Link |
| AlphaFoldDB | Database of predicted protein structures using AI | Link |

| Paper | Published in | Resources |
|---|---|---|
| Critical assessment of methods of protein structure prediction (CASP)—Round XV | Proteins: Structure, Function, and Bioinformatics | Link |
| ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design | NeurIPS, 2023 | Code |
| Evaluating protein transfer learning with TAPE | NeurIPS, 2019 | Code |
| CATH–a hierarchic classification of protein domain structures | Structure, 1997 | Link |
| PEER: a comprehensive and multi-task benchmark for protein sequence understanding | NeurIPS, 2022 | Code |
| ExplorEnz: the primary source of the IUBMB enzyme list | Nucleic Acids Research, 2009 | Link |
| HIPPIE: Integrating protein interaction networks with experiment based quality scores | PLoS ONE, 2012 | Link |
| A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding | arXiv, 2024 | Code |

| Paper | Published in | Resources |
|---|---|---|
| Accurate structure prediction of biomolecular interactions with AlphaFold 3 | Nature, 8 May 2024 | Code |
| ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction | COLM | Code |
| Protein-Protein Interaction Networks Derived from Classical and Machine Learning-Based Natural Language Processing Tools | Journal of Proteome Research | N/A |