This repository introduces the position paper titled "Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition". The paper investigates how foundation models (FMs) are evolving from enhancing individual research tasks to driving a broader transition in the scientific paradigm itself.
Our paper is available on ResearchGate.
- Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition
- 1. Outline
- 2. Introduction
- 3. Core Philosophy
- 4. FM-Assisted Scientific Discovery Examples
- 5. Foundation Model Integration Across Scientific Paradigms
- 6. Summary
- 7. License
- 8. Citation
Foundation Models such as GPT-4, AlphaFold, and FunSearch are profoundly transforming how scientific discovery is conducted. Scientific progress has historically unfolded through four foundational paradigms: experimental, theoretical, computational, and data-driven. Each paradigm introduced new modes of inquiry, from systematic observation and mathematical abstraction to simulation-based analysis and large-scale data pattern extraction.
As scientific challenges become increasingly complex, interdisciplinary, and dynamic, the limitations of existing paradigms have become more pronounced. Foundation Models are emerging as versatile systems that integrate reasoning, modeling, language processing, and data generation. These models unify capabilities across domains, enabling end-to-end support for scientific processes including literature review, hypothesis generation, experiment design, and interpretation of results.
This position paper argues that Foundation Models are not merely tools for speeding up existing workflows. They are actively reshaping the underlying structure of scientific practice. We introduce a three-stage framework that captures this ongoing transformation:
Figure 1. Evolving scientific paradigms empowered by FMs. FMs progressively transition from tool-like infrastructure (meta-scientific integration) to interactive co-creators (hybrid human–AI collaboration), and ultimately to autonomous agents capable of end-to-end scientific discovery.
In the first stage, Foundation Models function as intelligent infrastructure that improves efficiency within traditional paradigms. In the second stage, they operate as collaborators that participate in research ideation and execution. In the third stage, they may evolve into autonomous agents capable of initiating and completing scientific inquiries with minimal human intervention.
To contextualize this transformation, we trace the historical evolution of scientific paradigms:
Figure 2. A Roadmap of Scientific Discovery Paradigms and Their Epistemic Capabilities.
Each new paradigm emerged in response to the limitations of the previous one. The experiment-driven paradigm emphasized empirical validation. The theory-driven paradigm brought deductive abstraction. The computation-driven paradigm enabled exploration of analytically intractable systems. The data-driven paradigm introduced inductive discovery from large-scale observations. Foundation Models now present a fifth paradigm that integrates learning, reasoning, and generation within a unified and adaptable framework.
This paper offers a conceptual roadmap for understanding the trajectory of Foundation Models in science. It highlights not only their technical capacities but also their growing epistemic influence. By positioning these models as active participants in knowledge creation, we invite deeper reflection on what science means in an era of intelligent machines.
- Human: Responsible for framing problems, making final decisions, and maintaining scientific integrity.
- Foundation Models: Serve as assistants or collaborators that enhance searching, reasoning, modeling, and writing.
- Human-in-the-Loop: Ensures that human agency, ethical accountability, and critical judgment remain central.
- Increase the efficiency of research workflows such as literature review, experiment design, and scientific writing.
- Improve research quality by supporting structured reasoning and cross-domain knowledge integration.
- Lower the barrier to entry for newcomers to engage in high-quality scientific inquiry.
- Preserve human control and oversight to avoid risks associated with full automation.
FMs are increasingly integrated into every stage of the scientific research workflow, enabling researchers to accelerate and enhance the discovery process. In practice, they support a pipeline that spans initial exploration of research topics, comprehensive literature surveys, precise problem formulation, methodology design, experimental execution, and scientific writing. With FM support, scientists can efficiently identify emerging trends, synthesize vast bodies of knowledge, refine research questions, design robust experiments, and generate or analyze data. This section illustrates how FMs can be systematically embedded into each step of the scientific process, with practical examples and recommended tools for each phase.
Foundation models assist researchers in identifying emerging topics, knowledge gaps, and interdisciplinary intersections by querying large-scale corpora and structured databases.
Example:
Human: What are the underexplored challenges in protein design in academic research?
AI: [Underexplored Challenges in Protein Design by OpenAI Deep Research]
See: Underexplored Challenges in Protein Design by OpenAI Deep Research
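Below is a minimal Python sketch of how such a query could be issued programmatically. It assumes the OpenAI Python SDK, an illustrative model name, and an `OPENAI_API_KEY` in the environment; the prompts are placeholders, not part of the workflow above.

```python
# Minimal sketch: querying an FM for underexplored research challenges.
# The model name and prompts are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; substitute any capable chat model
    messages=[
        {"role": "system",
         "content": "You are a research assistant mapping open problems in a field."},
        {"role": "user",
         "content": "What are the underexplored challenges in protein design "
                    "in academic research?"},
    ],
)
print(response.choices[0].message.content)
```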
Recommended Tools:
FMs summarize academic literature, cluster publications into themes, and highlight influential papers or trends.
Example:
Human: Please help me conduct a literature survey on protein design.
AI: [Survey about protein design by OpenAI Deep Research]
See: Survey about protein design by OpenAI Deep Research
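As a lightweight complement to Deep Research-style surveys, clustering a set of abstracts into themes can be sketched with classical TF-IDF plus k-means. The abstracts and cluster count below are placeholders; a real survey would fetch abstracts from a bibliographic API and tune the number of clusters.

```python
# Minimal sketch: clustering paper abstracts into themes with TF-IDF + k-means.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [  # placeholder abstracts
    "De novo protein design with deep generative models ...",
    "Diffusion models for protein backbone generation ...",
    "Protein language models capture evolutionary constraints ...",
    "Benchmarking sequence-based stability predictors ...",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, abstract in zip(labels, abstracts):
    print(label, abstract[:60])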
Recommended Tools:
FMs enable iterative refinement of research questions based on novelty, feasibility, or scope, offering domain-specific insights and suggestions.
Example:
Human: I intend to design a foundation model-based agent for protein design. Please analyze and refine the research problem.
AI: [Problem Refinement by OpenAI Deep Research]
See: Problem Refinement by OpenAI Deep Research
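One way to realize this iterative refinement is a multi-turn chat loop in which each round folds human critique back into the conversation history. The sketch below assumes the OpenAI SDK, an illustrative model name, and canned feedback standing in for a human reviewer.

```python
# Minimal sketch: iterative problem refinement as a multi-turn chat.
# Keeping the full history lets each round build on earlier critique.
from openai import OpenAI

client = OpenAI()
history = [{
    "role": "user",
    "content": "I intend to design a foundation model-based agent for protein "
               "design. Please analyze and refine the research problem.",
}]

for feedback in [  # placeholder human critique for two refinement rounds
    "Narrow the scope: which sub-problem is feasible with current lab automation?",
    "Sharpen the evaluation: what would count as a successful design?",
]:
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    draft = reply.choices[0].message.content
    print(draft)
    history.append({"role": "assistant", "content": draft})
    history.append({"role": "user", "content": feedback})
```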
Recommended Tools:
FMs provide assistance in designing or modifying research methodologies, including experiment protocols, model selection, and variable control strategies.
Example:
Human: Please help me design a deep learning model for analyzing gene expression data, taking into account as many factors as possible.
AI: [Methodology Design by OpenAI Deep Research]
See: Methodology Design by OpenAI Deep Research
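For concreteness, a methodology of the kind an FM might propose here could start from a simple PyTorch baseline such as the sketch below. The input dimensionality (20,000 genes), layer widths, and dropout rate are placeholder assumptions, not recommendations.

```python
# Minimal sketch: an MLP classifier over gene expression profiles.
import torch
import torch.nn as nn

class ExpressionMLP(nn.Module):
    def __init__(self, n_genes: int = 20_000, n_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_genes, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = ExpressionMLP()
x = torch.randn(8, 20_000)  # dummy batch of 8 expression profiles
print(model(x).shape)       # torch.Size([8, 2])
```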
Recommended Tools:
FMs generate executable code, simulate experimental conditions, or interact with laboratory robots, enabling rapid testing and iteration.
Example:
Human: Please generate the corresponding program according to this method: [Method].
AI: [Code generated by ChatGPT]
See: Method, Code generated by ChatGPT
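A hedged sketch of this step: send the method description to an FM, write the returned program to disk, and keep a human review gate before anything executes, in line with the human-in-the-loop principle above. The model name, method text, and output filename are assumptions.

```python
# Minimal sketch: turning a method description into a program draft.
# The generated code is saved for human review rather than executed directly.
from openai import OpenAI

client = OpenAI()
method = "Normalize expression counts, then fit a logistic regression ..."  # placeholder

reply = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{
        "role": "user",
        "content": f"Please generate the corresponding Python program "
                   f"according to this method: {method}",
    }],
)

with open("generated_experiment.py", "w") as f:
    f.write(reply.choices[0].message.content)
# Inspect generated_experiment.py manually before running it.
```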
Recommended Tools:
FMs assist with outlining, drafting, rewriting, and proofreading scientific papers. They also help generate visuals or translate content between formats.
Example: An LLM transforms a collection of experimental logs and plots into a structured LaTeX paper draft, including title suggestions, abstract summaries, and formatted references.
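A minimal sketch of how such a transformation might be scripted, assuming a hypothetical `logs/` directory of plain-text experiment logs and an illustrative model name:

```python
# Minimal sketch: assembling experiment logs into a prompt for a LaTeX draft.
from pathlib import Path
from openai import OpenAI

client = OpenAI()
logs = "\n\n".join(p.read_text() for p in sorted(Path("logs").glob("*.txt")))

reply = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{
        "role": "user",
        "content": "Turn these experimental logs into a structured LaTeX paper "
                   "draft with title suggestions, an abstract, and placeholder "
                   "\\cite commands:\n\n" + logs,
    }],
)
print(reply.choices[0].message.content)
```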
Recommended Tools:
- (ICLR'25) BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [paper]
- (NeurIPS'23) ExPT: Synthetic Pretraining for Few-Shot Experimental Design [paper]
- (Opto-Electron Adv'24) OptoGPT: A Foundation Model for Inverse Design in Optical Multilayer Thin Film Structures [paper]
- (arXiv 2024.06) LLMatDesign: Autonomous Materials Discovery with Large Language Models [paper]
- (ICML'24) A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules? [paper]
- (arXiv 2025.01) Language-Based Bayesian Optimization Research Assistant (BORA) [paper]
- (ICLR'25) Bayesian Experimental Design via Contrastive Diffusions [paper]
- (Nature'23) Autonomous Chemical Research with Large Language Models [paper]
- (Nature Communications'24) An automatic end-to-end chemical synthesis development platform powered by large language models [paper]
- (Autonomous Robots'23) Large language models for chemistry robotics [paper]
- (Digital Discovery'25) From text to test: AI-generated control software for materials science instruments [paper]
- (Machine Learning: Science and Technology'25) VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities [paper]
- (arXiv 2024.09) AP-VLM: Active Perception Enabled by Vision-Language Models [paper]
- (arXiv 2025.04) A Survey on Hypothesis Generation for Scientific Discovery in the Era of Large Language Models [paper]
- (BigData Conference'24) Embracing Foundation Models for Advancing Scientific Discovery [paper]
- (arXiv 2025.04) Sparks of Science: Hypothesis Generation Using Structured Paper Data [paper]
- (ACL'24) Exploring Scientific Hypothesis Generation with Mamba [paper]
- (arXiv 2025.02) Towards Physics-Guided Foundation Models [paper]
- (AAAI'25) Physics-Guided Foundation Model for Scientific Discovery: An Application to Aquatic Science [paper]
- (EMNLP'23) Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning [paper]
- (CoLLAs'24) SymbolicAI: A framework for logic-based approaches combining generative models and solvers [paper]
- (NeSy'24) The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning [paper]
- (arXiv 2025.02) Automated Hypothesis Validation with Agentic Sequential Falsifications [paper]
- (arXiv 2024.04) Towards Large Language Models as Copilots for Theorem Proving in Lean [paper]
- (arXiv 2024.05) DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data [paper]
- (AAAI'24) Relational Programming with Foundation Models [paper]
- (ICLR'25) LLM-SR: Scientific equation discovery via programming with large language models [paper]
- (Nature'24) Mathematical discoveries from program search with large language models [paper]
- (NeurIPS'24) Symbolic Regression with a Learned Concept Library [paper]
- (arXiv 2024.04) Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation [paper]
- (NeurIPS'24) DiffusionPDE: Generative PDE-Solving under Partial Observation [paper]
- (JMLR'23) Neural operator: learning maps between function spaces with applications to PDEs [paper]
- (NeurIPS'24) Pretraining codomain attention neural operators for solving multiphysics PDEs [paper]
- (Science'23) Learning skillful medium-range global weather forecasting [paper]
- (NeurIPS'23) Scalable transformer for PDE surrogate modeling [paper]
- (NeurIPS'23) PDE-Refiner: Achieving accurate long rollouts with neural PDE solvers [paper]
- (Bioinformatics'21) DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome [paper]
- (Nature Machine Intelligence'22) Large-scale chemical language representations capture molecular structure and properties [paper]
- (AAAI'25) ChemVLM: Exploring the power of multimodal large language models in chemistry area [paper]
- (Nature Machine Intelligence'24) Augmenting large language models with chemistry tools [paper]
- (arXiv 2022.11) Galactica: A large language model for science [paper]
- (Nature Medicine'25) Toward expert-level medical question answering with large language models [paper]
- (ICML'23) ClimaX: A foundation model for weather and climate [paper]
- (Nature'23) Accurate medium-range global weather forecasting with 3D neural networks [paper]
- (ICLR'24) DiffusionSat: A Generative Foundation Model for Satellite Imagery [paper]
- (Nature'21) Highly accurate protein structure prediction with AlphaFold [paper]
- (Science'23) Evolutionary-scale prediction of atomic-level protein structure with a language model [paper]
- (Nature'23) De novo design of protein structure and function with RFdiffusion [paper]
- (Nature'25) A generative model for inorganic materials design [paper]
- (arXiv 2025.05) ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows [paper]
- (arXiv 2025.04) The AI Scientist-v2: Workshop-level automated scientific discovery via agentic tree search [paper]
- (arXiv 2025.01) Agent Laboratory: Using LLM agents as research assistants [paper]
- (NeurIPS Workshop'23) Benchmarking large language models as AI research agents [paper]
- (arXiv 2024.08) The AI Scientist: Towards fully automated open-ended scientific discovery [paper]
- (NeurIPS Workshop'24) PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics [paper]
- (NeurIPS'24) Latent neural operator for solving forward and inverse PDE problems [paper]
- (Nature'23) Autonomous chemical research with large language models [paper]
This paper proposes a shift from the traditional human-led research process to a future of human-AI collaboration and, eventually, autonomous scientific inquiry. Foundation models are not only improving scientific efficiency but also introducing new epistemic agents capable of participating in, and potentially leading, the generation of scientific knowledge. This transformation calls for a rethinking of research workflows, authorship, accountability, and the future of science itself.
This project is licensed under the MIT License.
If you find this work useful, please consider citing:
@misc{liu2025foundation,
  author       = {Fan Liu and Jindong Han and Tengfei Lyu and others},
  title        = {Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition},
  howpublished = {TechRxiv},
  year         = {2025},
  month        = {June},
  note         = {Registration in progress},
  doi          = {10.36227/techrxiv.174953071.19189612/v1},
  url          = {https://doi.org/10.36227/techrxiv.174953071.19189612/v1}
}