
# 🧠 ReplicateAI

Recreating every milestone in Machine Learning and Artificial Intelligence — from Transformers to Perceptrons.

License: MIT · Contributions welcome


## 🚀 Overview

ReplicateAI is an open initiative to rebuild and verify every major paper in ML/AI history,
starting from modern foundation models (2023–2025) and tracing backward to the origins of AI.

> We believe that understanding AI means rebuilding it — line by line, layer by layer.



## 🧩 Project Vision

> “Because science means reproducibility.”

- 📜 **Goal:** Faithfully re-implement influential ML/AI papers with open code, datasets, and experiments
- 🧱 **Scope:** From Qwen2.5 (2025) to the Perceptron (1958)
- 🧠 **Approach:** Reverse timeline — start with foundation models, then trace history backward
- 🧾 **Output:** Each paper becomes a self-contained, reproducible module with reports and experiments

## 🪐 Stage 1 — Foundation & Multimodal Era (2023–2025)

*The golden age of open-source foundation models.*

| Year | Paper / Model | Organization | Why It Matters | Replicate Goal | Status |
|------|---------------|--------------|----------------|----------------|--------|
| 2025 | Qwen2.5 | Alibaba | Fully open multimodal model (text + image) | Rebuild text/image pipeline | 🧭 Planned |
| 2024 | DeepSeek-V2 | DeepSeek | MoE + RLHF efficiency breakthrough | Replicate expert routing and reward pipeline | 🧭 Planned |
| 2024 | Claude 3 Family | Anthropic | Leading alignment via Constitutional AI | Explore rule-based alignment principles | 🧭 Planned |
| 2024 | LLaMA 3 | Meta | Open foundation model standard | Implement scaled transformer + tokenizer | 🧭 Planned |
| 2024 | Mixtral 8×7B | Mistral | Sparse Mixture-of-Experts architecture | Implement routing + expert parallelism (sketch below) | 🧭 Planned |
| 2024 | Phi-2 / Phi-3 | Microsoft | Small but high-quality model; data-centric | Rebuild synthetic data pipeline | 🧭 Planned |
| 2024 | Gemini 1 / 1.5 | Google DeepMind | Vision + text + reasoning | Prototype multimodal reasoning pipeline | 🧭 Planned |
| 2023 | Qwen-VL | Alibaba | Vision-language alignment model | Replicate visual encoder + text fusion | 🧭 Planned |
| 2023 | BLIP-2 / MiniGPT-4 | Salesforce / KAUST | Lightweight multimodal bridging | Implement pretraining connector | 🧭 Planned |
| 2023 | LLaMA 1 / 2 | Meta | Open LLM baseline | Implement tokenizer + attention stack | 🧭 Planned |
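To make these replicate goals concrete, here is a minimal sketch of the kind of top-k expert routing named in the Mixtral row. It illustrates the general sparse-MoE idea only; the class name, layer sizes, and dense routing loops are placeholder choices, not Mixtral's actual architecture.

```python
# Illustrative top-2 Mixture-of-Experts layer in PyTorch.
# A sketch of the sparse-routing idea, not a faithful Mixtral replication.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # token -> routing logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):  # x: (n_tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)          # (n_tokens, n_experts)
        weights, idx = gates.topk(self.k, dim=-1)          # each token picks k experts
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over the chosen k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):          # simple loops; real MoE batches by expert
            for slot in range(self.k):
                mask = idx[:, slot] == e                   # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```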

## 🔍 Stage 2 — Representation & Sequence Models (2013–2021)

| Year | Paper | Author | Goal | Status |
|------|-------|--------|------|--------|
| 2021 | CLIP | Radford et al. | Align vision and language in a shared embedding space via contrastive learning | 🔬 Replicating |
| 2020 | ViT | Dosovitskiy et al. | Image classification with a pure Transformer (Vision Transformer) | ✅ Done |
| 2018 | BERT | Devlin et al. | Masked language modeling | 🔬 Replicating |
| 2017 | Transformer | Vaswani et al. | “Attention Is All You Need” (sketch below) | ✅ Done |
| 2015 | Bahdanau Attention | Bahdanau et al. | RNN encoder-decoder with attention | 🧭 Planned |
| 2014 | Seq2Seq | Sutskever et al. | Encoder-decoder translation | 🧭 Planned |
| 2013 | Word2Vec | Mikolov et al. | Learn word embeddings | 🧭 Planned |
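The two ✅ Done rows share one core operation, so it is worth spelling out. Below is a self-contained NumPy sketch of scaled dot-product attention with toy shapes; it is an illustration, not the code in this repo's Transformer or ViT modules.

```python
# Scaled dot-product attention in NumPy: softmax(Q Kᵀ / √d_k) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq, d_k); V: (seq, d_v). Returns (seq, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                              # pairwise similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)               # row-wise softmax
    return weights @ V                                           # attention-weighted values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                                      # toy sequence: 4 tokens, d = 8
print(scaled_dot_product_attention(x, x, x).shape)               # (4, 8) self-attention output
```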

## 🧩 Stage 3 — Deep Learning Renaissance (2006–2015)

| Year | Paper | Author | Goal | Status |
|------|-------|--------|------|--------|
| 2015 | ResNet | He et al. | Residual learning (sketch below) | 🧭 Planned |
| 2014 | VGG | Simonyan & Zisserman | Deep CNN architectures | 🧭 Planned |
| 2012 | AlexNet | Krizhevsky et al. | GPU-based CNN training | 🧭 Planned |
| 2006 | DBN / RBM | Hinton et al. | Layer-wise pretraining | 🧭 Planned |
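The residual-learning idea in the ResNet row fits in a few lines: a block outputs its input plus a learned correction, which keeps gradients flowing through very deep stacks. A minimal PyTorch sketch follows; the channel count and block layout are simplified placeholders, not the paper's exact architecture.

```python
# Minimal residual block: y = ReLU(x + F(x)), with F two conv-BN stages.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))  # identity shortcut + learned residual

block = ResidualBlock()
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```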

## 📊 Stage 4 — Statistical Learning Era (1970s–2000s)

| Year | Paper | Author | Goal | Status |
|------|-------|--------|------|--------|
| 2001 | Random Forests | Breiman | Ensemble learning | 🧭 Planned |
| 1997 | AdaBoost | Freund & Schapire | Boosting algorithms | 🧭 Planned |
| 1995 | SVM | Cortes & Vapnik | Maximum-margin classification (baseline below) | 🧭 Planned |
| 1977 | EM Algorithm | Dempster et al. | Expectation-Maximization for latent-variable models | 🧭 Planned |
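Replications in this stage can be sanity-checked against reference implementations. As an assumed example (scikit-learn is not a stated dependency of this repo), a linear maximum-margin baseline matching the SVM row:

```python
# Reference maximum-margin baseline via scikit-learn, for sanity-checking
# a from-scratch SVM replication on separable toy data.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)  # toy 2-class data
clf = SVC(kernel="linear", C=1.0).fit(X, y)                  # linear max-margin classifier
print("support vectors:", len(clf.support_vectors_))
print("train accuracy:", clf.score(X, y))
```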

## 🧬 Stage 5 — Early Neural Foundations (1950s–1980s)

| Year | Paper | Author | Goal | Status |
|------|-------|--------|------|--------|
| 1986 | Backpropagation | Rumelhart et al. | Gradient-based learning | 🧭 Planned |
| 1985 | Boltzmann Machine | Hinton et al. | Generative stochastic model | 🧭 Planned |
| 1982 | Hopfield Network | Hopfield | Associative memory | 🧭 Planned |
| 1958 | Perceptron | Rosenblatt | Linear separability (sketch below) | 🧭 Planned |
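The timeline's endpoint is small enough to sketch inline. Here is a NumPy version of Rosenblatt's perceptron update rule on linearly separable toy data; the data, epoch count, and label encoding are arbitrary illustration choices.

```python
# Rosenblatt perceptron: nudge (w, b) toward each misclassified point.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)          # labels in {-1, +1}

w, b = np.zeros(2), 0.0
for _ in range(20):                         # epochs; converges if data are separable
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:          # misclassified (or on the boundary)
            w += yi * xi                    # rotate the boundary toward xi
            b += yi

print("train accuracy:", (np.sign(X @ w + b) == y).mean())
```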

## Lifecycle

```
🧭 Planned
   ↓
🔬 In Reproduction
   ↓
🧪 Under Evaluation
   ↓
📈 Verified
   ↓
🧾 Documented
   ↓
🧰 Extended (optional)
```

## 📁 Repository Structure


```
ReplicateAI/
├── stage1_foundation/
│   ├── 2025_Qwen2.5/
│   ├── 2024_LLaMA3/
│   └── 2023_Qwen-VL/
├── stage2_representation/
│   ├── 2021_CLIP/
│   ├── 2018_BERT/
│   ├── 2017_Transformer/
│   └── 2013_Word2Vec/
├── stage3_deep_renaissance/
│   ├── 2015_ResNet/
│   ├── 2012_AlexNet/
│   └── 2006_DBN/
├── stage4_statistical/
│   ├── 2001_RandomForest/
│   └── 1995_SVM/
└── stage5_foundations/
    ├── 1986_Backprop/
    └── 1958_Perceptron/
```

Each paper module includes:


```
📄 README.md      — Paper summary & objective
📘 report.md      — Reproduction results & analysis
📓 notebook/      — Interactive demo
💻 src/           — Core implementation
🔗 references.bib — Original citation
```


## 🤝 Contributing

We welcome contributions from researchers, engineers, and students who believe in reproducibility.

  1. Fork the repo
  2. Pick a paper or model not yet implemented
  3. Follow the Paper Template
  4. Submit a PR with your code and report

Please include:

- Clear code (PyTorch / JAX / NumPy)
- A short experiment or visualization
- Reproducibility notes or deviations (e.g., pinned seeds; see the sketch below)
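As a starting point for those reproducibility notes, here is a hedged sketch of a seed-pinning helper. It assumes a PyTorch-based module; `seed_everything` is a hypothetical name for illustration, not an existing utility in this repo.

```python
# Pin every RNG a typical PyTorch experiment touches. Full bitwise
# determinism may additionally require deterministic CUDA kernels.
import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    random.seed(seed)                        # Python's stdlib RNG
    np.random.seed(seed)                     # NumPy's global RNG
    torch.manual_seed(seed)                  # CPU and CUDA RNGs
    os.environ["PYTHONHASHSEED"] = str(seed)
    torch.use_deterministic_algorithms(True, warn_only=True)  # warn on nondeterministic ops

seed_everything(42)
```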

## 🧮 Progress Overview

| Stage | Era | Progress |
|-------|-----|----------|
| 🪐 Foundation (2023–2025) | Modern LLM & Multimodal | ░░░░░░░░░░░░░░ 0% |
| 🔍 Representation (2013–2021) | Transformers & Embeddings | ████░░░░░░░░░░ 29% (2 / 7) |
| 🧩 Deep Renaissance (2006–2015) | CNN Era | ░░░░░░░░░░░░░░ 0% |
| 📊 Statistical (1970s–2000s) | Classical ML | ░░░░░░░░░░░░░░ 0% |
| 🧬 Foundations (1950s–1980s) | Neural Origins | ░░░░░░░░░░░░░░ 0% |

## 📚 Citation

If you use or reference this project, please cite:

```bibtex
@misc{replicateai2025,
  author = {ReplicateAI Contributors},
  title  = {ReplicateAI: Rebuilding the History of Machine Learning and Artificial Intelligence},
  year   = {2025},
  url    = {https://github.com/duoan/ReplicateAI}
}
```

## 💬 Motto

> “Replicate. Verify. Understand.”


⭐️ Star this repo if you believe reproducibility is the foundation of true intelligence.
