    Repositories list

    • ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry
      Python
      22011 Updated Aug 1, 2025
    • LIMO

      Public
      [COLM 2025] LIMO: Less is More for Reasoning
      Python
      5199300 Updated Jul 30, 2025
    • ASI-Arch

      Public
      AlphaGo Moment for Model Architecture Discovery.
      Python
      14985830 Updated Jul 29, 2025
    • ASI4AI

      Public
      JavaScript
      2500 Updated Jul 23, 2025
    • Reproducible and flexible LLM evaluations for scientific reasoning.
      Python
      01700 Updated Jul 23, 2025
    • MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
      Python
      47500 Updated Jul 23, 2025
    • Revisiting Mid-training in the Era of Reinforcement Learning Scaling
      Jupyter Notebook
      1216030 Updated Jul 23, 2025
    • ProX

      Public
      [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
      Python
      1925510 Updated Jul 8, 2025
    • anole

      Public
      Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
      Python
      48780331 Updated Jun 16, 2025
    • Efficient Agent Training for Computer Use
      Python
      312100 Updated Jun 6, 2025
    • Doodling our way to AGI ✏️ 🖼️ 🧠
      Python
      27910 Updated May 29, 2025
    • LIMOPro

      Public
      Python
      01110 Updated May 27, 2025
    • DynToM

      Public
      Python
      0700 Updated May 26, 2025
    • 0010 Updated May 25, 2025
    • ToRL

      Public
      Python
      10258170 Updated May 24, 2025
    • PC-Agent

      Public
      PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World
      Python
      2227410 Updated May 21, 2025
    • Generative AI Act II: Test Time Scaling Drives Cognition Engineering
      Python
      919810 Updated Apr 22, 2025
    • Scaling Deep Research via Reinforcement Learning in Real-world Environments.
      Python
      4153890 Updated Apr 13, 2025
    • MAYE

      Public
      Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
      Python
      813830 Updated Apr 9, 2025
    • Python
      2000 Updated Apr 5, 2025
    • MathPile

      Public
      [NeurIPS D&B 2024] Generative AI for Math: MathPile
      Python
      2141500 Updated Apr 4, 2025
    • cs2916

      Public
      Python
      152600 Updated Mar 27, 2025
    • Python
      67120 Updated Mar 11, 2025
    • [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
      JavaScript
      510200 Updated Mar 6, 2025
    • LIMR

      Public
      Python
      8205150 Updated Feb 20, 2025
    • O1 Replication Journey
      662k140 Updated Jan 14, 2025
    • [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
      Python
      36310 Updated Dec 15, 2024
    • The Walnut Plan
      01100 Updated Oct 10, 2024
    • OpenResearcher, an advanced Scientific Research Assistant
      HTML
      3945212 Updated Oct 10, 2024
    • A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
      Python
      18200 Updated Oct 6, 2024