This repository contains various resources for the paper *Rethinking Reflection in Pre-Training* by Essential AI:
- Prompts
- Results
- Tasks
- Classifier
- Datasets (on Hugging Face)
Updates:
- 2025-04 - we released our paper and made this announcement.
A language model’s ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier, during the model’s pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo-2-7B model pre-trained on 4 trillion tokens displays self-correction on our six self-reflection tasks.
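As a rough illustration of the adversarial setup described above, one can splice a deliberately wrong step into an otherwise correct chain of thought and ask the model to continue. This is only a minimal sketch; the function, question, and error below are hypothetical, and the paper's actual datasets were built by prompting LLMs (see `prompts/`):

```python
# Hypothetical sketch of the adversarial chain-of-thought construction.
# The question, steps, and injected error here are made up for illustration.

def make_adversarial_cot(question: str, correct_steps: list[str],
                         bad_step_idx: int, bad_step: str) -> str:
    """Replace one correct reasoning step with a deliberately wrong one."""
    steps = list(correct_steps)
    steps[bad_step_idx] = bad_step
    cot = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    return f"Question: {question}\n{cot}\nContinue the reasoning and give the final answer."

prompt = make_adversarial_cot(
    question="What is 12 * 7 + 5?",
    correct_steps=["12 * 7 = 84", "84 + 5 = 89"],
    bad_step_idx=0,
    bad_step="12 * 7 = 74",  # deliberate error the model must notice and correct
)
print(prompt)
```

A model that reflects should flag the incorrect first step and still recover the answer 89.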
As detailed in our paper, we modified existing datasets to create adversarial datasets that can elicit reflection behavior from LLMs. Our modifications involved prompting LLMs, and the `prompts/` directory includes the specific instructions for each dataset.
Our experiments produced a substantial volume of results and sample JSON files, which are available via Essential AI's Hugging Face hub. Additionally, in `results/` we share CSV files that collate our results, along with the plots from our paper and the code used to create them.
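The collated CSVs can be sliced per task to trace accuracy across pre-training checkpoints. The snippet below is only a sketch of that workflow using the standard library; the column names and values are assumptions for illustration and may not match the actual files in `results/`:

```python
import csv
import io

# Hypothetical sample mimicking a collated results CSV; the real files in
# results/ may use different columns and contain the paper's actual numbers.
sample = """task,checkpoint_tokens,accuracy
gsm8k_adv,1000000000000,0.31
gsm8k_adv,4000000000000,0.47
"""

# Group accuracy values by task, preserving checkpoint order.
by_task: dict[str, list[float]] = {}
for row in csv.DictReader(io.StringIO(sample)):
    by_task.setdefault(row["task"], []).append(float(row["accuracy"]))

print(by_task)
```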
Our experiments utilized the EleutherAI Language Model Evaluation Harness. We created custom tasks, which are available in `tasks/`:
- `bbh_adv`
- `cruxeval_i_adv`
- `cruxeval_o_adv`
- `gsm8k_adv`
- `gsm8k-platinum_adv`
- `triviaqa_adv`
Install `lm-eval` as per the LM Evaluation Harness instructions here. Next, `git clone https://github.com/Essential-AI/reflection.git`. Then, when calling `lm-eval`, include some or all of our custom task names, along with the `--include_path` flag and the path to `reflection/tasks/`. For example:
```shell
lm_eval --model hf \
    --tasks bbh_adv,cruxeval_i_adv,cruxeval_o_adv,gsm8k_adv,gsm8k-platinum_adv,triviaqa_adv \
    --include_path reflection/tasks \
    --model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float" \
    --device cuda:0 \
    --batch_size auto:4
```
See here for more details on how to use the `lm-eval` interface.
For the reflection classifier described in our paper, we created few-shot prompts tailored to each task. We share these in `classifier/`, along with the golden labels obtained from human annotators.
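One natural use of the golden labels is to measure how well classifier predictions agree with the human annotations. The helper below is a minimal sketch, not the paper's evaluation code; the label names and example values are hypothetical:

```python
# Hypothetical sketch: agreement between classifier predictions and the
# human-annotated golden labels. Label strings are made up for illustration.

def accuracy(preds: list[str], golds: list[str]) -> float:
    """Fraction of predictions that match the golden labels."""
    if len(preds) != len(golds):
        raise ValueError("prediction and gold lists must be the same length")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

preds = ["reflection", "no_reflection", "reflection"]
golds = ["reflection", "reflection", "reflection"]
print(accuracy(preds, golds))  # 2 of 3 match
```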
Our code and results are made available under the MIT license, and our datasets under the CC BY-SA 4.0 license. Please see here for license details of each dataset.
Please cite both our paper and the original work for each dataset that we modified. Please see here for the relevant citation details for each dataset.
```bibtex
@misc{ai2025rethinkingreflectionpretraining,
    title={Rethinking Reflection in Pre-Training},
    author={Essential AI and : and Darsh J Shah and Peter Rushton and Somanshu Singla and Mohit Parmar and Kurt Smith and Yash Vanjani and Ashish Vaswani and Adarsh Chaluvaraju and Andrew Hojel and Andrew Ma and Anil Thomas and Anthony Polloreno and Ashish Tanwer and Burhan Drak Sibai and Divya S Mansingka and Divya Shivaprasad and Ishaan Shah and Karl Stratos and Khoi Nguyen and Michael Callahan and Michael Pust and Mrinal Iyer and Philip Monk and Platon Mazarakis and Ritvik Kapila and Saurabh Srivastava and Tim Romanski},
    year={2025},
    eprint={2504.04022},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2504.04022},
}
```