This paper review was conducted as part of the seminar "Text Analytics - Generative AI."
In recent years, the reasoning capabilities of language models have improved considerably. Few-shot prompting with exemplars has contributed substantially to this progress, but creating such exemplars demands additional human annotation, making the associated instruction-tuning of language models lengthy and costly. Despite previous work on prompt design, several reasoning tasks show little improvement as models grow, reflected in a flat scaling curve.

In this review, I analyze a novel approach called “chain-of-thought prompting”. The technique prompts language models with only a few exemplars that include intermediate reasoning steps, mirroring how humans decompose problems. Compared to standard prompting, it produces superior results on arithmetic, commonsense, and symbolic reasoning tasks and achieves state-of-the-art reasoning accuracy. The paper's findings show that chain-of-thought prompting substantially improves performance across these reasoning tasks, particularly once models are scaled to roughly 100 billion parameters or more.
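To make the idea concrete, the sketch below contrasts a standard few-shot prompt with a chain-of-thought prompt, using the arithmetic exemplar and test question from Figure 1 of the paper. The `build_prompt` helper and the overall structure are my own illustrative assumptions, not code from the authors; how the prompt is sent to a model is left out.

```python
# Minimal sketch of standard vs. chain-of-thought prompting.
# The exemplar text comes from Figure 1 of Wei et al.; the helper
# function is an assumed, illustrative way to assemble the prompt.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
)

STANDARD_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: The answer is 11.\n\n"
)

def build_prompt(question: str, chain_of_thought: bool = True) -> str:
    """Prepend one few-shot exemplar (with or without a rationale) to the test question."""
    exemplar = COT_EXEMPLAR if chain_of_thought else STANDARD_EXEMPLAR
    return exemplar + f"Q: {question}\nA:"

if __name__ == "__main__":
    question = (
        "The cafeteria had 23 apples. If they used 20 to make lunch "
        "and bought 6 more, how many apples do they have?"
    )
    # With the chain-of-thought exemplar, the model is expected to emit
    # intermediate reasoning steps before its final answer; with the
    # standard exemplar, it is expected to answer directly.
    print(build_prompt(question, chain_of_thought=True))
```

The only difference between the two prompts is whether the exemplar's answer contains the intermediate reasoning; no model weights are changed, which is why the method needs just a handful of hand-written exemplars rather than a large annotated training set.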
The original paper can be found here: Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2023). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint arXiv:2201.11903.