Language Model Reasoning in Base64

Repository for the blog post: Language Model Reasoning in Base64.

Introduction

Figure 1. An illustration of how certain types of reasoning are independent of language. A problem posed in English (orange) or French (purple) evokes the same reasoning process, leading to the same answer in different languages.

A person's ability to solve a math problem is independent of language: if you can (or cannot) solve a problem stated in English, and you also understand, say, French, you will (or will not) be able to solve the same problem when it is presented in French (see Figure 1). This repository contains code for experiments that test whether the same holds for LLMs such as GPT-4o.

Usage

python run_evaluation.py \
    --data_path $data_path \
    --out_root_dir $out_root_dir \
    --few_shot_data_path $few_shot_data_path \
    --temperature $temperature \
    --max_tokens $max_tokens \
    --num_threads $num_threads \
    --rpm_limit $rpm_limit \
    --num_examples $num_examples \
    --system_prompt $system_prompt \
    --evaluator $evaluator \
    --model_name $model_name \
    --few_shot_k $few_shot_k

At the moment, evaluations for two-operand addition are supported in English and base64 using few-shot or chain-of-thought prompting.
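To make the base64 setting concrete, the following is a minimal sketch of how a two-operand addition question could be encoded and how a base64 reply could be checked. It uses only Python's standard `base64` module. The helper names (`to_base64`, `from_base64`, `is_correct`) and the question wording are illustrative assumptions, not the repository's actual code or prompts.

```python
import base64


def to_base64(text: str) -> str:
    """Encode a UTF-8 string as base64 text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """Decode base64 text back into a UTF-8 string."""
    return base64.b64decode(encoded).decode("utf-8")


def is_correct(a: int, b: int, encoded_reply: str) -> bool:
    """Hypothetical check: does a base64-encoded reply decode to a + b?"""
    try:
        return int(from_base64(encoded_reply).strip()) == a + b
    except ValueError:
        # Covers malformed base64, invalid UTF-8, and non-integer replies.
        return False


# Hypothetical question wording; the repository's prompts may differ.
question = "What is 123 + 456?"
encoded_question = to_base64(question)

# Encoding is lossless: decoding recovers the original question.
assert from_base64(encoded_question) == question
```

In this sketch a reply counts as correct only if it decodes cleanly to the integer sum; anything that fails to decode or parse is treated as incorrect rather than raising an error.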

The following configurations correspond to the various experimental settings:

model_name: gpt-4o
system_prompt $\in$ {assistant, base64_assistant, base64_cot_english_assistant, base64_cot_base64_assistant}
evaluator $\in$ {english_evaluator, base64_evaluator, base64_cot_base64_evaluator, base64_cot_english_evaluator}
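As a rough sketch, the settings above could be turned into command lines as follows. The pairing of each system prompt with an evaluator is an assumption inferred from the matching names; the README does not state it explicitly. Only three flags are filled in here, and the remaining flags from the Usage section (paths, temperature, and so on) would still need to be supplied.

```python
import shlex

# ASSUMPTION: each system prompt pairs with the evaluator of matching name.
# This mapping is inferred from the option names, not confirmed by the README.
ASSUMED_PAIRINGS = {
    "assistant": "english_evaluator",
    "base64_assistant": "base64_evaluator",
    "base64_cot_english_assistant": "base64_cot_english_evaluator",
    "base64_cot_base64_assistant": "base64_cot_base64_evaluator",
}


def build_command(system_prompt: str, model_name: str = "gpt-4o") -> str:
    """Build a partial run_evaluation.py command for one assumed setting."""
    evaluator = ASSUMED_PAIRINGS[system_prompt]
    args = [
        "python", "run_evaluation.py",
        "--model_name", model_name,
        "--system_prompt", system_prompt,
        "--evaluator", evaluator,
    ]
    # shlex.join quotes arguments safely for a POSIX shell.
    return shlex.join(args)


for prompt in ASSUMED_PAIRINGS:
    print(build_command(prompt))
```

Keeping the pairing in one dictionary makes it easy to correct if the intended combinations turn out to differ.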

Data

All the data for the arithmetic evaluations is available at data/arithmetic, which was prepared using the scripts in data_preprocess/.

All the model outputs from GPT-4o are available at model_outputs/arithmetic/.

Acknowledgements

The code in this repository is based on OpenAI's simple-evals.
