Code for the TMLR paper: "Do Think Tags Really Help LLMs Plan? A Critical Evaluation of ReAct-Style Prompting"
Setup:
- Make sure VSCode has the Dev Containers extension installed.
- Make sure Docker is installed and running (you can run `docker ps` and `docker images` without errors).
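The Docker check above can be scripted. A minimal sketch (whether the daemon is running is specific to your machine):

```shell
# Sanity check: is the docker CLI installed and the daemon reachable?
have_cmd() { command -v "$1" >/dev/null 2>&1; }

if have_cmd docker && docker ps >/dev/null 2>&1; then
    echo "Docker is ready"
else
    echo "Docker is not ready -- install or start Docker first" >&2
fi
```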
Running:
- Clone the repository:
  ```
  git clone https://github.com/sbhambr1/react_brittleness
  ```
- Run the devcontainer: VSCode should show a popup offering to reopen the code in a devcontainer. If it does not, press Cmd + Shift + P to open the VSCode command palette and search for `Rebuild Container`, which should start the devcontainer.
- Set `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` as environment variables.
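Setting the two keys could look like the following (the values below are placeholders, not real keys):

```shell
# Placeholder values -- replace with your actual API keys.
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"
```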
Running Webshop:
- In the devcontainer, use the docker image `famishedrover/taxonomy_llm:webshop`.
- Run the webshop:
  ```
  source /webvenv/bin/activate
  cd /webshop/
  ./run_dev.sh
  ```
- Open the webpage. VSCode should prompt you; otherwise Flask will log a message saying the website is accessible at a link like `172.0.0.6:3000`. (Use the link mentioned in the message!)
- Run the OpenAI code using native python (not webvenv):
  ```
  conda create -n react_test python=3.9
  conda activate react_test
  pip install openai anthropic ratelimit alfworld
  git clone https://github.com/sbhambr1/react_brittleness
  cd react_brittleness
  pip install -r requirements.txt
  mkdir data
  python runners/react_alfworld.py
  ```
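Before launching the runner, a quick check that the pip dependencies above are importable can save a confusing traceback later. A small stdlib-only sketch (the package list mirrors the `pip install` line above):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that are not importable."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    needed = ["openai", "anthropic", "ratelimit", "alfworld"]
    missing = missing_packages(needed)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All runner dependencies found")
```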
- Run `patchfix.sh` for each container. It updates `/webshop/web_agent_site/utils.py` to use the larger dataset and downloads it using the webvenv virtual environment present in the container.