Code for the TMLR paper: "Do Think Tags Really Help LLMs Plan? A Critical Evaluation of ReAct-Style Prompting"
Setup:
- Make sure VSCode has the Dev Containers extension installed.
- Make sure Docker is installed and running (you can run `docker ps` and `docker images` without errors).
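The Docker check above can be scripted. A minimal sketch (whether the daemon is running is specific to your machine):

```shell
# Sanity check: is the docker CLI installed and the daemon reachable?
have_cmd() { command -v "$1" >/dev/null 2>&1; }

if have_cmd docker && docker ps >/dev/null 2>&1; then
    echo "Docker is ready"
else
    echo "Docker is not ready -- install or start Docker first" >&2
fi
```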
Running:
- Clone the repository:
  ```
  git clone https://github.com/sbhambr1/react_brittleness
  ```
- Run the devcontainer: VSCode should show a popup offering to reopen the code in a devcontainer. If it does not, press Cmd + Shift + P to open the VSCode command palette and search for `Rebuild Container`, which should start the devcontainer.
- Set `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` as environment variables.
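Setting the two keys could look like the following (the values below are placeholders, not real keys):

```shell
# Placeholder values -- replace with your actual API keys.
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"
```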
Running Webshop:
- In the devcontainer, use the docker image `famishedrover/taxonomy_llm:webshop`.
- Run the webshop:
  ```
  source /webvenv/bin/activate
  cd /webshop/
  ./run_dev.sh
  ```
- Open the webpage. VSCode should prompt you; otherwise Flask will log a message saying the website is accessible at a link like `172.0.0.6:3000`. (Use the link mentioned in the message!)
- Run the OpenAI code using native python (not webvenv):
  ```
  conda create -n react_test python=3.9
  conda activate react_test
  pip install openai anthropic ratelimit alfworld
  git clone https://github.com/sbhambr1/react_brittleness
  cd react_brittleness
  pip install -r requirements.txt
  mkdir data
  python runners/react_alfworld.py
  ```
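Before launching the runner, a quick check that the pip dependencies above are importable can save a confusing traceback later. A small stdlib-only sketch (the package list mirrors the `pip install` line above):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that are not importable."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    needed = ["openai", "anthropic", "ratelimit", "alfworld"]
    missing = missing_packages(needed)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All runner dependencies found")
```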
- Run `patchfix.sh` for each container. It updates `/webshop/web_agent_site/utils.py` to use the larger dataset and downloads it using the webvenv virtual environment present in the container.