This repository contains the code and dataset used in the experiments described here.
Each problem's package has the following structure:
- `model`: Representations of problem instances and solutions for the problem.
- `generator`: Functionality for generating problem instances.
- `llm`: LLM-based solvers for the problem.
- `symbolic`: Symbolic solvers for the problem.
- `alt`: Alternate (typically suboptimal) solvers for the problem.
Each problem's package subclasses classes defined in the `base` directory. Consult the docstrings and comments there to better understand how the components interact.
All instances are stored in `data/problem_instances`. The instances constituting the EHOP dataset are the ones generated in-house (and thus found in each problem's `in_house` folder). The `demo` instance for each problem is used as the one-shot example in each case. Individual instances can be loaded using a problem's `Loader` class and can then be used to evaluate `Solution`s, for both the standard and inverted variants. We recommend consulting the `base/problem_structures.py` file to better understand how to interact with individual instances of the EHOP dataset.
Many results from our experiments can be found in the `data/results` folder. Matching the file structure elsewhere in the repository, results are sorted by problem. Within each problem's folder, results are divided into those obtained using greedy algorithms (defined in each problem's `alt.py` file) and those obtained from LLMs. All data is stored in CSV format. Code for loading and analyzing this data in a `pandas` DataFrame can be found in `utils/analysis/data_aggregation.py`.
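As an illustrative sketch only (the column names below are invented for demonstration and will differ from the actual results schema; see `utils/analysis/data_aggregation.py` for the real loading code), results CSVs can be loaded and aggregated with `pandas` like so:

```python
import io

import pandas as pd

# Toy stand-in for a results CSV; the real files in data/results
# have their own schema.
csv_text = """instance,solver,optimal
ehop_001,gpt,True
ehop_002,gpt,False
ehop_001,greedy,False
"""

df = pd.read_csv(io.StringIO(csv_text))
# Fraction of instances solved optimally, per solver.
print(df.groupby("solver")["optimal"].mean())
```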
Ensure you are using Python version >= 3.10, and install the necessary packages by running `pip install -r requirements.txt`.
To run evaluations using a Llama model, one must have access to the given model through Hugging Face, be logged in, and have the model downloaded (the page for Llama-3.1-70B-Instruct provides documentation for getting access and downloading the model). To run evaluations using a GPT model, one must have one's OpenAI API key set as the appropriate environment variable.
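For GPT experiments, the official OpenAI client reads the `OPENAI_API_KEY` environment variable; whether this repository expects that exact variable name is an assumption worth confirming against its configs. A minimal check:

```python
import os

# OPENAI_API_KEY is the variable the official OpenAI client reads;
# confirm in this repository's configs that it is the one expected here.
key_set = "OPENAI_API_KEY" in os.environ
print("OpenAI API key configured:", key_set)
```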
Given a problem (`graph_coloring`, `knapsack`, or `traveling_salesman`), a model (`gpt` or `llama`), and a (sub-)dataset (`random` or `hard`), the corresponding experiment can be executed by running the following command:

`python main.py configs/<problem>/<model>-<dataset>.jsonnet`
For example, to run GPT-4o on the knapsack component of the hard dataset, one would run:

`python main.py configs/knapsack/gpt-hard.jsonnet`
When running such experiments, please note the following:
- To run Llama experiments, one must provide the path to the downloaded model in the appropriate section of each Llama config file.
- Instances in the random and hard datasets follow the same naming scheme, and results for a given (model $\times$ dataset) combination are written to a consistent filepath. Thus, one should not simultaneously run experiments for the random and hard components of the EHOP dataset using the same model. Instead, run them sequentially, and organize/rename the result file from the first experiment so that new results are not appended to it. Simultaneous experiments that use distinct (model $\times$ dataset) combinations write results to distinct files and should not cause any issues.
We also support Qwen 3 32B, in both thinking and non-thinking modes.
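Regarding the result-file caveat above, one safe pattern is to rename the first run's output before launching the second. The helper and path below are illustrative, not part of the repository; check `data/results` for the actual filenames:

```python
import shutil
from pathlib import Path


def archive_results(path: Path, suffix: str) -> Path:
    """Move a results file aside by appending a suffix to its stem."""
    target = path.with_name(path.stem + suffix + path.suffix)
    if path.exists():
        shutil.move(str(path), str(target))
    return target


# Illustrative path: the actual results filename depends on the experiment.
archive_results(Path("data/results/knapsack/llm/gpt_results.csv"), "-random")
```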
- Experiment Configuration Files: To run custom experiments, simply create your own `.jsonnet` file using the same format as the provided examples.
- Dataset Creation: To generate custom versions of the dataset, use the `generator.py` file in a given problem's directory to generate new problem instances.