To set up:
pip install -r requirements.txt
To run a metric, try
python src/main_batch.py --model=Qwen/Qwen3-0.6B --metric=Reliance --data-hf=GSM8K
or
./test.sh
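To run more than one metric in sequence, a small wrapper can assemble the same command line shown above. This is only a sketch: `build_command` is a hypothetical helper, and it assumes `src/main_batch.py` accepts `Transferability` as a `--metric` value the same way it accepts `Reliance`, which should be checked against the source.

```python
import shlex


def build_command(model, metric, dataset):
    """Return the argv list for one run of src/main_batch.py (hypothetical helper)."""
    return [
        "python", "src/main_batch.py",
        f"--model={model}",
        f"--metric={metric}",
        f"--data-hf={dataset}",
    ]


if __name__ == "__main__":
    # Assumption: both metric names are valid values for --metric.
    for metric in ["Reliance", "Transferability"]:
        cmd = build_command("Qwen/Qwen3-0.6B", metric, "GSM8K")
        # Print the command; to launch it, pass cmd to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems if a model or dataset name ever contains unusual characters.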
Output is printed to the console and to log/jsonl files in log/.
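The JSONL output can be inspected with a few lines of Python. The sketch below assumes only that each non-empty line of a file is one JSON value; the field names inside each record are not documented here, so the example does not rely on any. `load_jsonl` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Parse one JSON value per non-empty line and return them as a list."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

For example, `records = load_jsonl("log/input.jsonl")` followed by `print(len(records))` reports how many records a run produced, where `log/input.jsonl` stands in for whichever file the run actually wrote.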
Note: All supported Hugging Face datasets and CoT models are listed in src/config.py; feel free to add to the lists.
We also support local datasets in data/, such as alpaca_500_samples.json (based on Alpaca).
To generate graphs, run
python src/plot_metric_logprobs.py --metric-name Transferability --input-path log/input.jsonl --out-dir output
- GitHub Naming
  - branches:
    - development: dev/<descriptive-name>
    - test: test/<descriptive-name>
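The branch convention above can be checked mechanically, for instance in a pre-push hook. The sketch below is an illustration only: it assumes `<descriptive-name>` means lowercase letters and digits separated by single hyphens, which the convention itself does not specify, and `is_valid_branch` is a hypothetical helper.

```python
import re

# Assumption: <descriptive-name> is lowercase alphanumeric words joined by hyphens.
BRANCH_RE = re.compile(r"(dev|test)/[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_branch(name):
    """Return True if name looks like dev/<descriptive-name> or test/<descriptive-name>."""
    return BRANCH_RE.fullmatch(name) is not None
```

Under these assumptions `is_valid_branch("dev/add-metric")` is accepted, while `is_valid_branch("feature/add-metric")` and `is_valid_branch("dev/")` are rejected.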