Official implementation of Calibrating LLMs with Information-Theoretic Evidential Deep Learning (ICLR 2025)
- Install "setuptools":
pip install setuptools
. - Git clone this repository.
- Navigate to the root level of the repository (where
setup.cfg
is located) and runpip install -e .
(Note: Don't forget the dot.
at the end of the command). - (Optional) Run
huggingface-cli login
to log in to HuggingFace-Hub, and usewandb login
to log in to WandB. - (Optional) Go through the Docs of mmengine.Config
to know how to use the
Config
.
Assume you want to fine-tune Llama3-8B on the OBQA dataset using IB-EDL. Run the following command:
```bash
python tools/evidential_ft.py configs/obqa_llama3_8b/ib_obqa_llama3_8b.yaml \
    -w workdirs/ib_edl/obqa/ \
    -n ib_obqa_llama3_8b
```
To run the training with a different IB regularization strength, add `-o vib.beta=NEW-VALUE` to the training command.
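For example, the full command with an overridden `vib.beta` could look like this (the value `0.01` is purely illustrative):

```bash
# Same fine-tuning command as above, with the IB regularization strength
# overridden on the command line (0.01 is an illustrative value).
python tools/evidential_ft.py configs/obqa_llama3_8b/ib_obqa_llama3_8b.yaml \
    -w workdirs/ib_edl/obqa/ \
    -n ib_obqa_llama3_8b \
    -o vib.beta=0.01
```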
Since the configuration file contains the following entry:

```yaml
process_preds:
  npz_file: "obqa.npz"
```

the training program will save the predictions to a file named `obqa.npz`.
After completing the fine-tuning using the command above, you should have:
- A LoRA checkpoint of Llama3-8B trained on OBQA.
- The `obqa.npz` file, which contains predictions on the OBQA dataset (assumed to be the in-distribution (ID) dataset for OOD detection).
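As a quick sanity check, you can inspect the saved predictions with `numpy`. The sketch below makes no assumption about the array names inside the file and simply lists them:

```python
import numpy as np

# Load the predictions saved during fine-tuning and list the stored
# arrays (the key names depend on the IB-EDL implementation).
preds = np.load("workdirs/ib_edl/obqa/obqa.npz")
print(preds.files)
for key in preds.files:
    print(key, preds[key].shape)
```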
OOD detection can be done as follows:
Step 1: Obtain predictions on the OOD dataset. Assume that CSQA is the OOD dataset. You can generate predictions on it using the checkpoint trained on OBQA:
```bash
python tools/evidential_ft.py configs/ood_llama3_8b/ib_obqa_csqa_llama3_8b.yaml \
    -w workdirs/ib_edl/csqa/ \
    -s \
    -o model.peft_path=workdirs/ib_edl/obqa/checkpoint-XXX
```
This command evaluates the model on CSQA and stores the predictions in a file named `csqa.npz`.
Step 2: Run the OOD detection script:

```bash
python tools/ood.py workdirs/ib_edl/obqa/obqa.npz workdirs/ib_edl/csqa/csqa.npz
```
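Conceptually, OOD detection here amounts to comparing per-sample uncertainty scores between the ID and OOD prediction files. Below is a minimal sketch of one standard way to score this with AUROC; the `"uncertainty"` key is a hypothetical placeholder (check `preds.files` for the actual array names), and `tools/ood.py` may compute additional metrics:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# "uncertainty" is a hypothetical key -- inspect the .npz files for the
# actual array names used by the implementation.
id_scores = np.load("workdirs/ib_edl/obqa/obqa.npz")["uncertainty"]
ood_scores = np.load("workdirs/ib_edl/csqa/csqa.npz")["uncertainty"]

# Label ID samples 0 and OOD samples 1; if higher uncertainty signals
# OOD inputs, AUROC measures how well the scores separate the datasets.
labels = np.concatenate([np.zeros(len(id_scores)), np.ones(len(ood_scores))])
scores = np.concatenate([id_scores, ood_scores])
print(f"OOD detection AUROC: {roc_auc_score(labels, scores):.4f}")
```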
In the Appendix of the paper, we introduced a post-hoc calibration technique to further enhance the performance of IB-EDL. To use this technique, follow these steps:
Step 1: Visualize the calibration curve. Open a Jupyter notebook, load the predictions using `numpy`, and use `ib_edl.plot_calibration_curve_and_ece` to visualize the calibration curve.
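A minimal notebook cell might look like the sketch below; note that the exact signature of `ib_edl.plot_calibration_curve_and_ece` is an assumption here, so check its docstring before use:

```python
import numpy as np
import ib_edl

# Load the predictions saved during fine-tuning and inspect the arrays.
preds = np.load("workdirs/ib_edl/obqa/obqa.npz")
print(preds.files)

# Plot the calibration (reliability) curve and report the ECE.
# NOTE: passing the loaded .npz object directly is an assumption about
# the function's signature -- consult its docstring.
ib_edl.plot_calibration_curve_and_ece(preds)
```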
Step 2: Set the sigma multiplier value. Based on the calibration curve, choose an appropriate value for the sigma multiplier in the post-hoc calibration technique. Then re-run inference on the validation set with the chosen sigma multiplier:

```bash
python tools/evidential_ft.py path/to/config.yaml \
    -s \
    -o model.peft_path=path/to/model/checkpoint-XXX vib.sigma_mult=NEW-VALUE
```
Alternatively, you can modify the configuration file directly by updating the following entry:

```yaml
vib:
  sigma_mult: NEW-VALUE
```
It is recommended to repeat this process on the validation set to determine the best value for `vib.sigma_mult`.
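For example, a simple sweep over candidate multipliers could look like this (the candidate values are placeholders; compare the resulting calibration curves or ECE values on the validation set):

```bash
# Illustrative sweep over candidate sigma multipliers; the values below
# are placeholders, and the config/checkpoint paths must be filled in.
for SM in 0.5 1.0 1.5 2.0; do
    python tools/evidential_ft.py path/to/config.yaml \
        -s \
        -o model.peft_path=path/to/model/checkpoint-XXX vib.sigma_mult=$SM
done
```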
Currently, IB-EDL is implemented only for multiple-choice QA tasks, which follow a classification setting. To extend it for open-ended generation tasks, some adaptations to the implementation are required. Contributions are welcome—feel free to submit a pull request!
```bibtex
@inproceedings{li2025calibrating,
  title={Calibrating {LLM}s with Information-Theoretic Evidential Deep Learning},
  author={Yawei Li and David R{\"u}gamer and Bernd Bischl and Mina Rezaei},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=YcML3rJl0N}
}
```