This notebook demonstrates how to use MAPIE for conformal prediction with a Large Language Model (LLM). The goal is to evaluate prediction sets for a multiple-choice question-answering task using conformal prediction techniques. It is based on the work presented in Benchmarking LLMs via Uncertainty Quantification, and parts of the code come from this GitHub repo.
- Dataset: The CosmosQA dataset, a benchmark for commonsense reasoning.
- LLM: The notebook utilizes the `Mistral-7B-Instruct-v0.3` model for predictions.
- MAPIE for Conformal Prediction: The `SplitConformalClassifier` from MAPIE is used to generate prediction sets with a given confidence level.
- Setup & Installation
  - Clone the repository and install required dependencies.
  - Authenticate with Hugging Face Hub to access the LLM (sketched below).
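For the setup step, a minimal sketch is given below. The package list and the use of `huggingface_hub.login()` for the gated Mistral model are assumptions; the repository's own requirements and instructions are authoritative.

```python
# Assumed dependencies; defer to the repository's requirements file.
# In a notebook cell:
#   !pip install mapie transformers datasets torch accelerate huggingface_hub

from huggingface_hub import login

# Mistral-7B-Instruct-v0.3 is a gated model on the Hugging Face Hub:
# log in with a token that has been granted access before downloading the weights.
login()  # prompts for a token; alternatively pass token="hf_..."
```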
- Data Preprocessing
  - Load and transform CosmosQA data into a format suitable for the model (see the sketch below).
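The sketch below shows one way this step can look. The dataset identifier `allenai/cosmos_qa`, its field names (`context`, `question`, `answer0`–`answer3`, `label`), and the prompt template are assumptions, not the notebook's exact code.

```python
from datasets import load_dataset

# Each CosmosQA row holds a context, a question, four candidate answers, and a label index.
dataset = load_dataset("allenai/cosmos_qa", split="validation")

CHOICES = ["A", "B", "C", "D"]

def to_prompt(row: dict) -> str:
    """Format one CosmosQA row as a multiple-choice prompt (illustrative template)."""
    options = "\n".join(
        f"{letter}. {row[f'answer{i}']}" for i, letter in enumerate(CHOICES)
    )
    return (
        f"Context: {row['context']}\n"
        f"Question: {row['question']}\n"
        f"{options}\n"
        "Answer with a single letter (A, B, C, or D)."
    )

prompts = [to_prompt(row) for row in dataset]
labels = [row["label"] for row in dataset]
```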
- Model Loading
  - Load the `Mistral-7B` model and its tokenizer.
  - Define an `LLMClassifier` wrapper to make predictions in a structured format (a condensed example follows).
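Below is a condensed, assumed version of this step: the model and tokenizer are loaded with the standard `transformers` API, and the wrapper scores the four answer letters from the next-token logits so that it exposes a scikit-learn-style `predict_proba`, which MAPIE expects. This class is an illustration of that idea, not the notebook's actual `LLMClassifier`; the real wrapper also has to satisfy MAPIE's input-validation details.

```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
model.eval()


class LLMClassifier:
    """Scikit-learn-style wrapper: scores each answer letter from next-token logits."""

    def __init__(self, model, tokenizer, choices=("A", "B", "C", "D")):
        self.model = model
        self.tokenizer = tokenizer
        # Token id of each single-letter answer option (last sub-token as a heuristic).
        self.choice_ids = [
            tokenizer(letter, add_special_tokens=False).input_ids[-1]
            for letter in choices
        ]
        self.classes_ = np.arange(len(choices))  # sklearn-style fitted attribute
        self.fitted_ = True

    def fit(self, X, y=None):
        return self  # no-op: the LLM is already trained

    @torch.no_grad()
    def predict_proba(self, prompts):
        probas = []
        for prompt in prompts:
            inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
            logits = self.model(**inputs).logits[0, -1]    # next-token logits
            choice_logits = logits[self.choice_ids]         # keep only A/B/C/D
            probas.append(torch.softmax(choice_logits.float(), dim=-1).cpu().numpy())
        return np.stack(probas)

    def predict(self, prompts):
        return self.predict_proba(prompts).argmax(axis=1)
```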
- Conformal Prediction with MAPIE
  - Use `SplitConformalClassifier` to conformalize the model on a subset of the data.
  - Generate prediction sets with a 95% confidence level (see the sketch below).
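A sketch of this step, following MAPIE's split conformal API; the method and argument names (`confidence_level`, `prefit`, `conformalize`, `predict_set`) match MAPIE v1 and may differ in older releases. The variables `prompts_conf`, `labels_conf`, and `prompts_test` are hypothetical names for the conformalization and test subsets of the preprocessed data.

```python
import numpy as np
from mapie.classification import SplitConformalClassifier

llm_clf = LLMClassifier(model, tokenizer)

mapie_clf = SplitConformalClassifier(
    estimator=llm_clf,
    confidence_level=0.95,  # target 95% marginal coverage of the true answer
    prefit=True,            # the LLM is already trained, so only conformalize it
)
mapie_clf.conformalize(np.array(prompts_conf), np.array(labels_conf))

# For each test question: a point prediction plus a set of plausible answer indices.
point_preds, pred_sets = mapie_clf.predict_set(np.array(prompts_test))
```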
- Evaluation & Visualization
  - Compute accuracy scores and coverage metrics.
  - Visualize the size distribution of prediction sets.
  - Plot accuracy per prediction set size (a minimal evaluation sketch follows).
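A minimal version of these metrics, assuming the variables from the previous sketches (`point_preds`, `pred_sets`, `labels_test`) and that `predict_set` returns a boolean membership array with an optional trailing confidence-level axis:

```python
import numpy as np

labels_test = np.asarray(labels_test)

# Drop the trailing confidence-level axis if present (a single 95% level was requested).
sets = pred_sets[..., 0] if pred_sets.ndim == 3 else pred_sets

accuracy = (point_preds == labels_test).mean()
coverage = sets[np.arange(len(labels_test)), labels_test].mean()  # true answer in the set?
set_sizes = sets.sum(axis=1)
print(f"accuracy: {accuracy:.3f}  empirical coverage: {coverage:.3f}")

# Accuracy per prediction-set size: larger (more uncertain) sets should show lower accuracy.
for size in np.unique(set_sizes):
    mask = set_sizes == size
    acc = (point_preds[mask] == labels_test[mask]).mean()
    print(f"set size {int(size)}: n={int(mask.sum())}  accuracy={acc:.3f}")
```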
Key results:

- The LLM achieves an accuracy of approximately 86% on the test set.
- Prediction sets provide calibrated uncertainty estimates, enhancing reliability in decision-making.
- The more uncertain the model is (i.e., the larger the prediction set), the lower the point-prediction accuracy.
This notebook illustrates how conformal prediction techniques can be applied to LLMs to build more trustworthy AI systems. The approach can be extended to other question-answering datasets and models to assess confidence in model predictions.