DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models (Bae et al., 2025)
Our paper was accepted to NAACL 2025.
Full paper is available here: https://aclanthology.org/2025.naacl-long.624/
Abstract: While Large Language Models (LLMs) excel in zero-shot Question Answering (QA), they tend to expose biases in their internal knowledge when faced with socially sensitive questions, leading to a degradation in performance. Existing zero-shot methods are efficient but fail to consider context and prevent bias propagation in the answers.
To address this, we propose DeCAP, a method for debiasing LLMs using Context-Adaptive Prompt Generation. DeCAP leverages Question Ambiguity Detection to take appropriate debiasing actions based on the context and Neutral Answer Guidance Generation to help the LLMs make objective judgments about the context, minimizing the propagation of bias from their internal knowledge. Our experiments across eight LLMs show that DeCAP achieves state-of-the-art zero-shot debiased QA performance. This demonstrates DeCAP's efficacy in enhancing the fairness and accuracy of LLMs in diverse QA settings.
- Our code runs on CUDA Version 12.7 with a GeForce RTX 3090 (24GB).
git clone https://github.com/BaeSuyoung/DeCAP.git
conda create -n decap python=3.9
conda activate decap
pip install -r requirements.txt
- This process generates (1) a prefix instruction and (2) neutral answer guidance.
- The results are saved in `dataset/bbq/ours-total.csv` and `dataset/unqover/ours-total.csv`.
- You can also generate baseline prompts by changing `exp_type` to `base`, `retrieved`, or `random`.
cd model
bash prompt_generation.sh
- Ours (DeCAP)
## prompt_generation.sh
# bbq dataset (100 samples)
for exp_type in ours
do
    for model in llama3_8B_instruct
    do
        for dataset_name in bbq
        do
            command="CUDA_VISIBLE_DEVICES=0 python src/prompt_generation.py \
                --experiment_type $exp_type \
                --dataset_name $dataset_name \
                --generation_model $model \
                --sample_num 100 \
                --batch_size 32 \
                --seed 77"
            echo $command
            eval $command
        done
    done
done
# unqover dataset (800 samples)
for exp_type in ours
do
    for model in llama3_8B_instruct
    do
        for dataset_name in unqover
        do
            command="CUDA_VISIBLE_DEVICES=0 python src/prompt_generation.py \
                --experiment_type $exp_type \
                --dataset_name $dataset_name \
                --generation_model $model \
                --sample_num 800 \
                --batch_size 32 \
                --seed 77"
            echo $command
            eval $command
        done
    done
done
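The baseline prompts mentioned above can be generated with the same kind of loop. The following is a minimal sketch, assuming that `base`, `retrieved`, and `random` are accepted values of `--experiment_type` and that the other flags stay the same as in the DeCAP run. `build_cmd` and `DRY_RUN` are hypothetical names introduced only for this illustration; they are not part of the repository. With `DRY_RUN=1` (the default here), the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: generate baseline prompts for the BBQ dataset.
# build_cmd and DRY_RUN are illustrative names, not part of the DeCAP repo.

build_cmd() {
    # $1 = experiment type, $2 = dataset name, $3 = number of samples
    echo "CUDA_VISIBLE_DEVICES=0 python src/prompt_generation.py" \
         "--experiment_type $1" \
         "--dataset_name $2" \
         "--generation_model llama3_8B_instruct" \
         "--sample_num $3" \
         "--batch_size 32" \
         "--seed 77"
}

# Print the commands by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

for exp_type in base retrieved random; do
    command="$(build_cmd "$exp_type" bbq 100)"
    echo "$command"
    if [ "$DRY_RUN" != "1" ]; then
        eval "$command"
    fi
done
```

For UnQover, the same helper can be called as `build_cmd "$exp_type" unqover 800`, matching the sample count used in the script above.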
- You can adaptively choose the experiment type (`exp_type`), evaluation model (`model`), and dataset name (`dataset_name`).
- This process is repeated three times with different seeds (e.g., 77, 78, 79), and the average is taken as the final result.
cd model
bash inference_gpu.sh
- Ours (DeCAP)
## inference_gpu.sh
# bbq dataset
for exp_type in ours
do
    for model in flan_t5_11B flan_t5_11B llama2_7B llama2_7B_chat llama2_13B llama2_13B_chat llama3_8B llama3_8B_instruct
    do
        for dataset_name in bbq
        do
            command="CUDA_VISIBLE_DEVICES=0 python src/inference_gpu.py \
                --experiment_type $exp_type \
                --dataset_name $dataset_name \
                --model_name $model \
                --batch_size 32 \
                --seed 77"
            echo $command
            eval $command
        done
    done
done
# unqover dataset
for exp_type in ours
do
    for model in flan_t5_11B flan_t5_11B llama2_7B llama2_7B_chat llama2_13B llama2_13B_chat llama3_8B llama3_8B_instruct
    do
        for dataset_name in unqover
        do
            command="CUDA_VISIBLE_DEVICES=0 python src/inference_gpu.py \
                --experiment_type $exp_type \
                --dataset_name $dataset_name \
                --model_name $model \
                --batch_size 32 \
                --seed 77"
            echo $command
            eval $command
        done
    done
done
- To run inference with other LLMs, add their model tags to `MODEL_CARD` in `prompt.py`.
- The results are saved in the `result/` folder.
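The three-seed protocol described above can be scripted by adding an outer loop over seeds. This is a minimal sketch, assuming `--seed` is the only flag that changes between runs. `build_inference_cmd` and `DRY_RUN` are hypothetical names used for illustration and do not exist in the repository. Averaging the per-seed results is left out because it depends on the format of the files written to the results folder.

```shell
#!/usr/bin/env bash
# Sketch: run DeCAP inference once per seed (77, 78, 79).
# build_inference_cmd and DRY_RUN are illustrative names, not part of the repo.

build_inference_cmd() {
    # $1 = model tag, $2 = dataset name, $3 = random seed
    echo "CUDA_VISIBLE_DEVICES=0 python src/inference_gpu.py" \
         "--experiment_type ours" \
         "--dataset_name $2" \
         "--model_name $1" \
         "--batch_size 32" \
         "--seed $3"
}

# Print the commands by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

for seed in 77 78 79; do
    for dataset_name in bbq unqover; do
        command="$(build_inference_cmd llama3_8B_instruct "$dataset_name" "$seed")"
        echo "$command"
        if [ "$DRY_RUN" != "1" ]; then
            eval "$command"
        fi
    done
done
```

Keeping the command construction in one function means a different model tag can be passed as the first argument without touching the loops.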
@inproceedings{bae-etal-2025-decap,
title = "{D}e{CAP}: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models",
author = "Bae, Suyoung and
Choi, YunSeok and
Lee, Jee-Hyong",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.naacl-long.624/",
doi = "10.18653/v1/2025.naacl-long.624",
pages = "12555--12574",
ISBN = "979-8-89176-189-6",
abstract = "While Large Language Models (LLMs) excel in zero-shot Question Answering (QA), they tend to expose biases in their internal knowledge when faced with socially sensitive questions, leading to a degradation in performance. Existing zero-shot methods are efficient but fail to consider context and prevent bias propagation in the answers. To address this, we propose *DeCAP*, a method for debiasing LLMs using Context-Adaptive Prompt Generation. *DeCAP* leverages a *Question Ambiguity Detection* to take appropriate debiasing actions based on the context and a *Neutral Answer Guidance Generation* to suppress the LLMs make objective judgments about the context, minimizing the propagation of bias from their internal knowledge. Our various experiments across eight LLMs show that *DeCAP* achieves state-of-the-art zero-shot debiased QA performance. This demonstrates *DeCAP*{'}s efficacy in enhancing the fairness and accuracy of LLMs in diverse QA settings."
}