Title: UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering [KDD 2025 Accepted (Oral) Paper]
The paper link: UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering.
The source data is available on Hugging Face and Kaggle.
UQABench is a benchmark dataset for evaluating user embeddings used to prompt LLMs for personalized question answering. The standardized evaluation process consists of three stages: pre-training, fine-tuning, and evaluation. We provide the requirements and quick-start scripts for each stage.
The source data consists of user interactions collected and processed from Taobao. Following previous work, we randomly split the data 9:1 into training and test sets. The dataset statistics are summarized as follows:
| Data Split | Total | #Training | #Test |
|---|---|---|---|
| Interaction | 31,317,087 | 28,094,799 | 3,222,288 |
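
The random 9:1 split described above can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the function name `split_interactions`, the fixed seed, and the choice to split at the level of individual records are all assumptions made for illustration.

```python
import random


def split_interactions(records, train_ratio=0.9, seed=0):
    """Shuffle records and split them into (train, test) by train_ratio.

    Hypothetical helper: the real pipeline may split by user or by
    interaction; here every record is treated as one independent unit.
    """
    shuffled = list(records)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Toy usage with integer IDs standing in for interaction records.
train, test = split_interactions(range(1000))
print(len(train), len(test))
```

Fixing the seed makes the split repeatable, which matters when the training portion is reused across the pre-training and fine-tuning stages.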
Specifically, the training set is used in the pre-training and fine-tuning (alignment) stages. We then design task-specific question prompts based on the test set, refine them, filter out low-quality questions, and obtain 7,192 personalized Q&A pairs for the evaluation stage.
- Download the data from Hugging Face or Kaggle.
- Download Qwen/Qwen2.5-3B-Instruct from Hugging Face.
- pytorch 2.4
- fbgemm_gpu
- transformers
- causal_conv1d==1.4.0
- mamba_ssm==2.2.3
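
Before running the scripts, it can help to confirm that the packages listed above are installed. The following is a minimal sketch using the standard library's `importlib.metadata`. The distribution names in `REQUIRED` are assumptions (for example, PyTorch is assumed to be installed under the name `torch`), and `installed_versions` is a hypothetical helper, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for the requirements listed above.
REQUIRED = ["torch", "fbgemm_gpu", "transformers", "causal_conv1d", "mamba_ssm"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print a short report; compare the versions against the pins above by hand.
for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```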
bash scripts/pretrain_trm_plus.sh
bash scripts/align_trm_plus.sh
bash scripts/generate_trm_plus.sh
python calc_metrics_acc.py generated/trm_plus_align_frozen.jsonl
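
For orientation, the last command scores a JSONL file of generated answers. The sketch below shows one way an accuracy metric over such a file could be computed. It is not the repository's `calc_metrics_acc.py`: the field names `prediction` and `answer` and the case-insensitive exact-match rule are assumptions, so check the actual output schema before adapting it.

```python
import json


def accuracy_from_jsonl(path, pred_key="prediction", label_key="answer"):
    """Return the fraction of rows whose prediction matches the reference answer.

    Hypothetical schema: each line is a JSON object holding the model output
    under pred_key and the reference under label_key.
    """
    correct = 0
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row = json.loads(line)
            total += 1
            pred = str(row[pred_key]).strip().lower()
            gold = str(row[label_key]).strip().lower()
            if pred == gold:
                correct += 1
    # An empty file yields 0.0 rather than a division error.
    return correct / total if total else 0.0
```

Exact match is the simplest scoring rule. Free-form answers would need a more tolerant comparison.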
Please cite our paper if you use our dataset.
@inproceedings{liu2025uqabench,
title={UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering},
author={Liu, Langming and Liu, Shilei and Yuan, Yujin and Zhang, Yizhen and Yan, Bencheng and Zeng, Zhiyuan and Wang, Zihao and Liu, Jiaqi and Wang, Di and Su, Wenbo and others},
booktitle={Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2},
pages={5652--5661},
year={2025}
}