* Icons created with the help of DALL-E.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
Install all Python requirements listed in requirements.txt (see the official PyTorch installation instructions for how to install PyTorch on your system).
You can install the requirements as follows:
pip install --upgrade pip
pip install -r requirements.txt
To generate data, you need files containing social scenarios.
An example starting scenario file is in CLCA/data/scenarios/germany_scenarios.txt (coming soon).
To evaluate models with WVS data, please prepare the data using WorldValuesBench (commit: e5d959b0365b45fcae7a5a6668b5789612da336c).
Please put the extracted data of a culture into the corresponding folder CLCA/data/WorldValuesBench/{culture}.
To extract data for a specific culture, please refer to the WVS dataset and filter the records by the corresponding country code.
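The filtering step can be sketched roughly as follows. This is a minimal illustration, not the repository's extraction code: it assumes the WVS export is a CSV file and that the country code lives in a column named `B_COUNTRY`. Both are assumptions, so please check the WVS codebook for the actual column name and code values. The function name `filter_by_country` and the sample rows are made up for the example.

```python
import csv
import io


def filter_by_country(rows, country_code, column="B_COUNTRY"):
    """Keep only the records whose country column equals country_code.

    The default column name is an assumption; pass the one your export uses.
    """
    wanted = str(country_code)
    return [row for row in rows if str(row.get(column, "")).strip() == wanted]


# Tiny in-memory stand-in for a WVS export (made-up rows, illustration only).
sample_csv = io.StringIO(
    "B_COUNTRY,Q1,Q2\n"
    "276,1,3\n"
    "392,2,4\n"
    "276,1,1\n"
)
records = list(csv.DictReader(sample_csv))

# 276 is the ISO 3166-1 numeric code for Germany.
germany = filter_by_country(records, 276)
print(len(germany))
```

For a real export, replace `sample_csv` with an opened file handle and adjust `column` to match the codebook.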
After the data extraction, you should have a demographic_qa file and a value_qa file. In addition, you should have a question_metadata file and a codebook.json.
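A quick way to confirm that a culture folder is complete is a small check like the one below. It is only a sketch under stated assumptions: the helper `missing_wvs_files` is hypothetical, the file extensions of demographic_qa, value_qa and question_metadata are not stated here (so they are matched by name prefix), and the culture folder name `germany` is chosen purely for illustration.

```python
import tempfile
from pathlib import Path

# Names taken from the text above. Only codebook.json has a known extension,
# so the other three are matched by name prefix (an assumption).
EXPECTED_PREFIXES = ("demographic_qa", "value_qa", "question_metadata")
EXPECTED_FILES = ("codebook.json",)


def missing_wvs_files(culture_dir):
    """Return the expected names that are not present in culture_dir."""
    folder = Path(culture_dir)
    names = [p.name for p in folder.iterdir()] if folder.is_dir() else []
    missing = [
        prefix for prefix in EXPECTED_PREFIXES
        if not any(name.startswith(prefix) for name in names)
    ]
    missing += [name for name in EXPECTED_FILES if name not in names]
    return missing


# Demo on a temporary folder; the .json extensions here are illustrative.
with tempfile.TemporaryDirectory() as tmp:
    culture_dir = Path(tmp) / "WorldValuesBench" / "germany"
    culture_dir.mkdir(parents=True)
    for name in ("demographic_qa.json", "value_qa.json", "codebook.json"):
        (culture_dir / name).touch()
    still_missing = missing_wvs_files(culture_dir)

print(still_missing)
```

In practice you would call `missing_wvs_files("CLCA/data/WorldValuesBench/{culture}")` with `{culture}` replaced by your folder name and expect an empty list.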
The hyper-parameters and paths in this repository are managed using Hydra.
Please see the example config files in llm_roleplaying/configs.
Currently, we load base models from local storage instead of downloading them every time. Please update the cache_dir in the inquirer/responder configs to your own storage location.
The model_inquirer and model_responder in the config folder contain configurations for the two LLMs used in role-play (Participant 1 and Participant 2).
If the two models are the same, only the model_inquirer will be loaded to save memory.
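The memory-saving behaviour described above can be pictured with the following sketch. It is not the repository's loading code: `load_participants`, the `name` config key, and the `fake_load` stand-in are all hypothetical, and "the same model" is assumed here to mean that both configs carry the same model name.

```python
def load_participants(inquirer_cfg, responder_cfg, load_fn):
    """Load the inquirer; reuse it as the responder when both name the same model."""
    inquirer = load_fn(inquirer_cfg)
    if responder_cfg["name"] == inquirer_cfg["name"]:
        responder = inquirer  # share one loaded model instead of loading twice
    else:
        responder = load_fn(responder_cfg)
    return inquirer, responder


# Stand-in loader that records which models were actually loaded.
calls = []


def fake_load(cfg):
    calls.append(cfg["name"])
    return object()


# Placeholder configs; cache_dir should point to your own storage location.
inquirer_cfg = {"name": "model-a", "cache_dir": "/path/to/your/storage"}
responder_cfg = {"name": "model-a", "cache_dir": "/path/to/your/storage"}

inquirer, responder = load_participants(inquirer_cfg, responder_cfg, fake_load)
print(inquirer is responder, calls)
```

With two different model names, `fake_load` would be called twice and the two participants would be distinct objects.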
The overall workflow of our work:
- Generate data (with judge and filtering, generate_social_dialog.py)
- Generate intents (intent_aug.py)
- Adaptation (llm_adaptation.py)
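The order of the three steps above can be written down as a small helper that builds one command per script. This is a sketch only: the script locations and the arguments each script expects are not stated here, and treating extra arguments as Hydra-style `key=value` overrides is an assumption. The helper `build_commands` is hypothetical and only builds the commands; it does not run them.

```python
import sys

# Script names and their order come from the workflow list above.
WORKFLOW = [
    ("generate data", "generate_social_dialog.py"),
    ("generate intents", "intent_aug.py"),
    ("adaptation", "llm_adaptation.py"),
]


def build_commands(overrides=()):
    """Return one command (as an argument list) per workflow step, in order."""
    return [[sys.executable, script, *overrides] for _, script in WORKFLOW]


commands = build_commands()
for (step, _), command in zip(WORKFLOW, commands):
    print(f"{step}: {' '.join(command)}")
```

Each command could then be passed to `subprocess.run(command, check=True)` from the directory that contains the scripts, so that a failing step stops the sequence.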
If you find this repository useful, please cite the following paper:
@article{liu2025clca,
author = {Chen Cecilia Liu and
Anna Korhonen and
Iryna Gurevych},
title = {Cultural Learning-Based Culture Adaptation of Language Models},
journal = {ArXiv preprint},
volume = {abs/2504.02953},
year = {2025},
url = {https://doi.org/10.48550/arXiv.2504.02953},
doi = {10.48550/ARXIV.2504.02953},
eprinttype = {arXiv},
eprint = {2504.02953},
}
First author: first_name.last_name AT tu-darmstadt.de
https://www.ukp.tu-darmstadt.de/
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
CLCA is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
The role-playing code in this repository is adapted from LLM-roleplay.