Cultural Learning-Based Culture Adaptation of Language Models

* Icons created with the help of DALL-E.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Setup

Install all Python requirements listed in requirements.txt (see the PyTorch installation instructions for how to install PyTorch on your system).

You can install the requirements as follows:

pip install --upgrade pip
pip install -r requirements.txt

Usage

To generate data, you need files containing social scenarios. An example starting scenario file is in CLCA/data/scenarios/germany_scenarios.txt (coming soon).

To evaluate models with WVS data, prepare the data using WorldValuesBench (commit: e5d959b0365b45fcae7a5a6668b5789612da336c) and put the extracted data for each culture into the corresponding folder, CLCA/data/WorldValuesBench/{culture}. To extract data for a specific culture, refer to the WVS dataset and filter the records by the corresponding country code. After extraction, you should have a demographic_qa file, a value_qa file, a question_metadata file, and a codebook.json.
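Before running the evaluation, it can help to confirm that the extracted files are in place. The following is a minimal sketch and not part of the repository: the helper name `missing_wvs_files` is hypothetical, it assumes all four files sit directly in the culture folder, and because the extensions of demographic_qa, value_qa and question_metadata are not stated here, it matches files by stem only.

```python
from pathlib import Path

# File stems named in this README; extensions are not specified, so match by stem.
EXPECTED_STEMS = ("demographic_qa", "value_qa", "question_metadata", "codebook")


def missing_wvs_files(culture_dir):
    """Return the expected file stems that are absent from culture_dir."""
    folder = Path(culture_dir)
    if not folder.is_dir():
        # A missing folder means every expected file is missing.
        return list(EXPECTED_STEMS)
    present = {p.stem for p in folder.iterdir() if p.is_file()}
    return [stem for stem in EXPECTED_STEMS if stem not in present]


if __name__ == "__main__":
    # "germany" is only an example culture name.
    folder = Path("CLCA/data/WorldValuesBench") / "germany"
    missing = missing_wvs_files(folder)
    print("All expected files found." if not missing else f"Missing: {missing}")
```

An empty list means every expected stem was found; otherwise the list names what still needs to be extracted.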

The hyperparameters and paths in this repository are managed using Hydra. See the example config files in llm_roleplaying/configs. Currently, base models are loaded from local storage instead of being downloaded every time, so please update cache_dir in the inquirer/responder configs to your own storage location.

The model_inquirer and model_responder in the config folder contain configurations for two LLMs used in role-play (Participant 1 and Participant 2).

If the two models are the same, only the model_inquirer will be loaded to save memory.
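The memory-saving behaviour described above can be illustrated with a short sketch. This is not the repository's code: `load_roleplay_models` and the `load_model` callable are hypothetical names, and deciding that two models are "the same" by comparing their configs for equality is an assumption.

```python
def load_roleplay_models(inquirer_cfg, responder_cfg, load_model):
    """Load the inquirer and responder, reusing one instance if configs match.

    load_model is any callable that turns a config into a model object
    (hypothetical here; the repository's real loader may differ).
    """
    inquirer = load_model(inquirer_cfg)
    if responder_cfg == inquirer_cfg:
        # Same model for both roles: share the single loaded instance.
        responder = inquirer
    else:
        responder = load_model(responder_cfg)
    return inquirer, responder


if __name__ == "__main__":
    calls = []

    def fake_loader(cfg):
        # Stand-in for an expensive model load; records each call.
        calls.append(cfg["name"])
        return object()

    cfg = {"name": "example-model", "cache_dir": "/path/to/models"}
    inquirer, responder = load_roleplay_models(cfg, dict(cfg), fake_loader)
    print("shared instance:", inquirer is responder, "| loads:", len(calls))
```

With identical configs the loader runs once and both roles point at the same object, which is what keeps memory use down.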

The overall workflow is:

1. Generate data, with judging and filtering (generate_social_dialog.py)
2. Generate intents (intent_aug.py)
3. Adapt the model (llm_adaptation.py)
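The three steps can be chained from a small driver script. This is a sketch under stated assumptions rather than a documented entry point: the script names come from the list above, but their location in the repository, the working directory, and any Hydra-style `key=value` overrides they accept are assumptions, and `build_commands` and `run_workflow` are hypothetical helpers.

```python
import subprocess
import sys

# Script names come from the workflow list; their location and any Hydra
# overrides are assumptions and may need adjusting.
WORKFLOW_SCRIPTS = (
    "generate_social_dialog.py",
    "intent_aug.py",
    "llm_adaptation.py",
)


def build_commands(overrides=()):
    """Return one command per workflow step, in order."""
    return [[sys.executable, script, *overrides] for script in WORKFLOW_SCRIPTS]


def run_workflow(overrides=(), cwd=None):
    """Run each step in sequence, stopping at the first failure."""
    for command in build_commands(overrides):
        subprocess.run(command, check=True, cwd=cwd)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in build_commands():
        print(" ".join(command))
```

Running the file as-is only prints the commands; call `run_workflow(cwd=...)` from the directory that actually holds the scripts to execute them.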

Citation

If you find this repository useful, please cite the following paper:

@article{liu2025clca,
  author       = {Chen Cecilia Liu and
                  Anna Korhonen and
                  Iryna Gurevych},
  title        = {Cultural Learning-Based Culture Adaptation of Language Models},
  journal      = {ArXiv preprint},
  volume       = {abs/2504.02953},
  year         = {2025},
  url          = {https://doi.org/10.48550/arXiv.2504.02953},
  doi          = {10.48550/ARXIV.2504.02953},
  eprinttype    = {arXiv},
  eprint       = {2504.02953},
}

Contact

First author: first_name.last_name AT tu-darmstadt.de

https://www.ukp.tu-darmstadt.de/

https://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

License

CLCA is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Acknowledgement

The role-playing code in this repository is adapted from LLM-roleplay.
