This repository supports the survey paper Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives by collecting and categorizing relevant research papers and datasets on value alignment in agentic AI systems.
We welcome contributions, discussions, and issues related to value alignment for agentic AI. If you have any questions, feel free to contact Zengwei_hnu@163.com. (We recommend cc'ing zhuhengshu@gmail.com as a precaution in case of any delivery issues.)
We will continue to update both the arXiv paper and this repository regularly. If you find our survey useful for your research, please cite the following paper:
@article{AgenticAIValueAlignment,
  title={Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives},
  author={Zeng, Wei and Zhu, Hengshu and Qin, Chuan and Wu, Han and Cheng, Yihang and Zhang, Sirui and Jin, Xiaowei and Shen, Yinuo and Wang, Zhenxing and Zhong, Feimin and Xiong, Hui},
  journal={arXiv preprint arXiv:2506.09656},
  year={2025}
}
- Overview of Our Survey
- Related Survey
- The Principles of Values Alignment
- Agent System Application
- Values Alignment Evaluation for Agent Systems
- Methodologies for Agent Value Alignment
- Datasets
- Future Directions
Time | Title | Keywords | Venue |
---|---|---|---|
2025 | The rise and potential of large language model based agents: a survey | Communication structures, practical applications, and societal systems of LLM-based agents | Science China Information Sciences |
2025 | A Survey on Alignment for Large Language Model Agents | Value alignment objectives, datasets, techniques, and evaluation methods for LLM-based agents | OpenReview |
2025 | Multi-Agent Collaboration Mechanisms: A Survey of LLMs | Conceptual framework, interaction mechanisms, and application overview of LLM-based agent systems | arXiv |
2024 | Large Language Model based Multi-Agents: A Survey of Progress and Challenges | Capabilities, framework analysis, and application overview of LLM-based multi-agent systems | arXiv |
2024 | A survey on large language model based autonomous agents | Constituent modules, application overview, and evaluation methods of LLM-based autonomous agents | Frontiers of Computer Science |
2023 | AI Alignment: A Comprehensive Survey | Motivations and objectives, alignment methods, and assurance and governance of AI alignment | arXiv |
2023 | From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models | Definition and evaluation of LLM alignment objectives | arXiv |
2023 | Large Language Model Alignment: A Survey | Definition, categories, testing, and evaluation of LLM alignment | arXiv |
2023 | Unpacking the Ethical Value Alignment in Big Models | Definition, normative principles, and technical methods of LLM value alignment | arXiv |
2023 | Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment | Alignment objectives for trustworthy LLMs | arXiv |
2024 | Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions | Challenges, fundamental definitions, and alignment frameworks of LLM value alignment | arXiv |
2025 | Value alignment in ai large models: Current status, key issues, and normative strategies | Necessity, conceptual definitions, theoretical approaches, challenges and future outlook of LLM value alignment | CNKI |
Sub-Level | Title | Time | Venue |
---|---|---|---|
Recruitment | Recruitment in the times of machine learning | 2019 | Management Systems in Production Engineering |
Recruitment | HR analytics and ethics | 2019 | IBM Journal of Research and Development |
Recruitment | Ethics of AI-enabled recruiting and selection: A review and research agenda | 2022 | Journal of Business Ethics |
Legal Consultation | LawLuo: A Chinese law firm co-run by LLM agents | 2024 | arXiv |
Pharmaceutical Company Governance | Operationalising AI governance through ethics-based auditing: an industry case study | 2023 | AI and Ethics |
Time | Dataset | Paper | Keywords | Level | Venue |
---|---|---|---|---|---|
2025 | DEFSurveySim | Towards realistic evaluation of cultural value alignment in large language models: Diversity enhancement for survey response simulation | Nation, Culture | Macro Level, Meso Level | Information Processing & Management |
2025 | NaVAB | Benchmarking Multi-National Value Alignment for Large Language Models | Nation | Meso Level | arXiv preprint arXiv:2504.12911 |
2025 | German Credit Data | EARN Fairness: Explaining, Asking, Reviewing, and Negotiating Artificial Intelligence Fairness Metrics Among Stakeholders | Company governance | Micro Level | Proceedings of the ACM on Human-Computer Interaction |
2024 | CultureSPA | Self-Pluralising Culture Alignment for Large Language Models | Nation, Culture | Macro Level, Meso Level | arXiv preprint arXiv:2410.12971 |
2024 | DailyDilemmas | DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life | Harmlessness, Responsibility, Justice & Fairness, Virtue | Macro Level | arXiv preprint arXiv:2410.02683 |
2024 | HofstedeCulturalDimensions | How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions | Culture | Macro Level | arXiv preprint arXiv:2406.14805 |
2024 | IndieValueCatalog | Can Language Models Reason about Individualistic Human Values and Preferences? | Justice & Fairness | Macro Level | arXiv preprint arXiv:2410.03868 |
2024 | KorNAT | KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge | Nation, Culture, Justice & Fairness | Macro Level, Meso Level | arXiv preprint arXiv:2402.13605 |
2024 | LLMGlobe | LLM-GLOBE: A Benchmark Evaluating the Cultural Values Embedded in LLM Output | Harmlessness, Justice & Fairness, Privacy, Beneficence, Responsibility | Macro Level | arXiv preprint arXiv:2411.06032 |
2024 | LaWGPT | Lawyer GPT: A Legal Large Language Model with Enhanced Domain Knowledge and Reasoning Capabilities | Fairness in legal | Macro Level, Micro Level | Proceedings of the 2024 3rd International Symposium on Robotics, Artificial Intelligence and Information Engineering |
2024 | Moral Beliefs | Evaluating Moral Beliefs across LLMs through a Pluralistic Framework | Nation, Culture, Justice & Fairness, Solidarity, Sustainability, Transparency | Macro Level, Meso Level | arXiv preprint arXiv:2411.03665 |
2024 | Moral Stories | Measuring Human-AI Value Alignment in Large Language Models | Harmlessness, Justice & Fairness, Responsibility, Beneficence, Dignity, Virtue, Freedom & Autonomy | Macro Level | Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society |
2024 | PkuSafeRLHF | PKU-SafeRLHF: Towards multi-level safety alignment for LLMs with human preference | Harmlessness, Freedom & Autonomy, Justice & Fairness, Trust, Privacy, Responsibility, Beneficence | Macro Level, Meso Level | arXiv preprint arXiv:2406.15513 |
2024 | ProgressGym | ProgressGym: Alignment with a Millennium of Moral Progress | Harmlessness, Freedom & Autonomy, Trust, Dignity, Beneficence | Macro Level | Advances in Neural Information Processing Systems |
2024 | SafeSora | SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Harmlessness, Usefulness, Responsibility | Macro Level | Advances in Neural Information Processing Systems |
2023 | MFQ (Moral Foundations Questionnaire) | Moral Foundations of Large Language Models | Trust, Responsibility | Macro Level | |
2023 | BeaverTails | Beavertails: Towards improved safety alignment of llm via a human-preference dataset | Harmlessness, Justice & Fairness, Privacy, Beneficence, Responsibility | Macro Level | Advances in Neural Information Processing Systems |
2023 | CBBQ (Chinese Bias Benchmark Dataset) | CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models | China (safeguarding national security and adhering to the core socialist values) | Meso Level | arXiv preprint arXiv:2306.16244 |
2023 | CDEval | CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models | Culture, Education, Individualism | Macro Level, Meso Level | arXiv preprint arXiv:2311.16421 |
2023 | CORGI-PM | CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation | Justice & Fairness | Macro Level | arXiv preprint arXiv:2301.00395 |
2023 | Cvalues | Cvalues: Measuring the values of Chinese large language models from safety to responsibility | Harmlessness, Responsibility | Macro Level, Meso Level | arXiv preprint arXiv:2307.09705 |
2023 | DecodingTrust | DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models | Privacy, Justice & Fairness, Harmlessness | Macro Level | Advances in Neural Information Processing Systems |
2018 | EEC (Equity Evaluation Corpus) | Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems | Harmlessness | Macro Level | arXiv preprint arXiv:1805.04508 |
2023 | Flames | Flames: Benchmarking Value Alignment of LLMs in Chinese | Justice & Fairness, Responsibility, Harmlessness, Privacy | Macro Level | arXiv preprint arXiv:2311.06899 |
2023 | GlobalOpinionQA | Towards Measuring the Representation of Subjective Global Opinions in Language Models | Nation, Culture | Macro Level, Meso Level | arXiv preprint arXiv:2306.16388 |
2023 | Persona Bias | Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs | Dignity | Macro Level | arXiv preprint arXiv:2311.04892 |
2023 | Social Chemistry 101 | TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models | Justice & Fairness, Harmlessness, Responsibility, Dignity, Beneficence | Macro Level | arXiv preprint arXiv:2306.11507 |
2023 | ToxiGen | An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models | Justice & Fairness | Macro Level | arXiv preprint arXiv:2301.09211 |
2022 | CDial-Bias | Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks | Virtue | Macro Level | arXiv preprint arXiv:2202.08011 |
2022 | Moral Integrity Corpus | The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems | Justice & Fairness, Responsibility, Beneficence, Dignity, Virtue | Macro Level | arXiv preprint arXiv:2204.03021 |
2022 | MoralExceptQA | When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment | Solidarity, Harmlessness, Responsibility, Beneficence, Dignity | Macro Level | Advances in neural information processing systems |
2022 | ValueNet | ValueNet: A new dataset for human value driven dialogue system | Freedom & Autonomy, Beneficence, Harmlessness, Dignity | Macro Level | Proceedings of the AAAI Conference on Artificial Intelligence |
2021 | BBQ (Bias Benchmark for QA) | BBQ: A Hand-Built Bias Benchmark for Question Answering | Justice & Fairness | Macro Level | arXiv preprint arXiv:2110.08193 |
2021 | BOLD | BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation | Justice & Fairness | Macro Level | Proceedings of the 2021 ACM conference on fairness, accountability, and transparency |
2021 | Scruples | Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes | Beneficence | Macro Level | Proceedings of the AAAI Conference on Artificial Intelligence |
2020 | CrowS-Pairs | CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models | Justice & Fairness | Macro Level | arXiv preprint arXiv:2010.00133 |
2020 | ETHICS | Aligning AI With Shared Human Values | Justice & Fairness, Responsibility, Beneficence, Dignity, Usefulness | Macro Level | arXiv preprint arXiv:2008.02275 |
2020 | StereoSet | StereoSet: Measuring stereotypical bias in pretrained language models | Justice & Fairness | Macro Level | arXiv preprint arXiv:2004.09456 |
2020 | UnQover | UnQovering Stereotyping Biases via Underspecified Questions | Justice & Fairness | Macro Level | arXiv preprint arXiv:2010.02428 |
2019 | Social Bias Frames | Social Bias Frames: Reasoning about Social and Power Implications of Language | Freedom & Autonomy | Macro Level | arXiv preprint arXiv:1911.03891 |
2019 | WikiGenderBias | Towards Understanding Gender Bias in Relation Extraction | Dignity | Macro Level | arXiv preprint arXiv:1911.03642 |
2018 | WinoBias | Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods | Solidarity | Macro Level | arXiv preprint arXiv:1804.06876 |
2018 | WinoGender | Gender Bias in Coreference Resolution | Justice & Fairness | Macro Level | arXiv preprint arXiv:1804.09301 |