This repository is created to support the survey paper Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives, by collecting and categorizing relevant research papers and datasets on value alignment in agentic AI systems.

Wei-ZENG1020/Value-Alignment-Agentic-AI-Papers-Survey-Taxonomy

Value-Alignment-Agentic-AI-Papers-Survey-Taxonomy

We welcome contributions, discussions, and issues related to value alignment for agentic AI. If you have any questions, feel free to contact Zengwei_hnu@163.com. (We recommend cc'ing zhuhengshu@gmail.com as a precaution in case of any delivery issues.)

We will continue to update both the arXiv paper and this repository regularly. If you find our survey useful for your research, please cite the following paper:

@article{AgenticAIValueAlignment,
  title={Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives},
  author={Zeng, Wei and Zhu, Hengshu and Qin, Chuan and Wu, Han and Cheng, Yihang and Zhang, Sirui and Jin, Xiaowei and Shen, Yinuo and Wang, Zhenxing and Zhong, Feimin and Xiong, Hui},
  journal={arXiv preprint arXiv:2506.09656},
  year={2025}
}

Bookmarks

Papers

Overview of Our Survey

Related Survey

Time Title Keywords Venue
2025 The rise and potential of large language model based agents: a survey Communication structures, practical applications and societal systems etc. of LLM-based agents Science China Information Sciences
2025 A Survey on Alignment for Large Language Model Agents Value alignment objectives, datasets, techniques, and evaluation methods for LLM-based agents OpenReview
2025 Multi-Agent Collaboration Mechanisms: A Survey of LLMs Conceptual framework, interaction mechanisms, and application overview of LLM-based agent systems arXiv
2024 Large Language Model based Multi-Agents: A Survey of Progress and Challenges Capabilities, framework analysis, and application overview of LLM-based multi-agent systems arXiv
2024 A survey on large language model based autonomous agents Constituent modules, application overview, and evaluation methods of LLM-based autonomous agents Frontiers of Computer Science
2023 AI Alignment: A Comprehensive Survey Motivations and objectives, alignment methods, and assurance and governance of AI alignment arXiv
2023 From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models Definition and evaluation of LLM alignment objectives arXiv
2023 Large Language Model Alignment: A Survey Definition, categories, testing, and evaluation of LLM alignment arXiv
2023 Unpacking the Ethical Value Alignment in Big Models Definition, normative principles, and technical methods of LLM value alignment arXiv
2023 Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment Alignment objectives for trustworthy LLMs arXiv
2024 Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions Challenges, fundamental definitions, and alignment frameworks of LLM value alignment arXiv
2025 Value alignment in ai large models: Current status, key issues, and normative strategies Necessity, conceptual definitions, theoretical approaches, challenges and future outlook of LLM value alignment CNKI

The Principles of Values Alignment

The Macro Level Principles of Values Alignment

Sub-Level Sub-sub-Level Title Time Venue
Moral Foundation Beneficence The global landscape of ai ethics guidelines 2019 Nature machine intelligence
A unified framework of five principles for ai in society 2022 Machine learning and the city: Applications in architecture and urban design
Justice & Fairness The global landscape of ai ethics guidelines 2019 Nature machine intelligence
Ethics of ai: A systematic literature review of principles and challenges 2022 Proceedings of the 26th international conference on evaluation and assessment in software engineering
A unified framework of five principles for ai in society 2022 Machine learning and the city: Applications in architecture and urban design
Towards realistic evaluation of cultural value alignment in large language models: Diversity enhancement for survey response simulation 2025 Information Processing & Management
Achieving fairness in multi-agent mdp using reinforcement learning 2023 The Twelfth International Conference on Learning Representations
Beavertails: Towards improved safety alignment of llm via a human-preference dataset 2023 Advances in Neural Information Processing Systems
Honesty From instructions to intrinsic human values - A survey of alignment goals for big models 2023 arXiv
Improving factuality and reasoning in language models through multiagent debate 2023 Forty-first International Conference on Machine Learning
Responsibility The global landscape of ai ethics guidelines 2019 Nature machine intelligence
Ethics of ai: A systematic literature review of principles and challenges 2022 Proceedings of the 26th international conference on evaluation and assessment in software engineering
Virtue Dailydilemmas: Revealing value preferences of llms with quandaries of daily life 2025 The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore
Value compass leaderboard: A platform for fundamental and validated evaluation of llms values 2025 arXiv
Dignity The global landscape of ai ethics guidelines 2019 Nature machine intelligence
Shaping the ethical governance path of artificial intelligence in the chinese context—based on value-instrument rationality 2025 Studies in Science of Science
Rights Protection Freedom & Autonomy The global landscape of ai ethics guidelines 2019 Nature machine intelligence
A unified framework of five principles for ai in society 2022 Machine learning and the city: Applications in architecture and urban design
Shaping the ethical governance path of artificial intelligence in the chinese context—based on value-instrument rationality 2025 Studies in Science of Science
Privacy The global landscape of ai ethics guidelines 2019 Nature machine intelligence
Ethics of ai: A systematic literature review of principles and challenges 2022 Proceedings of the 26th international conference on evaluation and assessment in software engineering
Hydragan: A cooperative agent model for multi-objective data generation 2024 ACM Trans. Intell. Syst. Technol.
Empowering users in digital privacy management through interactive llm-based agents 2025 The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore
Sustainability Solidarity The global landscape of ai ethics guidelines 2019 Nature machine intelligence
Sustainability The global landscape of ai ethics guidelines 2019 Nature machine intelligence
Cooperate or collapse: Emergence of sustainable cooperation in a society of LLM agents 2024 Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024
System Governance Harmlessness The global landscape of ai ethics guidelines 2019 Nature machine intelligence
A unified framework of five principles for ai in society 2022 Machine learning and the city: Applications in architecture and urban design
From instructions to intrinsic human values - A survey of alignment goals for big models 2023 arXiv
Training socially aligned language models on simulated social interactions 2024 The Twelfth International Conference on Learning Representations, ICLR 2024
Safesora: Towards safety alignment of text2video generation via a human preference dataset 2024 Advances in Neural Information Processing Systems
Trust The global landscape of ai ethics guidelines 2019 Nature machine intelligence
Towards realistic evaluation of cultural value alignment in large language models: Diversity enhancement for survey response simulation 2025 Information Processing & Management
Transparency The global landscape of ai ethics guidelines 2019 Nature machine intelligence
A unified framework of five principles for ai in society 2022 Machine learning and the city: Applications in architecture and urban design
Ethics of ai: A systematic literature review of principles and challenges 2022 Proceedings of the 26th international conference on evaluation and assessment in software engineering
ICA-CRMAS: intelligent context-awareness approach for citation recommendation based on multi-agent system 2024 ACM Trans. Manag. Inf. Syst.
A multimodal automated interpretability agent 2024 Forty-first International Conference on Machine Learning, ICML 2024
Biodiscoveryagent: An ai agent for designing genetic perturbation experiments 2024 arXiv
Usefulness From instructions to intrinsic human values - A survey of alignment goals for big models 2023 arXiv
Dailydilemmas: Revealing value preferences of llms with quandaries of daily life 2025 The Thirteenth International Conference on Learning Representations, ICLR 2025

The Meso Level Principles of Values Alignment

Sub-Level Sub-sub-level Title Time Venue
National Level The United States Unpacking the ethical value alignment in big models 2022 arXiv
The global landscape of ai ethics guidelines 2019 Nature machine intelligence
Blueprint for an ai bill of rights: Making automated systems work for the american people 2022 White House Nimble Books
The national artificial intelligence research and development strategic plan: 2023 update 2023 National Science and Technology Council (US)
European Union Ethics guidelines for trustworthy AI 2019 High-level expert group on artificial intelligence
EU Artificial intelligence act 2024 Regulation of the European Union (EU)
Unpacking the ethical value alignment in big models 2022 arXiv
The global landscape of ai ethics guidelines 2019 Nature machine intelligence
China Measures for the ethical review of science and technology 2023 C. A. of Sciences
Standardization of ai ethical governance 2023 National Artificial Intelligence Standardization General Group
Unpacking the ethical value alignment in big models 2022 arXiv
Industry Level Education Guidance for generative AI in education and research 2023 UNESCO Publishing
Health Ethics and governance of artificial intelligence for health: Guidance on large multi-modal models 2024 World Health Organization
Gaming The ethics of ai in games 2023 IEEE Transactions on Affective Computing
Cultural Level Large/Small Power Distance, Strong/Weak Uncertainty Avoidance, Individualism/Collectivism Dimensionalizing cultures: The hofstede model in context 2011 Online readings in psychology and culture
Cdeval: A benchmark for measuring the cultural dimensions of large language models 2023 arXiv
How well do llms represent values across cultures? empirical analysis of llm responses based on hofstede cultural dimensions 2024 arXiv
Llm-globe: A benchmark evaluating the cultural values embedded in llm output 2024 arXiv
Long/Short-Term Orientation, Masculinity/Femininity Dimensionalizing cultures: The hofstede model in context 2011 Online readings in psychology and culture
Cdeval: A benchmark for measuring the cultural dimensions of large language models 2023 arXiv
How well do llms represent values across cultures? empirical analysis of llm responses based on hofstede cultural dimensions 2024 arXiv
Indulgence/Restraint Dimensionalizing cultures: The hofstede model in context 2011 Online readings in psychology and culture
Cdeval: A benchmark for measuring the cultural dimensions of large language models 2023 arXiv

The Micro Level Principles of Values Alignment

Sub-Level Title Time Venue
Recruitment Recruitment in the times of machine learning 2019 Management Systems in Production Engineering
HR analytics and ethics 2019 IBM Journal of Research and Development
Ethics of ai-enabled recruiting and selection: A review and research agenda 2022 Journal of Business Ethics
Legal Consultation Lawluo: A chinese law firm co-run by llm agents 2024 arXiv
Pharmaceutical Company Governance Operationalising ai governance through ethics-based auditing: an industry case study 2023 AI and Ethics

Agent System Application

High Generalizability

Generalizability Category Title Time Venue
High Tool Using Multi-modal agent tuning: Building a vlm-driven agent for efficient tool usage 2025 ICLR 2025
Agent smith: A single image can jailbreak one million multimodal LLM agents exponentially fast 2024 ICML 2024
GTA: A benchmark for general tool agents 2024 Advances in Neural Information Processing Systems
Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents 2024 ACL 2024
Spider2-v: How far are multimodal agents from automating data science and engineering workflows? 2024 Advances in Neural Information Processing Systems
Dialogue Agents Please donate to save a life: Inducing politeness to handle resistance in persuasive dialogue agents 2024 IEEE ACM Trans. Audio Speech Lang. Process.
Archer: Training language model agents via hierarchical multi-turn RL 2024 ICML 2024
Inside out: Emotional multiagent multimodal dialogue systems 2024 IJCAI 2024
KEEP chatting! an attractive dataset for continuous conversation agents 2024 ACL 2024
Secom: On memory construction and retrieval for personalized conversational agents 2025 ICLR 2025
STYLE: improving domain transferability of asking clarification questions in large language model powered conversational agents 2024 ACL 2024
Talk with human-like agents: Empathetic dialogue through perceptible acoustic reception and reaction 2024 ACL 2024
AGILE: A novel reinforcement learning framework of LLM agents 2024 NeurIPS 2024
Intelligent agents with llm-based process automation 2024 KDD 2024
Incharacter: Evaluating personality fidelity in role-playing agents through psychological interviews 2024 ACL 2024
Plug-and-play policy planner for large language model powered dialogue agents 2024 ICLR 2024
Probing the uniquely identifiable linguistic patterns of conversational AI agents 2024 ACL 2024
Socialbench: Sociality evaluation of role-playing conversational agents 2024 ACL 2024
Speaker verification in agent-generated conversations 2024 ACL 2024
Multimodal Perception MLLM as retriever: Interactively learning multimodal retrieval for embodied agents 2025 ICLR 2025
Multi-modal agent tuning: Building a vlm-driven agent for efficient tool usage 2025 ICLR 2025
Agent smith: A single image can jailbreak one million multimodal LLM agents exponentially fast 2024 ICML 2024
Doraemongpt: Toward understanding dynamic scenes with large language models (exemplified as A video agent) 2024 ICML 2024
Multi-Agent Collaboration Adasociety: An adaptive environment with social structures for multi-agent decision-making 2024 NeurIPS 2024
Exploring collaboration mechanisms for LLM agents: A social psychology view 2024 ACL 2024
Maximum entropy heterogeneous-agent reinforcement learning 2024 ICLR 2024
Fuzzy feedback multiagent reinforcement learning for adversarial dynamic multiteam competitions 2024 IEEE Trans. Fuzzy Syst.
Entity divider with language grounding in multi-agent reinforcement learning 2023 ICML 2023
Collaborative dynamic scheduling in a self-organizing manufacturing system using multi-agent reinforcement learning 2024 Adv. Eng. Informatics
Experiential co-learning of software-developing agents 2024 ACL 2024
Swe-agent: Agent-computer interfaces enable automated software engineering 2024 NeurIPS 2024
Learning to cooperate with humans using generative agents 2024 NeurIPS 2024
Long-horizon planning for multi-agent robots in partially observable environments 2024 NeurIPS 2024
360°rea: Towards A reusable experience accumulation with 360° assessment for multi-agent system 2024 ACL 2024
Visual Tasks Processing Cobjeason: Reasoning covered object in image by multi-agent collaboration based on informed knowledge graph 2024 ACM Trans. Knowl. Discov. Data
Genartist: Multimodal LLM as an agent for unified image generation and editing 2024 NeurIPS 2024
Restoreagent: Autonomous image restoration agent via multimodal large language models 2024 NeurIPS 2024
Scenecraft: An LLM agent for synthesizing 3d scenes as blender code 2024 ICML 2024
Natural Language Processing Kgagent: Learning a deep reinforced agent for keyphrase generation 2024 IEEE ACM Trans. Audio Speech Lang. Process.
A human inspired reading agent with gist memory of very long contexts 2024 ICML 2024
Text2db: Integration aware information extraction with large language model agents 2024 ACL 2024
Optimizing text-to-sql conversion techniques through the integration of intelligent agents and large language models 2025 Inf. Process. Manag.
Xmc-agent: Dynamic navigation over scalable hierarchical index for incremental extreme multi-label classification 2024 ACL 2024
A two-agent game for zero-shot relation triplet extraction 2024 ACL 2024
Data Generation and Analysis Transagent: Transfer vision-language foundation models with heterogeneous agent collaboration 2024 NeurIPS 2024
Reasoning, Planning and Decision Optimization AVIS: autonomous visual information seeking with large language model agent 2023 NeurIPS 2023
Sociodojo: Building lifelong analytical agents with real-world text and time series 2024 ICLR 2024
Agent instructs large language models to be general zero-shot reasoners 2024 ICML 2024
Agent planning with world knowledge model 2024 NeurIPS 2024
Automanual: Constructing instruction manuals by LLM agents via interactive environmental learning 2024 NeurIPS 2024
Improving factuality and reasoning in language models through multiagent debate 2024 ICML 2024
Magdi: Structured distillation of multi-agent interaction graphs improves reasoning in smaller language models 2024 ICML 2024
Social learning through interactions with other agents: A survey 2024 IJCAI 2024
Online continual learning for interactive instruction following agents 2024 ICLR 2024
Travelplanner: A benchmark for real-world planning with language agents 2024 ICML 2024
Trial and error: Exploration-based trajectory optimization of LLM agents 2024 ACL 2024
Distilling internet-scale vision-language models into embodied agents 2023 ICML 2023
Grounded decoding: Guiding text generation with grounded models for embodied agents 2023 NeurIPS 2023
Robotouille: An asynchronous planning benchmark for LLM agents 2025 ICLR 2025
Language agents with reinforcement learning for strategic play in the werewolf game 2024 ICML 2024
Omnijarvis: Unified vision-language-action tokenization enables open-world instruction following agents 2024 NeurIPS 2024
Understanding the weakness of large language model agents within a complex android environment 2024 KDD 2024
Embodied agent interface: Benchmarking llms for embodied decision making 2024 NeurIPS 2024
Tool learning in the wild: Empowering language models as automatic tool agents 2025 WWW 2025
Swiftsage: A generative agent with fast and slow thinking for complex interactive tasks 2023 NeurIPS 2023
On the effects of data scale on UI control agents 2024 NeurIPS 2024
Rada: Retrieval-augmented web agent planning with llms 2024 ACL 2024
Division-of-thoughts: Harnessing hybrid language model synergy for efficient on-device agents 2025 WWW 2025

Limited Generalizability

Generalizability Category Title Time Venue
Limited Function-driven Code Generation Mapcoder: Multi-agent code generation for competitive problem solving 2024 ACL 2024
Tailoring with targeted precision: Edit-based agents for open-domain procedure customization 2024 ACL 2024
Worldcoder, a model-based LLM agent: Building world models by writing code and interacting with the environment 2024 NeurIPS 2024
Swt-bench: Testing and validating real-world bug-fixes with code agents 2024 NeurIPS 2024
Social Simulation User behavior simulation with large language model-based agents 2025 ACM Trans. Inf. Syst.
An llm-enhanced agent-based simulation tool for information propagation 2024 IJCAI 2024
Can large language model agents simulate human trust behavior? 2024 NeurIPS 2024
Competeai: Understanding the competition dynamics of large language model-based agents 2024 ICML 2024
Cooperate or collapse: Emergence of sustainable cooperation in a society of LLM agents 2024 NeurIPS 2024
Richelieu: Self-evolving llm-based agents for AI diplomacy 2024 NeurIPS 2024
SOTOPIA: interactive evaluation for social intelligence in language agents 2024 ICLR 2024
Towards objectively benchmarking social intelligence of language agents at the action level 2024 ACL 2024
CAMEL: communicative agents for "mind" exploration of large language model society 2023 NeurIPS 2023
Roleagent: Building, interacting, and benchmarking high-quality role-playing agents from scripts 2024 NeurIPS 2024
Villageragent: A graph-based multi-agent framework for coordinating complex task dependencies in minecraft 2024 ACL 2024
Describe, explain, plan and select: Interactive planning with llms enables open-world multi-task agents 2023 NeurIPS 2023
Do embodied agents dream of pixelated sheep: Embodied decision making using language guided world modelling 2023 ICML 2023
Sotopia-π: Interactive learning of socially intelligent language agents 2024 ACL 2024
Timearena: Shaping efficient multitasking language agents in a time-aware simulation 2024 ACL 2024
Graphical User Interface Agents On the multi-turn instruction following for conversational web agents 2024 ACL 2024
Seeclick: Harnessing GUI grounding for advanced visual GUI agents 2024 ACL 2024
You only look at screens: Multimodal chain-of-action agents 2024 ACL 2024
A real-world webagent with planning, long context understanding, and program synthesis 2024 ICLR 2024
Agentoccam: A simple yet strong baseline for llm-based web agents 2025 ICLR 2025
Synatra: Turning indirect knowledge into direct demonstrations for digital agents at scale 2024 NeurIPS 2024
Agent S: an open agentic framework that uses computers like a human 2025 ICLR 2025
Agenttrek: Agent trajectory synthesis via guiding replay with web tutorials 2025 ICLR 2025
Visualwebarena: Evaluating multimodal agents on realistic visual web tasks 2024 ACL 2024
Webarena: A realistic web environment for building autonomous agent 2024 ICLR 2024
Webvoyager: Building an end-to-end web agent with large multimodal models 2024 ACL 2024
Workarena: How capable are web agents at solving common knowledge work tasks? 2024 ICML 2024
Large language models empowered personalized web agents 2025 WWW 2025
Screenagent: A vision language model-driven computer control agent 2024 IJCAI 2024
Autowebglm: A large language model-based web navigating agent 2024 ACM 2024
Coco-agent: A comprehensive cognitive MLLM agent for smartphone GUI automation 2024 ACL 2024
Autoguide: Automated generation and selection of context-aware guidelines for large language model agents 2024 NeurIPS 2024
adversarial robustness of multimodal LM agents 2025 ICLR 2025
Webrl: Training LLM web agents via self-evolving online curriculum reinforcement learning 2025 ICLR 2025
Model Analysis, Evaluation and Improvement Dynamic evaluation of large language models by meta probing agents 2024 ICML 2024
A multimodal automated interpretability agent 2024 ICML 2024
Ali-agent: Assessing llms’ alignment with human values via agent-based evaluation 2024 NeurIPS 2024
Chateval: Towards better llm-based evaluators through multi-agent debate 2024 ICLR 2024
Cheatagent: Attacking llm-empowered recommender systems via LLM agent 2025 arXiv
Star-agents: Automatic data optimization with LLM agents for instruction tuning 2024 NeurIPS 2024
Scenario-driven Finance A multimodal foundation agent for financial trading: Tool augmented, diversified, and generalist 2024 KDD 2024
Econagent: Large language model-empowered agents for simulating macroeconomic activities 2024 ACL 2024
Fincon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making 2024 NeurIPS 2024
Designing heterogeneous LLM agents for financial sentiment analysis 2025 ACM Trans. Manag. Inf. Syst.
Science Ds-agent: Automated data science by empowering large language models with case-based reasoning 2024 ICML 2024
ICA-CRMAS: intelligent context-awareness approach for citation recommendation based on multi-agent system 2024 ACM Trans. Manag. Inf. Syst.
Matplotagent: Method and evaluation for llm-based agentic scientific data visualization 2024 ACL 2024
OSDA agent: Leveraging large language models for de novo design of organic structure directing agents 2025 ICLR 2025
Scienceagentbench: Toward rigorous assessment of language agents for data-driven scientific discovery 2025 ICLR 2025
Healthcare Medagents: Large language models as collaborators for zero-shot medical reasoning 2024 ACL 2024
Pathgen-1.6m: 1.6 million pathology image-text pairs generation through multi-agent collaboration 2025 ICLR 2025
Psychogat: A novel psychological measurement paradigm through interactive fiction games with LLM agents 2024 ACL 2024
De novo drug design using reinforcement learning with multiple GPT agents 2024 arXiv
Biodiscoveryagent: An AI agent for designing genetic perturbation experiments 2025 ICLR 2025
Colacare: Enhancing electronic health record modeling through large language model-driven multi-agent collaboration 2025 ACM 2025
Spokenwoz: A large-scale speech-text benchmark for spoken task-oriented dialogue agents 2023 NeurIPS 2023
Game Advancing DRL agents in commercial fighting games: Training, integration, and agent-human alignment 2024 ICML 2024
Deciphering digital detectives: Understanding LLM behaviors and capabilities in multi-agent mystery games 2024 ACL 2024
STARLING: self-supervised training of text-based reinforcement learning agent with large language models 2024 ACL 2024
Whodunitbench: Evaluating large multimodal agents via murder mystery games 2024 NeurIPS 2024
Monte carlo planning with large language model for text-based game agents 2025 ICLR 2025
Manufacturing Monte carlo planning with large language model for text-based game agents 2025 ICLR 2025
Robotics Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration 2024 NeurIPS 2024
Steve-eye: Equipping llm-based embodied agents with visual perception in open worlds 2024 ICLR 2024
Can we trust embodied agents? exploring backdoor attacks against embodied llm based decision-making systems 2025 ICLR 2025
Opex: A component-wise analysis of llm-centric agents in embodied instruction following 2024 ACL 2024
Urban Computing Geoagent: To empower llms using geospatial tools for address standardization 2024 ACL 2024
Large language models as urban residents: An LLM agent framework for personal mobility generation 2024 NeurIPS 2024
Urbankgent: A unified large language model agent framework for urban knowledge graph construction 2024 NeurIPS 2024
Social Media Ask, acquire, understand: A multimodal agent-based framework for social abuse detection in memes 2025 ACM 2025
Local: Logical and causal fact-checking with llm-based multi-agents 2025 ACM 2025
Autonomous Driving SMART: scalable multi agent real-time motion generation via next-token prediction 2024 NeurIPS 2024
Llmlight: Large language models as traffic signal control agents 2025 KDD 2025
Literary Creation IBSEN: director-actor agent collaboration for controllable and interactive drama script generation 2024 ACL 2024
Agents’ room: Narrative generation through multi-step collaboration 2025 ICLR 2025

Values Alignment Evaluation for Agent Systems

Methodologies for Agent Value Alignment

Datasets

Time Dataset Paper Keywords Level Venue
2025 DEFSurveySim Towards realistic evaluation of cultural value alignment in large language models: Diversity enhancement for survey response simulation Nation, Culture Macro Level, Meso Level Information Processing & Management
2025 NaVAB Benchmarking Multi-National Value Alignment for Large Language Models Nation Meso Level arXiv preprint arXiv:2504.12911
2025 German Credit Data EARN Fairness: Explaining, Asking, Reviewing, and Negotiating Artificial Intelligence Fairness Metrics Among Stakeholders Company governance Micro Level Proceedings of the ACM on Human-Computer Interaction
2024 CultureSPA Self-Pluralising Culture Alignment for Large Language Models Nation, Culture Macro Level, Meso Level arXiv preprint arXiv:2410.12971
2024 DailyDilemmas DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life Harmlessness, Responsibility, Justice & Fairness, Virtue Macro Level arXiv preprint arXiv:2410.02683
2024 HofstedeCulturalDimensions How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions Culture Macro Level arXiv preprint arXiv:2406.14805
2024 IndieValueCatalog Can Language Models Reason about Individualistic Human Values and Preferences? Justice & Fairness Macro Level arXiv preprint arXiv:2410.03868
2024 KorNAT KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge Nation, Culture, Justice & Fairness Macro Level, Meso Level arXiv preprint arXiv:2402.13605
2024 LLMGlobe LLM-GLOBE: A Benchmark Evaluating the Cultural Values Embedded in LLM Output Harmlessness, Justice & Fairness, Privacy, Beneficence, Responsibility Macro Level arXiv preprint arXiv:2411.06032
2024 LaWGPT Lawyer GPT: A Legal Large Language Model with Enhanced Domain Knowledge and Reasoning Capabilities Fairness in legal Macro Level, Micro Level Proceedings of the 2024 3rd International Symposium on Robotics, Artificial Intelligence and Information Engineering
2024 Moral Beliefs Evaluating Moral Beliefs across LLMs through a Pluralistic Framework Nation, Culture, Justice & Fairness, Solidarity, Sustainability, Transparency Macro Level, Meso Level arXiv preprint arXiv:2411.03665
2024 Moral Stories Measuring Human-AI Value Alignment in Large Language Models Harmlessness, Justice & Fairness, Responsibility, Beneficence, Dignity, Virtue, Freedom & Autonomy Macro Level Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society
2024 PkuSafeRLHF Pku-saferlhf: Towards multi-level safety alignment for llms with human preference Harmlessness, Freedom & Autonomy, Justice & Fairness, Trust, Privacy, Responsibility, Beneficence Macro Level, Meso Level arXiv preprint arXiv:2406.15513
2024 ProgressGym ProgressGym: Alignment with a Millennium of Moral Progress Harmlessness, Freedom & Autonomy, Trust, Dignity, Beneficence Macro Level Advances in Neural Information Processing Systems
2024 SafeSora SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset Harmlessness, Usefulness, Responsibility Macro Level Advances in Neural Information Processing Systems
2023 MFQ (Moral Foundations Questionnaire) Moral Foundations of Large Language Models Trust, Responsibility Macro Level
2023 BeaverTails Beavertails: Towards improved safety alignment of llm via a human-preference dataset Harmlessness, Justice & Fairness, Privacy, Beneficence, Responsibility Macro Level Advances in Neural Information Processing Systems
2023 CBBQ (Chinese Bias Benchmark Dataset) CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models China (Safeguarding national security and adhering to the core socialist values) Meso Level arXiv preprint arXiv:2306.16244
2023 CDEval CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models Cultural, Education, Individualism Macro Level, Meso Level arXiv preprint arXiv:2311.16421
2023 CORGI-PM CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation Justice & Fairness Macro Level arXiv preprint arXiv:2301.00395
2023 Cvalues Cvalues: Measuring the values of chinese large language models from safety to responsibility. Harmlessness, Responsibility Macro Level, Meso Level arXiv preprint arXiv:2307.09705
2023 DecodingTrust DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models Privacy, Justice & Fairness, Harmlessness Macro Level Cited on
2023 EEC (Equity Evaluation Corpus) Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems Harmlessness Macro Level arXiv preprint arXiv:2311.04892
2023 Flames Flames: Benchmarking Value Alignment of LLMs in Chinese Justice & Fairness, Responsibility, Harmlessness, Privacy Macro Level arXiv preprint arXiv:2311.06899
2023 GlobalOpinionQA Towards Measuring the Representation of Subjective Global Opinions in Language Models Nation, Culture Macro Level, Meso Level arXiv preprint arXiv:2306.16388
2023 Persona Bias Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs Dignity Macro Level arXiv preprint arXiv:2311.04892
2023 Social Chemistry 101 TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models Justice & Fairness, Harmlessness, Responsibility, Dignity, Beneficence Macro Level arXiv preprint arXiv:2306.11507
2023 ToxiGen An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models Justice & Fairness Macro Level arXiv preprint arXiv:2301.09211
2022 CDial-Bias Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks Virtue Macro Level arXiv preprint arXiv:2202.08011
2022 Moral Integrity Corpus The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems Justice & Fairness, Responsibility, Beneficence, Dignity, Virtue Macro Level arXiv preprint arXiv:2204.03021
2022 MoralExceptQA When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment Solidarity, Harmlessness, Responsibility, Beneficence, Dignity Macro Level Advances in Neural Information Processing Systems
2022 ValueNet ValueNet: A New Dataset for Human Value Driven Dialogue System Freedom & Autonomy, Beneficence, Harmlessness, Dignity Macro Level Proceedings of the AAAI Conference on Artificial Intelligence
2021 BBQ (Bias Benchmark for QA) BBQ: A Hand-Built Bias Benchmark for Question Answering Justice & Fairness Macro Level arXiv preprint arXiv:2110.08193
2021 BOLD BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation Justice & Fairness Macro Level Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency
2021 Scruples Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes Beneficence Macro Level Proceedings of the AAAI Conference on Artificial Intelligence
2020 CrowS-Pairs CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models Justice & Fairness Macro Level arXiv preprint arXiv:2010.00133
2020 ETHICS Aligning AI With Shared Human Values Justice & Fairness, Responsibility, Beneficence, Dignity, Usefulness arXiv preprint arXiv:2008.02275
2020 StereoSet StereoSet: Measuring stereotypical bias in pretrained language models Justice & Fairness Macro Level arXiv preprint arXiv:2004.09456
2020 UnQover UnQovering Stereotyping Biases via Underspecified Questions Justice & Fairness Macro Level arXiv preprint arXiv:2010.02428
2019 Social Bias Frames Social Bias Frames: Reasoning about Social and Power Implications of Language Freedom & Autonomy Macro Level arXiv preprint arXiv:1911.03891
2019 WikiGenderBias Towards Understanding Gender Bias in Relation Extraction Dignity Macro Level arXiv preprint arXiv:1911.03642
2018 WinoBias Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods Solidarity Macro Level arXiv preprint arXiv:1804.06876
2018 WinoGender Gender Bias in Coreference Resolution Justice & Fairness Macro Level arXiv preprint arXiv:1804.09301

Future Directions
