🌐 Project Website: https://llm-psychometrics.com
This repository accompanies the paper Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement. It contains a curated list of resources on the psychometrics of Large Language Models (LLMs). We will keep updating this repository as we find new resources. Contributions are welcome: please submit a pull request or open an issue.
If you find this repository useful, we would greatly appreciate it if you could give us a star and cite the paper as follows:
@article{ye2025large,
title={Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement},
author={Ye, Haoran and Jin, Jing and Xie, Yuhang and Zhang, Xin and Song, Guojie},
journal={arXiv preprint arXiv:2505.08245},
year={2025},
note={Project website: \url{https://llm-psychometrics.com}, GitHub: \url{https://github.com/ValueByte-AI/Awesome-LLM-Psychometrics}}
}
- Add tags to each entry
- 🗯️ Big Five / HEXACO / Myers-Briggs Type Indicator (MBTI) / Dark Triad / Others & custom
- 🧪 Personality is the enduring configuration of characteristics and behavior that comprises an individual’s unique adjustment to life.
- ⚖️ Schwartz’s Theory / World Values Survey (WVS) / Global Leadership and Organizational Behavior Effectiveness (GLOBE) / Social Value Orientation (SVO) / Others & custom
- 🧪 Values are enduring beliefs that guide behavior and decision-making, reflecting what is important and desirable to an individual or group.
- 🧬 Moral Foundations (MFT) / Defining Issues Test (DIT) / ETHICS / Others & custom
- 🧪 Morality is the categorization of intentions, decisions and actions into those that are proper, or right, and those that are improper, or wrong.
- 🗣️ American National Election Studies (ANES) / American Trends Panel (ATP) / German Longitudinal Election Study (GLES) / Political Compass Test (PCT)
- 🧪 Attitudes are always attitudes about something. This implies three necessary elements: first, there is the object of thought, which is both constructed and evaluated. Second, there are acts of construction and evaluation. Third, there is the agent, who is doing the constructing and evaluating. We can therefore suggest that, at its most general, an attitude is the cognitive construction and affective evaluation of an attitude object by an agent.
- 🧪 Heuristics and biases are mental shortcuts or rules of thumb that simplify decision-making and problem-solving.
- 🌀 Theory of Mind (ToM) / Emotional Intelligence / Social Intelligence
- 🧪 Theory of Mind is the ability to attribute mental states such as beliefs, intentions, and knowledge to others.
- 🧪 Emotional Intelligence is the subset of social intelligence that involves the ability to monitor one’s own and others’ feelings and emotions, to discriminate among them and to use this information to guide one’s thinking and actions.
- 🧪 Social Intelligence is the ability to understand and manage people.
- 🧑🤝🧑 Language comprehension / Language generation / Language acquisition
- Test Format: Structured test · Open-ended conversation · Agentic simulation
- Data and Task Sources: Established inventories (e.g., MFT, SVS, MBTI) · Custom-curated items · Synthetic items
- Prompting Strategies: Prompt perturbation · Performance-enhancing prompts (e.g., CoT) · Role-playing prompts
- Model Output & Scoring: Logit-based analysis · Direct scoring · Rule-based scoring · Human scoring · Model-based scoring
- Reliability: Test-retest · Parallel forms · Inter-rater agreement
- Content Validity: Data contamination · Novel items
- Construct Validity: Unique abstraction · Response set · Social Desirability Bias · Cross-lingual Tests
- Criterion / Ecological Validity: External correlation · Real-world relevance
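
As a rough illustration of three of the ideas listed above (rule-based scoring, test-retest reliability, and inter-rater agreement), the following is a minimal Python sketch. It is not taken from the paper or from any listed resource: the function names, the 5-point Likert scale, and the example data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def score_likert(responses, reverse_keyed, scale_max=5):
    """Rule-based scoring: mean item rating after flipping reverse-keyed items.

    `responses` holds integer ratings in 1..scale_max (an assumed scale);
    `reverse_keyed` is a set of item indices whose wording is reversed.
    """
    adjusted = [
        (scale_max + 1 - rating) if index in reverse_keyed else rating
        for index, rating in enumerate(responses)
    ]
    return mean(adjusted)


def pearson_r(x, y):
    """Pearson correlation, used here as a simple test-retest coefficient."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over categorical labels."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:  # both raters always use the same single label
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical data: trait scores from the same model in two sessions,
# and labels assigned to four model outputs by two raters.
session_1 = [3.2, 4.1, 2.5, 3.8]
session_2 = [3.0, 4.3, 2.7, 3.6]
print("trait score:", score_likert([4, 2, 5], reverse_keyed={1}))
print("test-retest r:", round(pearson_r(session_1, session_2), 3))
print("kappa:", cohens_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"]))
```

In practice the two "sessions" might be repeated administrations under different seeds or prompt perturbations, and the two "raters" might be a human annotator and a model-based scorer; both mappings are assumptions rather than a prescribed protocol.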
- Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications, 2025.04, [paper]
- The Mind in the Machine: A Survey of Incorporating Psychological Theories in LLMs, 2025.05, [paper]
- A review of automatic item generation techniques leveraging large language models, 2025.06, [paper]
- (Big Five) Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality, ICML 2024, [paper]
- (Big Five) Can LLM Agents Maintain a Persona in Discourse?, 2025.02, [paper]
- (Big Five) Personality testing of large language models: limited temporal stability, but highlighted prosociality, 2024.01, Royal Society Open Science, [paper]
- (Big Five) Identifying and Manipulating the Personality Traits of Language Models, EMNLP 2023, [paper]
- (Big Five) Do Personality Tests Generalize to Large Language Models?, 2023.11, [paper]
- (Big Five) LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models, EACL 2024, [paper]
- (Big Five) PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits, NAACL 2024 Findings, [paper]
- (Big Five) Eliciting Personality Traits in Large Language Models, 2024.02, [paper]
- (Big Five) Revisiting the Reliability of Psychological Scales on Large Language Models, EMNLP 2024, [paper]
- (Big Five) Evaluating and Inducing Personality in Pre-trained Language Models, NeurIPS 2023, [paper]
- (Big Five) Estimating the Personality of White-Box Language Models, 2022.04, [paper]
- (Big Five) Driving Generative Agents With Their Personality, 2024.02, [paper]
- (Big Five) Large Language Models as Superpositions of Cultural Perspectives, 2023.07, [paper][code]
- (Big Five) Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models, AAAI 2025, [paper]
- (Big Five) Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics, NAACL 2025 Findings, [paper]
- (Big Five) Evaluating Psychological Safety of Large Language Models, EMNLP 2024, [paper]
- (Big Five) Dynamic Generation of Personalities with Large Language Models, 2024.04, [paper]
- (Big Five) Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models, 2023.12, [paper]
- (Big Five) AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories, 2023.01, Perspectives on Psychological Science, [paper]
- (Big Five) Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis, 2024.05, [paper]
- (Big Five) ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper][code]
- (Big Five) Do GPT Language Models Suffer From Split Personality Disorder? The Advent Of Substrate-Free Psychometrics, 2024.08, [paper]
- (Big Five) Personality Traits in Large Language Models, 2023.08, [paper]
- (Big Five) You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments, NAACL 2024, [paper]
- (Big Five) Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs, 2023.05, [paper]
- (Big Five) Challenging the Validity of Personality Tests for Large Language Models, Workshop at NeurIPS 2023, [paper]
- (Big Five) LMLPA: Language Model Linguistic Personality Assessment, 2025.01, Computational Linguistics, [paper]
- (Big Five) Dynamic Evaluation of Large Language Models by Meta Probing Agents, ICML 2024, [paper][code]
- (Big Five) Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid Items, ACL 2025, [paper]
- (Big Five) Toward Accurate Psychological Simulations: Investigating LLMs’ Responses to Personality and Cultural Variables, Computers in Human Behavior 2025, [paper]
- (Big Five) Personality-Driven Decision-Making in LLM-Based Autonomous Agents, AAMAS 2025, [paper]
- (Big Five) Large Language Models Demonstrate Distinct Personality Profiles, Cureus 2025, [paper]
- (Big Five) Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models, 2025.04, [paper]
- (Big Five) Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games, 2025.04, [paper]
- (Big Five) Improving Language Model Personas via Rationalization with Psychological Scaffolds, 2025.04, [paper]
- (HEXACO) On the Psychology of GPT-4: Moderately anxious, slightly masculine, honest, and humble, 2024.02, [paper]
- (HEXACO) Personality testing of large language models: limited temporal stability, but highlighted prosociality, 2024.01, Royal Society Open Science, [paper]
- (HEXACO) Who is GPT-3? An Exploration of Personality, Values and Demographics, EMNLP 2022 NLP+CSS workshop, [paper]
- (HEXACO) Cognitive phantoms in LLMs through the lens of latent variables, 2024.09, [paper]
- (HEXACO) ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper][code]
- (HEXACO) Exploring the Impact of Personality Traits on LLM Bias and Toxicity, 2025.02, [paper]
- (MBTI) Machine Mindset: An MBTI Exploration of Large Language Models, 2023.12, [paper][code]
- (MBTI) Revisiting the Reliability of Psychological Scales on Large Language Models, EMNLP 2024, [paper]
- (MBTI) Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models, AAAI 2025, [paper]
- (MBTI) Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models, 2023.12, [paper]
- (MBTI) Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models, 2023.07, [paper][code]
- (MBTI) Can ChatGPT Assess Human Personalities? A General Evaluation Framework, 2023.03, [paper][code]
- (MBTI) Identifying Multiple Personalities in Large Language Models with External Evaluation, 2024.02, [paper]
- (MBTI) The Better Angels of Machine Personality: How Personality Relates to LLM Safety, 2024.07, [paper]
- (MBTI) Do Large Language Models Have a Personality? A Psychometric Evaluation with Implications for Clinical Medicine and Mental Health AI, 2025.03, [paper]
- (DarkTriad) On the Psychology of GPT-4: Moderately anxious, slightly masculine, honest, and humble, 2024.02, [paper]
- (DarkTriad) Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench, ICLR 2024 Oral, [paper][code]
- (DarkTriad) Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics, NAACL 2025 Findings, [paper]
- (DarkTriad) Evaluating Psychological Safety of Large Language Models, EMNLP 2024, [paper]
- (DarkTriad) Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models, 2023.12, [paper]
- (DarkTriad) Cognitive phantoms in LLMs through the lens of latent variables, 2024.09, [paper]
- (DarkTriad) Do GPT Language Models Suffer From Split Personality Disorder? The Advent Of Substrate-Free Psychometrics, 2024.08, [paper]
- (DarkTriad) I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk, 2025.05, [paper]
- (DarkTriad) Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games, 2025.04, [paper]
- (Others & custom) Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models, 2024.06, [paper]
- (Others & custom) Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality, ICML 2024, [paper]
- (Others & custom) Evaluating and Inducing Personality in Pre-trained Language Models, NeurIPS 2023, [paper]
- (Others & custom) Editing Personality For Large Language Models, NLPCC 2024, [paper]
- (Others & custom) Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play, CogSci 2025, [paper]
- (Others & custom) PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data, 2025.02, [paper]
- (Schwartz) High-Dimension Human Value Representation in Large Language Models, 2024.04, [paper]
- (Schwartz) What does ChatGPT return about human values? Exploring value bias in ChatGPT using a descriptive value theory, 2023.04, [paper]
- (Schwartz) Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz’s Theory of Basic Values, 2024.01, JMIR Mental Health, [paper]
- (Schwartz) Large Language Models as Superpositions of Cultural Perspectives, 2023.07, [paper]
- (Schwartz) When Prompting Fails to Sway: Inertia in Moral and Value Judgments of Large Language Models, NeurIPS 2022, [paper]
- (Schwartz) Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts, 2024.11, [paper]
- (Schwartz) Who is GPT-3? An Exploration of Personality, Values and Demographics, EMNLP 2022 NLP+CSS workshop, [paper]
- (Schwartz) AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories, 2023.01, Perspectives on Psychological Science, [paper]
- (Schwartz) ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper][code]
- (Schwartz) Do LLMs have Consistent Values?, 2024.07, [paper]
- (Schwartz) ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs, 2024.09, [paper]
- (Schwartz) Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values, ACL 2024, [paper]
- (Schwartz) Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models, AAAI 2025, [paper]
- (Schwartz) ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models, 2023.10, [paper]
- (Schwartz) Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid Items, ACL 2025, [paper]
- (Schwartz) Cultural Value Alignment in Large Language Models: A Prompt-based Analysis of Schwartz Values in Gemini, ChatGPT, and DeepSeek, 2025.05, [paper]
- (Schwartz) The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas, 2025.05, [paper]
- (Schwartz) Improving Language Model Personas via Rationalization with Psychological Scaffolds, 2025.04, [paper]
- (WVS) ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models, 2023.10, [paper]
- (WVS) Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models, 2025.03, [paper]
- (WVS) Exploring Large Language Models on Cross-Cultural Values in Connection with Training Methodology, 2024.12, [paper]
- (WVS) Value Compass Leaderboard: A Platform for Fundamental and Validated Evaluation of LLMs Values, 2025.01, [paper]
- (VSM) How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions, 2024.06, [paper]
- (VSM) Large Language Models as Superpositions of Cultural Perspectives, 2023.07, [paper][code]
- (VSM) ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper][code]
- (VSM) Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models, AAAI 2025, [paper]
- (VSM) Cultural Value Differences of LLMs: Prompt, Language, and Model Size, 2024.07, [paper]
- (GLOBE) LLM-GLOBE: A Benchmark Evaluating the Cultural Values Embedded in LLM Output, 2024.11, [paper]
- (GLOBE) Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models, 2024.06, [paper]
- (GLOBE) ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models, ACL 2024, [paper][code]
- (SVO) Heterogeneous Value Alignment Evaluation for Large Language Models, AAAI 2024 Workshop, [paper][code]
- (Others & custom) Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?, 2025.01, [paper]
- (Others & custom) Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches, 2024.04, [paper]
- (Others & custom) Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing, 2024.06, [paper]
- (Others & custom) Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models, 2024.06, [paper]
- (Others & custom) Measuring Spiritual Values and Bias of Large Language Models, 2024.10, [paper]
- (Others & custom) LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models, 2024.08, [paper]
- (Others & custom) Are Large Language Models Consistent over Value-laden Questions?, EMNLP 2024, [paper]
- (Others & custom) CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility, 2023.07, [paper]
- (Others & custom) DO MINDFULNESS ACTIVITIES IMPROVE HANDGRIP STRENGTH AMONG OLDER ADULTS: A PROPENSITY SCORE MATCHING APPROACH, 2024.12, Innovation in Aging, [paper]
- (Others & custom) Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions, 2025.04, [paper]
- (Others & custom) Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas, 2025.05, [paper]
- (Others & custom) EAVIT: Efficient and Accurate Human Value Identification from Text data via LLMs, 2025.05, [paper]
- (Others & custom) Do Language Models Think Consistently? A Study of Value Preferences Across Varying Response Lengths, 2025.06, [paper]
- (Others & custom) Measurement of LLM’s Philosophies of Human Nature, 2025.04, [paper][code]
- (MFT) Moral Foundations of Large Language Models, EMNLP 2024, [paper]
- (MFT) Whose Morality Do They Speak? Unraveling Cultural Bias in Multilingual Language Models, 2024.12, [paper]
- (MFT) Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy, NAACL 2022 Workshop, [paper]
- (MFT) MoralBench: Moral Evaluation of LLMs, 2024.06, [paper][code]
- (MFT) Towards "Differential AI Psychology" and in-context Value-driven Statement Alignment with Moral Foundations Theory, 2024.08, [paper]
- (MFT) Analyzing the Ethical Logic of Six Large Language Models, 2025.01, [paper]
- (MFT) Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations, AIES 2024, [paper]
- (MFT) AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories, 2023.01, Perspectives on Psychological Science, [paper][code]
- (MFT) Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity, ACL 2023 Workshop, [paper]
- (MFT) Exploring and steering the moral compass of Large Language Models, ICPR 2024, [paper]
- (MFT) M3oralBench: A MultiModal Moral Benchmark for LVLMs, 2024.12, [paper]
- (MFT) CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses, NeurIPS 2024, [paper]
- (MFT) Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?, NAACL 2024 Findings, [paper]
- (MFT) The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas, 2025.05, [paper]
- (ETHICS) Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety, NeurIPS 2022 Workshop, [paper]
- (ETHICS) Inducing Human-like Biases in Moral Reasoning Language Models, 2024.11, [paper]
- (ETHICS) An Evaluation of GPT-4 on the ETHICS Dataset, 2023.09, [paper]
- (ETHICS) EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval, SIGIR-AP 2023, [paper][code]
- (DIT) Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test, 2024.02, [paper]
- (DIT) Probing the Moral Development of Large Language Models through Defining Issues Test, 2023.09, [paper]
- (Others & Custom) Large-scale moral machine experiment on large language models, 2024.11, [paper]
- (Others & Custom) SaGE: Evaluating Moral Consistency in Large Language Models, LREC-COLING 2024, [paper]
- (Others & Custom) DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life, 2024.10, [paper]
- (Others & Custom) The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making, 2024.10, [paper]
- (Others & Custom) Potential benefits of employing large language models in research in moral education and development, 2023.01, Journal of Moral Education, [paper]
- (Others & Custom) Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment, 2024.11, [paper]
- (Others & Custom) Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing, 2024.06, [paper]
- (Others & Custom) When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment, NeurIPS 2022, [paper][code]
- (Others & Custom) Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?, C3NLP 2024, [paper]
- (Others & Custom) Western, Religious or Spiritual: An Evaluation of Moral Justification in Large Language Models, 2023.11, [paper]
- (Others & Custom) Evaluating Moral Beliefs across LLMs through a Pluralistic Framework, 2024.11, [paper]
- (Others & Custom) LLMs as mirrors of societal moral standards: reflection of cultural divergence and agreement across ethical topics, 2024.12, [paper]
- (Others & Custom) Analyzing the Ethical Logic of Six Large Language Models, 2025.01, [paper]
- (Others & Custom) Extended Japanese Commonsense Morality Dataset with Masked Token and Label Enhancement, CIKM '24 (Short Paper), [paper]
- (Others & Custom) What does AI consider praiseworthy?, 2025.02, AI and Ethics, [paper]
- (Others & Custom) Knowledge of cultural moral norms in large language models, ACL 2023, [paper]
- (Others & Custom) Normative Evaluation of Large Language Models with Everyday Moral Dilemmas, 2025.01, [paper]
- (Others & Custom) Evaluating the Moral Beliefs Encoded in LLMs, NeurIPS 2023, [paper]
- (Others & Custom) The Moral Mind(s) of Large Language Models, 2024.12, [paper]
- (Others & Custom) The moral machine experiment on large language models, 2024.02, Royal Society Open Science, [paper]
- (Others & Custom) Probing the Moral Development of Large Language Models through Defining Issues Test, 2023.09, [paper]
- (Others & Custom) Decoding Multilingual Moral Preferences: Unveiling LLM's Biases through the Moral Machine Experiment, AIES 2024, [paper]
- (Others & Custom) Right vs. Right: Can LLMs Make Tough Choices?, 2024.12, [paper]
- (Culture) Cultural tendencies in generative AI, 2025.06, Nature Human Behaviour, [paper]
- (ANES) Out of One, Many: Using Language Models to Simulate Human Samples, 2023.02, Political Analysis, [paper]
- (ANES) Synthetic Replacements for Human Survey Data? The Perils of Large Language Models, 2024.05, Political Analysis, [paper]
- (ANES) CommunityLM: Probing Partisan Worldviews from Language Models, COLING 2022, [paper]
- (ANES) Representation Bias in Political Sample Simulations with Large Language Models, 2024.07, [paper]
- (ANES) Random Silicon Sampling: Simulating Human Sub-Population Opinion Using a Large Language Model Based on Group-Level Demographic Information, 2024.02, [paper]
- (ANES) Unpacking Political Bias in Large Language Models: A Cross-Model Comparison on U.S. Politics, 2024.12, [paper]
- (ATP) Out of One, Many: Using Language Models to Simulate Human Samples, 2023.02, Political Analysis, [paper]
- (ATP) Whose Opinions Do Language Models Reflect?, ICML 2023, [paper]
- (ATP) Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design, 2024.09, Transactions of the Association for Computational Linguistics (TACL), [paper]
- (GLES) Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction, 2025.02, [paper]
- (GLES) Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study, 2024.12, [paper]
- (GLES) Representation Bias in Political Sample Simulations with Large Language Models, 2024.07, [paper]
- (GLES) Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion, 2024.07, [paper]
- (PCT) PRISM: A Methodology for Auditing Biases in Large Language Models, 2024.10, [paper]
- (PCT) Mapping and Influencing the Political Ideology of Large Language Models using Synthetic Personas, 2024.12, [paper]
- (PCT) The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation, 2023.01, [paper]
- (PCT) Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models, ACL 2024, [paper]
- (PCT) The Political Biases of ChatGPT, 2023.01, Social Sciences, [paper]
- (PCT) The Self-Perception and Political Biases of ChatGPT, 2024.07, [paper]
- (PCT) Revealing Fine-Grained Values and Opinions in Large Language Models, EMNLP 2024 Findings, [paper]
- (Others & custom) The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models, EMNLP 2024 Findings, [paper]
- (Others & custom) Beyond Prompt Brittleness: Evaluating the Reliability and Consistency of Political Worldviews in LLMs, 2024.11, Transactions of the Association for Computational Linguistics (TACL), [paper]
- (Others & custom) Llama meets EU: Investigating the European Political Spectrum through the Lens of LLMs, NAACL 2024 (Short Paper), [paper]
- (Others & custom) Questioning the Survey Responses of Large Language Models, NeurIPS 2024, [paper]
- (Others & custom) Towards Measuring the Representation of Subjective Global Opinions in Language Models, 2023.06, [paper][code]
- (Others & custom) Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models, 2025.03, [paper]
- (Others & custom) From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models, ACL 2023, [paper]
- (Others & custom) Are Large Language Models Chameleons? An Attempt to Simulate Social Surveys, 2024.05, [paper]
- (Others & custom) Improving GPT Generated Synthetic Samples with Sampling-Permutation Algorithm, 2023.08, [paper]
- (Others & custom) AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction, 2023.05, [paper]
- (Others & custom) Linear Representations of Political Perspective Emerge in Large Language Models, 2025.03, [paper]
- (Others & custom) Can large language models estimate public opinion about global warming? An empirical assessment of algorithmic fidelity and bias, 2024.08, PLOS Climate, [paper]
- (Others & custom) How Accurate are GPT-3’s Hypotheses About Social Science Phenomena?, 2023.07, Digital Society, [paper]
- (Others & custom) IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance, 2025.02, [paper]
- (Others & custom) The Political Biases of ChatGPT, 2023.01, Social Sciences, [paper]
- (Others & custom) Demonstrations of the Potential of AI-based Political Issue Polling, 2023.07, Harvard Data Science Review (HDSR), [paper]
- (Others & custom) Large Language Models Can Be Used to Estimate the Latent Positions of Politicians, 2023.03, [paper]
- (Others & custom) Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases, 2025.02, [paper]
- (Others & custom) Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs, 2025.05, [paper]
- Cognitive Network Science Reveals Bias in GPT-3, GPT-3.5 Turbo, and GPT-4 Mirroring Math Anxiety in High-School Students, 2025.04, Big Data and Cognitive Computing, [paper]
- Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases, NALOMA IV 2023, [paper]
- FairMonitor: A Dual-framework for Detecting Stereotypes and Biases in Large Language Models, 2024.05, [paper]
- Using cognitive psychology to understand GPT-3, 2023.02, Proceedings of the National Academy of Sciences (PNAS), [paper][code]
- Examining Cognitive Biases in ChatGPT 3.5 and 4 through Human Evaluation and Linguistic Comparison, AMTA 2024, [paper]
- Do Emotions Really Affect Argument Convincingness? A Dynamic Approach with LLM-based Manipulation Checks, 2025.03, [paper]
- CogBench: a large language model walks into a psychology lab, ICML 2024, [paper]
- Cognitive Bias in Decision-Making with LLMs, EMNLP 2024 Findings, [paper]
- Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT, 2023.10, Nature Computational Science, [paper]
- Relative Value Biases in Large Language Models, CogSci 2024, [paper]
- Evaluating Nuanced Bias in Large Language Model Free Response Answers, NLDB 2024, [paper]
- Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs, 2024.10, [paper]
- (Ir)rationality and cognitive biases in large language models, 2024.06, Royal Society Open Science, [paper]
- A Comprehensive Evaluation of Cognitive Biases in LLMs, 2024.10, [paper][code]
- Evaluating Cognitive Maps and Planning in Large Language Models with CogEval, NeurIPS 2023, [paper]
- HANS, are you clever? Clever Hans Effect Analysis of Neural Systems, SEM 2024, [paper]
- Metacognitive Myopia in Large Language Models, 2024.08, [paper]
- Visual cognition in multimodal large language models, 2025.01, Nature Machine Intelligence, [paper]
- Development of Cognitive Intelligence in Pre-trained Language Models, EMNLP 2023, [paper]
- CBEval: A framework for evaluating and interpreting cognitive biases in LLMs, 2024.12, [paper]
- Can a Hallucinating Model help in Reducing Human "Hallucination"?, 2024.05, [paper]
- Challenging the appearance of machine intelligence: Cognitive bias in LLMs and Best Practices for Adoption, 2023.04, [paper]
- Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models, 2024.12, [paper]
- Cognitive bias in large language models: Cautious optimism meets anti-Panglossian meliorism, 2023.11, [paper]
- Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration, 2024.10, [paper]
- Studying and improving reasoning in humans and machines, 2024.06, Communications Psychology, [paper]
- (Theory of Mind) Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models, EMNLP 2023 Findings, [paper][code]
- (Theory of Mind) A Review on Machine Theory of Mind, 2024.12, IEEE Transactions on Computational Social Systems, [paper]
- (Theory of Mind) A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks, 2025.02, [paper]
- (Theory of Mind) Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses, 2024.06, [paper]
- (Theory of Mind) NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding, EMNLP 2024 Findings, [paper][code]
- (Theory of Mind) Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models, 2024.06, [paper]
- (Theory of Mind) Understanding Social Reasoning in Language Models with Language Models, NeurIPS 2023, [paper]
- (Theory of Mind) HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models, EMNLP 2023 Findings, [paper]
- (Theory of Mind) Does ChatGPT have Theory of Mind?, 2023.05, [paper]
- (Theory of Mind) TimeToM: Temporal Space is the Key to Unlocking the Door of Large Language Models' Theory-of-Mind, 2024.07, [paper]
- (Theory of Mind) Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain, 2023.09, [paper]
- (Theory of Mind) MMToM-QA: Multimodal Theory of Mind Question Answering, ACL 2024, [paper]
- (Theory of Mind) Comparing Humans and Large Language Models on an Experimental Protocol Inventory for Theory of Mind Evaluation (EPITOME), 2024.06, Transactions of the Association for Computational Linguistics (TACL), [paper]
- (Theory of Mind) Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models, 2025.02, [paper]
- (Theory of Mind) Theory of Mind May Have Spontaneously Emerged in Large Language Models, 2023.02, [paper][code]
- (Theory of Mind) Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models, 2023.10, [paper]
- (Theory of Mind) Theory of Mind for Multi-Agent Collaboration via Large Language Models, EMNLP 2023, [paper][code]
- (Theory of Mind) Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models, PRICAI 2024, [paper]
- (Theory of Mind) Large Model Strategic Thinking, Small Model Efficiency: Transferring Theory of Mind in Large Language Models, 2024.08, [paper]
- (Theory of Mind) Boosting Theory-of-Mind Performance in Large Language Models via Prompting, 2023.04, [paper]
- (Theory of Mind) Probing the Robustness of Theory of Mind in Large Language Models, 2024.10, [paper]
- (Theory of Mind) Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?, 2024.06, [paper]
- (Theory of Mind) Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective, CHI 2025 Workshop, [paper]
- (Theory of Mind) Multi-ToM: Evaluating Multilingual Theory of Mind Capabilities in Large Language Models, 2024.11, [paper]
- (Theory of Mind) Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs, EMNLP 2022, [paper]
- (Theory of Mind) Decompose-ToM: Enhancing Theory of Mind Reasoning in Large Language Models through Simulation and Task Decomposition, 2025.01, [paper]
- (Theory of Mind) Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker, ACL 2023, [paper]
- (Theory of Mind) Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models, EACL 2024, [paper]
- (Theory of Mind) ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind, 2025.01, [paper]
- (Theory of Mind) Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground, ACL 2024 Findings, [paper]
- (Theory of Mind) Testing theory of mind in large language models and humans, 2024.05, Nature Human Behaviour, [paper]
- (Theory of Mind) LLMs achieve adult human performance on higher-order theory of mind tasks, 2024.05, [paper]
- (Theory of Mind) PHAnToM: Persona-based Prompting Has An Effect on Theory-of-Mind Reasoning in Large Language Models, 2024.03, [paper]
- (Theory of Mind) ToM-LM: Delegating Theory of Mind Reasoning to External Symbolic Executors in Large Language Models, NeSy 2024, [paper]
- (Theory of Mind) Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks, 2023.02, [paper]
- (Theory of Mind) Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests, CoNLL 2023, [paper]
- (Theory of Mind) Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities, ACL 2024, [paper]
- (Theory of Mind) OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models, ACL 2024, [paper]
- (Theory of Mind) Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection, 2025.01, [paper]
- (Theory of Mind) PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues, 2025.02, [paper][code]
- (Theory of Mind) AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind, 2025.02, [paper]
- (Theory of Mind) How FaR Are Large Language Models From Agents with Theory-of-Mind?, 2023.10, [paper]
- (Theory of Mind) Dynamic Evaluation of Large Language Models by Meta Probing Agents, ICML 2024, [paper][code]
- (Emotional Intelligence) A Literature Review on Emotional Intelligence of Large Language Models (LLMs), 2024, International Journal of Advanced Research in Computer Science, [paper]
- (Emotional Intelligence) Large Language Models and Empathy: Systematic Review, 2024.01, Journal of Medical Internet Research, [paper]
- (Emotional Intelligence) EmotionQueen: A Benchmark for Evaluating Empathy of Large Language Models, ACL 2024 Findings, [paper]
- (Emotional Intelligence) ChatGPT outperforms humans in emotional awareness evaluations, 2023.05, Frontiers in Psychology, Emotion Science, [paper]
- (Emotional Intelligence) EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models, 2025.02, [paper][code]
- (Emotional Intelligence) Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench, NeurIPS 2024, [paper][code]
- (Emotional Intelligence) Large Language Models Produce Responses Perceived to be Empathic, 2024.03, [paper]
- (Emotional Intelligence) Large Language Models Understand and Can be Enhanced by Emotional Stimuli, LLM@IJCAI'23, [paper][code]
- (Emotional Intelligence) EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models, 2023.12, [paper][code]
- (Emotional Intelligence) Identification and Description of Emotions by Current Large Language Models, 2023.07, [paper]
- (Emotional Intelligence) EmoBench: Evaluating the Emotional Intelligence of Large Language Models, 2024.02, [paper][code]
- (Emotional Intelligence) Exploring ChatGPT’s Empathic Abilities, ACII 2023, [paper]
- (Emotional Intelligence) The Emotional Intelligence of the GPT-4 Large Language Model, 2024.06, Psychology in Russia: State of the Art, [paper]
- (Emotional Intelligence) Are Large Language Models More Empathetic than Humans?, 2024.06, [paper]
- (Emotional Intelligence) Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence, ACL 2024 Findings, [paper]
- (Emotional Intelligence) Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models, 2025.05, [paper]
- (Social Intelligence) DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding, EMNLP 2023, [paper]
- (Social Intelligence) SocialAI 0.1: Towards a Benchmark to Stimulate Research on Socio-Cognitive Abilities in Deep Reinforcement Learning Agents, NAACL 2021 Workshop, [paper][code]
- (Social Intelligence) Do LLM Agents Exhibit Social Behavior?, 2023.12, [paper]
- (Social Intelligence) AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents, 2024.01, [paper]
- (Social Intelligence) Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View, 2024.05, [paper]
- (Social Intelligence) Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions, EMNLP 2024, [paper]
- (Social Intelligence) Large language models can outperform humans in social situational judgments, 2024.11, Scientific Reports, [paper]
- (Social Intelligence) AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios, 2024.10, [paper][code]
- (Social Intelligence) How well Do Large Language Models Perform on Faux Pas Tests?, ACL 2023 Findings, [paper]
- (Social Intelligence) Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level, ACL 2024 Findings, [paper]
- (Social Intelligence) Emotional intelligence of Large Language Models, 2023.11, Journal of Pacific Rim Psychology, [paper][code]
- (Social Intelligence) Academically intelligent LLMs are not necessarily socially intelligent, 2024.03, [paper]
- (Social Intelligence) SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents, 2023.10, [paper]
- (Language comprehension) Language Model Behavior: A Comprehensive Survey, 2023.05, Computational Linguistics (CL), [paper]
- (Language comprehension) Large Language Models for Psycholinguistic Plausibility Pretesting, EACL 2024 Findings, [paper]
- (Language comprehension) Syntactic Surprisal From Neural Models Predicts, But Underestimates, Human Processing Difficulty From Syntactic Ambiguities, CoNLL 2022, [paper]
- (Language comprehension) GPT-4 Surpassing Human Performance in Linguistic Pragmatics, 2023.12, [paper]
- (Language comprehension) HLB: Benchmarking LLMs' Humanlikeness in Language Use, 2024.09, [paper]
- (Language comprehension) Large Language Models as Neurolinguistic Subjects: Discrepancy in Performance and Competence for Form and Meaning, 2024.11, [paper]
- (Language comprehension) Do large language models and humans have similar behaviors in causal inference with script knowledge?, *SEM 2024, [paper][code]
- (Language comprehension) Prompt-based methods may underestimate large language models’ linguistic generalizations, 2023.07, [paper]
- (Language comprehension) Towards a Psychology of Machines: Large Language Models Predict Human Memory, 2024.03, [paper]
- (Language comprehension) How to Make the Most of LLMs' Grammatical Knowledge for Acceptability Judgments, 2024.08, [paper]
- (Language comprehension) A Psycholinguistic Evaluation of Language Models' Sensitivity to Argument Roles, 2024.10, [paper]
- (Language comprehension) Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention, 2024.05, [paper]
- (Language comprehension) Evaluating Grammatical Well-Formedness in Large Language Models: A Comparative Study with Human Judgments, CMCL 2024 Workshop, [paper]
- (Language comprehension) The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs, NeurIPS 2023, [paper]
- (Language comprehension) Long-form analogies generated by chatGPT lack human-like psycholinguistic properties, CogSci 2023, [paper]
- (Language comprehension) Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures, CoNLL 2023, [paper]
- (Language comprehension) Computational Sentence-level Metrics Predicting Human Sentence Comprehension, 2024.03, [paper]
- (Language comprehension) Are Large Language Models Capable of Generating Human-Level Narratives?, EMNLP 2024, [paper]
- (Language comprehension) How can large language models become more human?, CMCL 2024, [paper]
- (Language comprehension) A Targeted Assessment of Incremental Processing in Neural Language Models and Humans, ACL 2021, [paper]
- (Language comprehension) Divergences between Language Models and Human Brains, NeurIPS 2024, [paper]
- (Language generation) Divergent Creativity in Humans and Large Language Models, 2024.05, [paper]
- (Language generation) The Crowdless Future? Generative AI and Creative Problem-Solving, 2024.08, Organization Science, [paper]
- (Language generation) Do large language models resemble humans in language use?, CMCL 2024 Workshop, [paper]
- (Language generation) Art or Artifice? Large Language Models and the False Promise of Creativity, CHI 2024, [paper]
- (Language generation) Artificial Intelligence is More Creative Than Humans: A Cognitive Science Perspective on the Current State of Generative Language Models, 2023.09, [paper]
- (Language generation) An empirical investigation of the impact of ChatGPT on creativity, 2024.08, Nature Human Behaviour, [paper]
- (Language generation) Evaluating Large Language Models via Linguistic Profiling, EMNLP 2024, [paper]
- (Language generation) The Language of Creativity: Evidence from Humans and Large Language Models, 2024.01, The Journal of Creative Behavior, [paper]
- (Language generation) Long-form analogies generated by chatGPT lack human-like psycholinguistic properties, CogSci 2023, [paper]
- (Language generation) Putting GPT-3's Creativity to the (Alternative Uses) Test, ICCC 2022 (Short Paper), [paper]
- (Language generation) Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models, 2024.12, [paper]
- (Language generation) Are Large Language Models Capable of Generating Human-Level Narratives?, 2024.07, [paper]
- (Language acquisition) Bridging the data gap between children and large language models, 2023.11, Trends in Cognitive Sciences (TICS), [paper]
- (Language acquisition) Psychomatics—A Multidisciplinary Framework for Understanding Artificial Minds, 2024.04, Cyberpsychology, Behavior, and Social Networking, [paper]
- (Language acquisition) Development of Cognitive Intelligence in Pre-trained Language Models, 2024.07, [paper]
- (Language acquisition) Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures, CoNLL 2023, [paper]
- Large Language Models and Cognitive Science: A Comprehensive Review of Similarities, Differences, and Challenges, 2024.09, [paper]
- CogBench: a large language model walks into a psychology lab, ICML 2024, [paper]
- Age against the machine—susceptibility of large language models to cognitive impairment: cross sectional analysis, 2024.12, The BMJ (British Medical Journal), [paper]
- The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks, 2024.10, [paper]
- CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models, EMNLP 2024 Findings, [paper]
- Language models and psychological sciences, 2023.10, Frontiers in Psychology, [paper]
- M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark, 2024.06, [paper]
- CogLM: Tracking Cognitive Development of Large Language Models, 2024.08, [paper]
- Emergent analogical reasoning in large language models, 2023.07, Nature Human Behaviour, [paper]
- Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task, 2025.02, [paper][code]
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs, 2024.06, [paper]
- Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach, EMNLP 2023 (Short Paper), [paper]