This repository contains the code and documentation for the Moral Decision Dataset (MDD). The MDD describes real-world cases, their associated parameters, and the resulting case-based moral decisions. To make the resource easier to use, this overview explains how the dataset was created and walks through setting up and reproducing our results. For more detail, please see our published paper.
The ubiquity of autonomous systems in critical decision-making roles with significant impact on society makes it imperative to provide them with moral cognitive abilities. To facilitate this effort, we have curated the Moral Decision Dataset (MDD), which captures everyday scenarios in which a moral question arises, the parameters that inform the moral decision, and the decision itself. The MDD is built with an LLM-aided methodology: seed data from online sources is preprocessed, extracted, summarized, and augmented using state-of-the-art LLMs. The accompanying paper also gives a brief overview of how language models can be used to curate and develop datasets from sparse and highly abstract data. To demonstrate the validity and robustness of the dataset, we present an Ethics Scoring Algorithm (ESA) that reuses the parameters defined in the dataset to calculate ethical scores for isolated actions. The ESA further introduces context-sensitive thresholding (CST), a novel way to discretize grey areas in an effort to resolve ethical dilemmas. This work aims to facilitate moral reasoning in AI systems deployed across society through a clearly outlined methodology, modular development, and generalized applicability.
This project makes the following contributions:
- A methodology to develop and curate a dataset for sparse, abstract, and subjective data using language models.
- A Moral Decision Dataset that captures scenarios and associated parameters that aid the moral decision.
- A knowledge graph (KG) that extends the MDD.
- An Ethics Scoring Algorithm that provides ethical judgment based on ethics theory and available contextual information.
- A method to quantify case-specific grey areas using context-sensitive thresholding.
## Methodology

### Domain Understanding

The project started with extensive research, including:
- Insights from legal professionals.
- Analysis of real-world cases on forums such as Reddit and Quora.
- An understanding of how legal interpretations vary across jurisdictions.
### Data Collection

Raw legal data was extracted from Reddit subreddits across multiple countries (India, the UK, Canada, etc.) using a custom Python script (a minimal sketch follows the list below). Each record contains:
- Title
- Case Text
- Upvote Ratio
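The actual collection script ships in the MDD folder; the following is a minimal sketch of the approach using the PRAW library. The subreddit name, credentials, and output path are illustrative assumptions, not the repository's configuration.

```python
# Minimal sketch of Reddit data collection with PRAW. The subreddit name,
# credentials, and output path are placeholders, not the repo's actual setup.
import csv

import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="mdd-data-collection",
)

with open("raw_cases.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Case Text", "Upvote Ratio"])
    for post in reddit.subreddit("LegalAdviceIndia").top(limit=500):
        if post.selftext:  # skip link-only posts
            writer.writerow([post.title, post.selftext, post.upvote_ratio])
```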
### Feature Extraction

The following key features were extracted from each case using LLMs (an illustrative extraction call follows the list):
- Active Agent
- Passive Agent
- Action Done by Active Agent
- Domain
- Ethical Issues
- Consequences (severity, utility, duration)
- Moral Intentions
- Ethical Principles Upheld/Violated
- Relationship Between Agents
- Moral Decision
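In the repository this step is performed by feature_extraction.py with a specialized prompt. As a rough illustration only, a single case could be processed through the Together API along these lines; the model name and prompt wording are assumptions, and the real prompt is more elaborate.

```python
# Illustrative feature extraction via the Together API. Model name and
# prompt are assumptions; the repository's specialized prompt lives in
# feature_extraction.py. Assumes the model answers with clean JSON.
import json

from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

FEATURES = [
    "Active Agent", "Passive Agent", "Action Done by Active Agent",
    "Domain", "Ethical Issues", "Consequences", "Moral Intentions",
    "Ethical Principles Upheld/Violated", "Relationship Between Agents",
    "Moral Decision",
]

def extract_features(case_text: str) -> dict:
    """Ask the LLM to return the key features of one case as JSON."""
    prompt = (
        "Extract the following features from the legal case below. Answer "
        f"only with a JSON object keyed by: {', '.join(FEATURES)}.\n\n"
        f"Case: {case_text}"
    )
    response = client.chat.completions.create(
        model="meta-llama/Llama-3-70b-chat-hf",
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)
```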
### Summarization

Cases were summarized using a predefined template to preserve accuracy:

`The <active agent> did <action> to <passive agent>, which led to <consequences>. The <active agent> had <good/bad/neutral> moral intention, however, the <active agent> violated <ethical principles>, which caused <ethical issues>.`
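Given the extracted features, filling the template is a plain string operation. A minimal sketch, assuming the field names match the feature list above:

```python
# Minimal sketch of template-based summarization from extracted features.
# Field names are assumptions matching the feature list above.
TEMPLATE = (
    "The {active_agent} did {action} to {passive_agent}, which led to "
    "{consequences}. The {active_agent} had {intention} moral intention, "
    "however, the {active_agent} violated {principles}, which caused {issues}."
)

def summarize(features: dict) -> str:
    return TEMPLATE.format(
        active_agent=features["Active Agent"],
        action=features["Action Done by Active Agent"],
        passive_agent=features["Passive Agent"],
        consequences=features["Consequences"],
        intention=features["Moral Intentions"],  # good/bad/neutral
        principles=features["Ethical Principles Upheld/Violated"],
        issues=features["Ethical Issues"],
    )
```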
### Augmentation

Data augmentation techniques were used to generate multiple instances of each legal case (see the sketch after this list) by varying:
- Context
- Agents
- Ethical issues
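As an illustration of the idea (not the repository's augmentation.py), one augmentation pass might prompt the LLM to rewrite a case with a single dimension varied; the prompt wording and model name are assumptions.

```python
# Illustrative LLM-based augmentation: rewrite a case varying one dimension
# while keeping the rest fixed. Prompt wording and model name are assumptions.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

def augment(case_text: str, dimension: str) -> str:
    prompt = (
        f"Rewrite the following legal case, changing only its {dimension} "
        "while keeping all other facts the same:\n\n" + case_text
    )
    response = client.chat.completions.create(
        model="meta-llama/Llama-3-70b-chat-hf",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# One variant per varied dimension:
# variants = [augment(case_text, d) for d in ("context", "agents", "ethical issues")]
```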
### Validation

Results from Llama-3 were validated using the Gemma LLM, which also provided additional feedback. This cross-model check ensured accuracy and consistency.
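A minimal sketch of such a cross-model check, assuming Gemma is served through the same Together API (the model name and rating scale are assumptions):

```python
# Illustrative cross-model validation: a second LLM (Gemma) rates the first
# model's extraction. Model name and rating scale are assumptions.
from together import Together

client = Together()

def validate(case_text: str, extracted: str) -> str:
    prompt = (
        "On a scale of 1-5, rate how accurately the following features "
        f"describe the case, and note any errors.\n\nCase: {case_text}\n\n"
        f"Features: {extracted}"
    )
    response = client.chat.completions.create(
        model="google/gemma-2-27b-it",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```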
## Ethical Framework

By conforming to the normative definition of morality, we adopt the normative definition of ethics as well when embedding morality in AI. This covers consequentialism, deontology, and virtue ethics, each of which corresponds to certain real-world parameters: the characteristics of the consequences, the moral intentions of the doer, and the ethical principles upheld or violated by the action. In collaboration with our team of ethicists, we identified and verified these parameters, which, together with meta-parameters such as action, agents, and domain, act as the key features.
## Ethics Scoring Algorithm (ESA)

The Ethics Scoring Algorithm (ESA) is a metric that discretizes the key features associated with a real-world scenario and determines, through a weighted sum, the ethics score of the active agent's action. It considers all three normative schools of thought: consequentialism, deontology, and virtue ethics. Depending on the user's preference and the applied-ethics domain provided, a particular school of thought can be weighted more heavily than the others.
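The calibrated weights and thresholds live in the evaluator code; the sketch below only conveys the shape of the computation, with every number and field name an illustrative assumption.

```python
# Sketch of the ESA weighted sum over the three normative schools, plus a
# context-sensitive threshold. All weights, sub-scores, and threshold values
# are illustrative assumptions, not the repository's calibration.
def ethics_score(consequence, deontology, virtue, weights=(1/3, 1/3, 1/3)):
    """Weighted sum of per-school sub-scores (each assumed to lie in [-1, 1]).
    The weights encode the user's preferred school and the applied-ethics
    domain."""
    w_c, w_d, w_v = weights
    return w_c * consequence + w_d * deontology + w_v * virtue

def judge(score, threshold):
    """Context-sensitive thresholding: rather than a fixed cut-off at zero,
    each case/domain supplies its own threshold, discretizing grey areas."""
    return "ethical" if score >= threshold else "unethical"

# Example: favor consequentialism for a hypothetical medical-domain case.
score = ethics_score(0.6, -0.2, 0.4, weights=(0.5, 0.25, 0.25))
print(round(score, 2), judge(score, threshold=0.1))  # 0.35 ethical
```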
## Repository Structure

The code for this project is organized into three folders: MDD, MDKG, and evaluator. The MDD folder contains the code files used to develop the moral decision dataset, a sample dataset called MDD_100.csv, the list of subreddits that were used, the raw files, and the Python scripts. MDKG contains the Turtle file mdkg.ttl, which holds the YARRRML mappings from the relational features of the MDD to the MDKG knowledge graph; a set of sample SPARQL queries is also provided, which may be run against the MDKG file to check for consistency. Finally, the evaluator folder contains the tests and experiments that were run on the datasets and LLMs to determine the optimal running conditions for the specialized prompt.
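The sample SPARQL queries can be run against mdkg.ttl with any SPARQL engine; here is a minimal sketch using rdflib, where the namespace and property names are hypothetical placeholders rather than MDKG's actual vocabulary.

```python
# Minimal sketch of querying mdkg.ttl with rdflib. The namespace and property
# names are hypothetical placeholders, not MDKG's actual schema; see the
# sample queries shipped in the MDKG folder for the real vocabulary.
from rdflib import Graph

g = Graph()
g.parse("MDKG/mdkg.ttl", format="turtle")

query = """
SELECT ?case ?decision WHERE {
    ?case <http://example.org/mdkg/moralDecision> ?decision .
} LIMIT 10
"""
for case, decision in g.query(query):
    print(case, decision)
```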
## Prerequisites
- Python 3.8+
- Kaggle API
- Together.ai API Key
## Dependencies

Install the required libraries:

```bash
pip install pandas spacy tqdm together
```

(`json` and `re` are part of the Python standard library and need no installation; the Together SDK is published on PyPI as `together`.)
## Usage

### Feature Extraction

Run the feature extraction script:

```bash
python feature_extraction.py
```
### Summarization

Summarize the case texts:

```bash
python summarization.py
```
### Evaluation

Validate and rate the summaries and features:

```bash
python evaluation.py
```
### Augmentation

Generate the augmented cases:

```bash
python augmentation.py
```
## Outputs

- Extracted features: `feature_extraction.csv`
- Summaries: `summary.csv`
- Evaluated data: `evaluated_data.json`
- Augmented cases: `augmented_cases.json`
## Maintenance

This four-fold resource (MDD, MDKG, ESA, CST) will be maintained by the authors of the paper through their affiliation with the KRaCR Lab, where this research was conducted. Maintenance involves augmenting the dataset and improving the algorithmic benchmark through testing and experimentation with newer LLMs and features. We also aim to expand the set of applied-ethics domains on which ESA+CST is implemented, which currently spans 17 domains. Finally, we hope this research makes efficient use of LLMs to advance computational machine ethics.