# Evaluating Creativity in Human and Large Language Model Narratives
This repository contains all code and instructions for the experiments carried out by Roberto Passaro, Marta Pavanati, Clotilde Frapiccini, and Anuoluwapo Aremu for an MSc course at CIMeC, University of Trento.
We compare 21 human‑written short stories against 21 GPT‑4.1 continuations (×7 temperatures) using four automated creativity metrics:
- Novelty
- Surprise
- Lexical Diversity
- Semantic Diversity
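As an illustration of the simplest of these metrics, here is a minimal lexical-diversity sketch using a plain type-token ratio over whitespace tokens; the notebook's actual implementation (which lemmatizes and filters with spaCy) may differ:

```python
# Minimal lexical-diversity sketch: type-token ratio (TTR).
# Illustrative only; the notebook may use lemmatized tokens and a
# length-corrected variant of this measure.

def type_token_ratio(tokens: list[str]) -> float:
    """Ratio of unique tokens (types) to total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

story = "the fox jumped over the lazy dog and the quick fox ran".split()
print(type_token_ratio(story))  # → 0.75 (9 unique types / 12 tokens)
```

Because raw TTR falls as texts get longer, comparisons are only meaningful between texts of similar length, which is one reason corrected variants exist.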
Creativity has long been regarded as an exclusively human capacity, but the emergence of large language models (LLMs) raises the question of whether these models can reach human-comparable levels of creativity in narrative generation. In this study, we compared 21 short stories generated by GPT-4.1 with 21 human-written counterparts, measuring creativity with four automatic metrics: novelty, surprise, lexical diversity, and semantic diversity. We also investigated the impact of temperature, a randomness-control hyperparameter often associated with creative variation, on narrative creativity. We observed that novelty exhibits temperature-dependent trends, occasionally approaching human levels, and that GPT-4.1 consistently outperforms human authors in lexical diversity and surprise, while semantic diversity is consistently influenced by temperature and remains lower than that of humans. Building on these findings, we conclude by discussing the factors that drive the observed differences in narrative creation between humans and LLMs.
The experiments can be run directly in the provided Colab notebook without local setup:
- Open `notebooks/Tales_of_Two_Minds.ipynb` in Google Colab.
- Upload a ZIP file containing your 21 human-written `.txt` stories, named `story_01.txt` through `story_21.txt`.
- Enter your OpenAI API key when prompted.
- Run all cells.
All dependencies (Python libraries and the spaCy model) are installed automatically within the Colab environment.
The code expects 21 human-written stories in `human_texts/` as UTF-8 `.txt` files named `story_01.txt`, `story_02.txt`, …, `story_21.txt`. We have not committed the full set here. To rerun the experiment and obtain the human dataset, please email <roberto.passaro@studenti.unitn.it>.
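For local runs, that layout can be read with a short helper. `load_human_stories` below is a hypothetical illustration of the expected file naming, not code from the repository:

```python
from pathlib import Path

def load_human_stories(directory: str) -> list[str]:
    """Read story_01.txt through story_21.txt (UTF-8) in order.

    Raises FileNotFoundError if any of the 21 expected files is missing,
    so a malformed dataset fails loudly before metrics are computed.
    """
    stories = []
    for i in range(1, 22):
        path = Path(directory) / f"story_{i:02d}.txt"
        stories.append(path.read_text(encoding="utf-8"))
    return stories
```

The zero-padded `{i:02d}` matches the `story_01.txt` … `story_21.txt` convention and keeps the files in the same order as a simple alphabetical listing.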
Set your OpenAI key and run:

```python
OPENAI_API_KEY = "your key"
```
This will:
- Generate 21 GPT-4.1 continuations at each of seven temperature settings.
- Preprocess all texts (lemmatization, filtering).
- Compute novelty, surprise, lexical diversity, and semantic diversity, and save `scores.csv`.
- Produce summary stats and plots in `statplots/`.
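The generation step above can be sketched as a loop over stories and temperatures. The seven temperature values, the prompt wording, and the helper names below are assumptions for illustration; the notebook defines the actual ones:

```python
# Sketch of the generation loop: one GPT-4.1 continuation per story at each
# of seven temperatures. TEMPERATURES and the prompt are assumed values,
# not the notebook's actual settings.

TEMPERATURES = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]  # assumed grid

def build_request(story_opening: str, temperature: float) -> dict:
    """Assemble chat-completion parameters for one continuation."""
    return {
        "model": "gpt-4.1",
        "temperature": temperature,
        "messages": [
            {"role": "user",
             "content": f"Continue this short story:\n\n{story_opening}"},
        ],
    }

def generate_continuations(openings: list[str]) -> dict[float, list[str]]:
    """Call the API once per (story, temperature) pair.

    Requires `pip install openai` and OPENAI_API_KEY in the environment.
    """
    from openai import OpenAI  # imported lazily: only needed for real calls
    client = OpenAI()
    out = {}
    for t in TEMPERATURES:
        out[t] = [
            client.chat.completions.create(**build_request(s, t))
            .choices[0].message.content
            for s in openings
        ]
    return out
```

Keeping the request construction separate from the API call makes the temperature sweep easy to inspect and test without spending tokens.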
- `scores.csv`: per-story metrics
- `lexical_diversity_summary.csv`: mean ± std by source & temperature
- `statplots/`: boxplots, QQ-plots, correlation heatmaps, p-value tables
- Fork the repo
- Create a feature branch
- Open a Pull Request
This project is released under the MIT License.
Roberto Passaro – <roberto.passaro@studenti.unitn.it>