This repository collects an extensive list of awesome papers about Story Generation / Storytelling, exclusively focusing on the era of Large Language Models (LLMs).

yingpengma/Awesome-Story-Generation

Awesome-Story-Generation

Contributed by Yingpeng Ma, Yan Ma

🔥 Recognizing the paradigm shift brought by Large Language Models, we now focus exclusively on LLM-related research.

For those interested in research from before the LLM era, an archived version is available in Old Version (this archive is no longer maintained).

Introduction

This repository collects awesome papers about Story Generation / Storytelling in the LLM era.

Papers are listed chronologically (most recent first).

Thank you for the stars! We're continuously updating with the latest research. Cheers! 🍻

Your contributions matter! Please help keep this list current and accurate by opening issues or PRs for any mistakes or missing papers.

Contact: mayingpeng33 [AT] gmail [DOT] com

Papers

E.g. ACL-20XX Title [paper] [code] .. [authors]

Overview

  • ArXiv-2024 What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation [paper] [Dingyi Yang, Qin Jin]
  • CHI-2024 The Value, Benefits, and Concerns of Generative AI-Powered Assistance in Writing [paper] [Zhuoyan Li, Chen Liang, Jing Peng, Ming Yin] [Mainly about ChatGPT, not including other models]
  • ArXiv-2024 Weaver: Foundation Models for Creative Writing [paper] [Tiannan Wang, Jiamin Chen, Qingrui Jia, Shuai Wang, Ruoyu Fang, ... , Yuchen Eleanor Jiang, Wangchunshu Zhou] [Foundation Models which focus on writing capabilities]
  • EMNLP Findings-2023 Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding [paper] [Lixing Zhu, Runcong Zhao, Lin Gui, Yulan He]

Plan And Write

  • NAACL-2025 Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement [paper] [Qianyue Wang, Jinwu Hu, Zhengping Li, Yufeng Wang, Daiyuan Li, Yu Hu, Mingkui Tan]
  • EMNLP-2024 Collective Critics for Creative Story Generation [paper] [Minwook Bae, Hyounghun Kim]
  • ACL-2024 Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding [paper] [Lei Huang, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen]
  • ACL Workshop-2025 Guiding and Diversifying LLM-Based Story Generation via Answer Set Programming [paper] [Phoebe J. Wang, Max Kreminski]
  • NAACL-2025 Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models [paper] [Yukyung Lee, Soonwon Ka, Bokyung Son, Pilsung Kang, Jaewook Kang]
  • EACL-2024 Creating Suspenseful Stories: Iterative Planning with Large Language Models [paper] [Kaige Xie, Mark Riedl] [Prompt Engineering]
  • ArXiv-2023 EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation [paper] [Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan] [Prompt Engineering]

Multi Agent

  • ArXiv-2025 HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics [paper] [Sizhou Chen, Shufan Jiang, Chi Zhang, Xiao-Lei Zhang, Xuelong Li]
  • ArXiv-2025 A Cognitive Writing Perspective for Constrained Long-Form Text Generation [paper] [Kaiyang Wan, Honglin Mu, Rui Hao, Haoran Luo, Tianle Gu, Xiuying Chen]
  • ICLR-2025 Agents' Room: Narrative Generation through Multi-step Collaboration [paper] [Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki, Alice Shoshana Jakobovits, Elizabeth Clark, Mirella Lapata]
  • ACL-2024 IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation [paper] [Senyu Han, Lu Chen, Li-Min Lin, Zhengshan Xu, Kai Yu]
  • EMNLP Findings-2024 HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing [paper] [Jing Chen, Xinyu Zhu, Cheng Yang, Chufan Shi, Yadong Xi, Yuxiang Zhang, Junjie Wang, Jiashu Pu, Rongsheng Zhang, Yujiu Yang, Tian Feng]
  • FDG-2024 StoryVerse: Towards Co-authoring Dynamic Plot with LLM-based Character Simulation via Narrative Planning [paper] [Yi Wang, Qian Zhou, David Ledo] [virtual characters]
  • IJCAI-2024 AutoAgents: A Framework for Automatic Agent Generation [paper] [Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F. Karlsson, Jie Fu, Yemin Shi] [Prompt Engineering]

Better Storytelling

  • ArXiv-2025 Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection [paper] [Kabir Ahuja, Melanie Sclar, Yulia Tsvetkov]
  • ArXiv-2025 Learning to Reason for Long-Form Story Generation [paper] [Alexander Gurung, Mirella Lapata]
  • ArXiv-2024 MLD-EA: Check and Complete Narrative Coherence by Introducing Emotions and Actions [paper] [Jinming Zhang, Yunfei Long]
  • EMNLP Findings-2024 SWAG: Storytelling With Action Guidance [paper] [Zeeshan Patel, Karim El-Refai, Jonathan Pei, Tianle Li] [Reinforcement learning / SFT]
  • EMNLP Findings-2023 Improving Pacing in Long-Form Story Planning [paper] [Yichen Wang, Kevin Yang, Xiaoming Liu, Dan Klein] [Story pacing]
  • ArXiv-2023 End-to-End Story Plot Generator [paper] [Hanlin Zhu, Andrew Cohen, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian] [SFT]
  • EMNLP Findings-2023 GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence [paper] [Zhihua Wen, Zhiliang Tian, Wei Wu, Yuxin Yang, Yanqi Shi, Zhen Huang, Dongsheng Li] [RAG]

More Controllable

  • ArXiv-2025 SCORE: Story Coherence and Retrieval Enhancement for AI Narratives [paper] [Qiang Yi, Yangfan He, Jianhui Wang, Xinyuan Song, Shiyao Qian, Miao Zhang, Li Sun, Tianyu Shi]
  • ArXiv-2024 Crafting Narrative Closures: Zero-Shot Learning with SSM Mamba for Short Story Ending Generation [paper] [Divyam Sharma, Divya Santhanam]
  • ACL-2024 MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation [paper] [code] [Yan Ma, Yu Qiao, Pengfei Liu]
  • ArXiv-2024 Multigenre AI-powered Story Composition [paper] [Edirlei Soares de Lima, Margot M. E. Neggers, Antonio L. Furtado]
  • NAACL-2024 Returning to the Start: Generating Narratives with Related Endpoints [paper] [code] [Anneliese Brei, Chao Zhao, Snigdha Chaturvedi] [SFT / Prompt Engineering]
  • COLM-2024 With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation [paper] [Y. Wang, D. Ma, D. Cai] [LoRA]
  • ICLR-2024 RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment [paper] [Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian] [Reinforcement learning]
  • ArXiv-2023 RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text [paper] [code] [Wangchunshu Zhou, Yuchen Eleanor Jiang, Peng Cui, Tiannan Wang, Zhenxin Xiao, Yifan Hou, Ryan Cotterell, Mrinmaya Sachan] [Prompt Engineering]

More Personalized

  • ArXiv-2025 STORY2GAME: Generating (Almost) Everything in an Interactive Fiction Game [paper] [Eric Zhou, Shreyas Basavatia, Moontashir Siam, Zexin Chen, Mark O. Riedl] [Game]
  • ICLR-2025 R^2: A LLM Based Novel-to-Screenplay Generation Framework with Causal Plot Graphs [paper] [Zefeng Lin, Yi Xiao, Zhiqiang Mo, Qifan Zhang, Jie Wang, Jiayang Chen, Jiajing Zhang, Hui Zhang, Zhengyi Liu, Xianyong Fang, Xiaohua Xu]
  • ArXiv-2025 Towards Enhanced Immersion and Agency for LLM-based Interactive Drama [paper] [Hongqiu Wu, Weiqi Wu, Tianyang Xu, Jiameng Zhang, Hai Zhao]
  • ArXiv-2025 Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style [paper] [Xueran Han, Yuhan Liu, Mingzhe Li, Wei Liu, Sen Hu, Rui Yan, Zhiqiang Xu, Xiuying Chen] [Style]
  • ArXiv-2025 Whose story is it? Personalizing story generation by inferring author styles [paper] [Nischal Ashok Kumar, Chau Minh Pham, Mohit Iyyer, Andrew Lan]
  • EMNLP-2024 MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models [paper] [Sarfaroz Yunusov, Hamza Sidat, Ali Emami]
  • ArXiv-2024 CAT-LLM: Prompting Large Language Models with Text Style Definition for Chinese Article-style Transfer [paper] [Zhen Tao, Dinghao Xi, Zhiyu Li, Liumin Tang, Wei Xu] [Style]
  • ArXiv-2023 Learning to Generate Text in Arbitrary Writing Styles [paper] [Aleem Khan, Andrew Wang, Sophia Hager, Nicholas Andrews] [Style]

Evaluation

  • ArXiv-2025 CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization [paper] [Brihi Joshi, Sriram Venkatapathy, Mohit Bansal, Nanyun Peng, Haw-Shiuan Chang]
  • ArXiv-2025 LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm [paper] [Siwei Wu, Yizhi Li, Xingwei Qu, Rishi Ravikumar, Yucheng Li, Tyler Loakman, Shanghaoran Quan, Xiaoyong Wei, Riza Batista-Navarro, Chenghua Lin]
  • ArXiv-2025 Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs [paper] [Weijia Xu, Nebojsa Jojic, Sudha Rao, Chris Brockett, Bill Dolan]
  • ArXiv-2024 Evaluating Creative Short Story Generation in Humans and Large Language Models [paper] [Mete Ismayilzada, Claire Stevenson, Lonneke van der Plas]
  • ArXiv-2024 CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints [paper] [Anirudh Atmakuru, Jatin Nainani, Rohith Siddhartha Reddy Bheemreddy, Anirudh Lakkaraju, Zonghai Yao, Hamed Zamani, Haw-Shiuan Chang]
  • COLING-2025 Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with Humans and LLMs [paper] [Guillermo Marco, Luz Rello, Julio Gonzalo]
  • NAACL-2025 FACTTRACK: Time-Aware World State Tracking in Story Outlines [paper] [Zhiheng Lyu, Kevin Yang, Lingpeng Kong, Daniel Klein]
  • EMNLP-2024 Are Large Language Models Capable of Generating Human-Level Narratives? [paper] [Yufei Tian, Tenghao Huang, Miri Liu, Derek Jiang, Alexander Spangher, Muhao Chen, Jonathan May, Nanyun Peng]
  • EMNLP-2024 STORYSUMM: Evaluating Faithfulness in Story Summarization [paper] [Melanie Subbiah, Faisal Ladhak, Akankshya Mishra, Griffin Adams, Lydia B. Chilton, Kathleen McKeown]
  • ArXiv-2024 Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing? [paper] [Guillermo Marco, Julio Gonzalo, Ramón del Castillo, María Teresa Mateo Girona]
  • EMNLP-2024 Measuring Psychological Depth in Language Models [paper] [Fabrice Harel-Canada, Hanyu Zhou, Sreya Muppalla, Zeynep Yildiz, Miryung Kim, Amit Sahai, Nanyun Peng]
  • TACL-2024 Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation [paper] [Cyril Chhun, Fabian M. Suchanek, Chloé Clavel]
  • TACL-2024 Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers [paper] [Melanie Subbiah, Sean Zhang, Lydia B. Chilton, Kathleen McKeown]
  • EMNLP Findings-2023 A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing [paper] [Carlos Gómez-Rodríguez, Paul Williams]
  • EMNLP-2024 Learning Personalized Alignment for Evaluating Open-ended Text Generation [paper] [Danqing Wang, Kevin Yang, Hanlin Zhu, Xiaomeng Yang, Andrew Cohen, Lei Li, Yuandong Tian]
  • ICLR-2024 BooookScore: A systematic exploration of book-length summarization in the era of LLMs [paper] [Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer]
  • TMLR-2024 TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks [paper] [Dongfu Jiang, Yishan Li, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, Wenhu Chen]
  • CHI-2023 Art or Artifice? Large Language Models and the False Promise of Creativity [paper] [Tuhin Chakrabarty, Philippe Laban, Divyansh Agarwal, Smaranda Muresan, Chien-Sheng Wu]
  • ACL-2023 HAUSER: Towards Holistic and Automatic Evaluation of Simile Generation [paper] [Qianyu He, Yikai Zhang, Jiaqing Liang, Yuncheng Huang, Yanghua Xiao, Yunwen Chen]
  • ACL-2023 Can Large Language Models Be an Alternative to Human Evaluations? [paper] [Cheng-Han Chiang, Hung-yi Lee]
  • EMNLP Findings-2023 DeltaScore: Evaluating Story Generation with Differentiating Perturbations [paper] [Zhuohan Xie, Miao Li, Trevor Cohn, Jey Han Lau]
  • EMNLP-2022 StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning [paper] [Hong Chen, Duc Minh Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama]
  • COLING-2022 Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation [paper] [Cyril Chhun, Pierre Colombo, Chloé Clavel, Fabian M. Suchanek]
  • TACL-2022 LOT: A story-centric benchmark for evaluating Chinese long text understanding and generation [paper] [Jian Guan, Zhuoer Feng, Yamei Chen, Ruilin He, Xiaoxi Mao, Changjie Fan, Minlie Huang]
  • ACL-2021 OpenMEVA: A benchmark for evaluating open-ended story generation metrics [paper] [Jian Guan, Zhexin Zhang, Zhuoer Feng, Zitao Liu, Wenbiao Ding, Xiaoxi Mao, Changjie Fan, Minlie Huang]
  • EMNLP-2020 UNION: An unreferenced metric for evaluating open-ended story generation [paper] [code] [Jian Guan, Minlie Huang]

Dataset

  • EMNLP Findings-2024 BookWorm: A Dataset for Character Description and Analysis [paper] [Argyrios Papoudakis, Mirella Lapata, Frank Keller]
  • EMNLP Workshop-2025 The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories [paper] [Xi Yu Huang, Krishnapriya Vishnubhotla, Frank Rudzicz]
  • NAACL Findings-2025 CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis [paper] [Saranya Venkatraman, Nafis Irtiza Tripto, Dongwon Lee]
  • ACL Findings-2024 Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives [paper] [Runcong Zhao, Qinglin Zhu, Hainiu Xu, Jiazheng Li, Yuxiang Zhou, Yulan He, Lin Gui]
  • LREC-COLING-2024 CLAUSE-ATLAS: A Corpus of Narrative Information to Scale up Computational Literary Analysis [paper] [Enrica Troiano, Piek T.J.M. Vossen]
  • LREC-COLING-2024 Reflections & Resonance: Two-Agent Partnership for Advancing LLM-based Story Annotation [paper] [Yuetian Chen, Mei Si]
  • LREC-COLING-2024 CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation [paper] [Yujie Shao, Xinrong Yao, Xingwei Qu, Chenghua Lin, Shi Wang, Stephen W. Huang, Ge Zhang, Jie Fu]
  • ArXiv-2023 STONYBOOK: A System and Resource for Large-Scale Analysis of Novels [paper] [Charuta Pethe, Allen Kim, Rajesh Prabhakar, Tanzir Pial, Steven Skiena]
  • ACL-2023 StoryWars: A Dataset and Instruction Tuning Baselines for Collaborative Story Understanding and Generation [paper] [Yulun Du, Lydia Chilton]
  • NAACL-2022 A corpus for understanding and generating moral stories [paper] [Jian Guan, Ziqi Liu, Minlie Huang]
  • EVAL4NLP-2021 StoryDB: Broad Multi-language Narrative Dataset [paper] [Alexey Tikhonov, Igor Samenko, Ivan P. Yamshchikov]
  • ACL-2022 SummScreen: A Dataset for Abstractive Screenplay Summarization [paper] [data] [Mingda Chen, Zewei Chu, Sam Wiseman, Kevin Gimpel]
  • ArXiv-2021 TVStoryGen: A Dataset for Generating Stories with Character Descriptions [paper] [Mingda Chen, Kevin Gimpel]
  • EMNLP-2020 STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation [paper] [Nader Akoury, Shufan Wang, Josh Whiting, Stephen Hood, Nanyun Peng, Mohit Iyyer]
  • NAACL-2016 A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories [paper] [Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, James Allen]

Public Resources

  • Understanding AI for Stories is a survey blog on applying AI to story generation, covering both its potential and the challenges it faces.
  • ROC Stories is a collection of 100,000 five-sentence stories and 3,742 Story Cloze Test stories that capture causal and temporal commonsense relations between everyday events, making it suitable for story generation tasks.
  • CommonGen was built by combining crowdsourced and existing caption corpora, and contains 79k commonsense descriptions across 35k distinct concept-sets.
  • CMU Movie Summary Corpus offers access to a dataset containing movie plot summaries and related metadata.
  • Scifi TV Show Plot Summaries & Events is a collection of plot synopses for long-running (80+ episodes) science fiction TV shows, sourced from Fandom.com wikis.
