This repository provides off-the-shelf zero-shot LLM-based evaluators for summarization and story generation, from the paper [HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation](https://arxiv.org/abs/2504.07174). For more details on the HypoEval implementation, reproducing results, and adding new evaluated aspects, please refer to ChicagoHAI/HypoEval-Gen.
May '25: HypoEval is now incorporated into quotient-ai/judges!
To use the evaluator for summaries on an aspect in `["coherence", "consistency", "informativeness", "fluency", "relevance"]`:
```python
from hypoeval.evaluator import SummaryEvaluator

evaluator = SummaryEvaluator(model_name=MODEL_NAME, model_path=MODEL_PATH)  # model_path is optional; specify it for local models

evaluated_aspect = "coherence"
summary_list = ["...", "..."]
source_text_list = ["...", "..."]
evaluation_scores = evaluator.batched_evaluate(aspect=evaluated_aspect, summaries=summary_list, source_texts=source_text_list)
```
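For a more concrete picture, here is a minimal end-to-end sketch. The model name and sample texts below are illustrative placeholders, and it assumes `batched_evaluate` returns one score per summary, in input order:

```python
from hypoeval.evaluator import SummaryEvaluator

# "gpt-4o-mini" is an assumed model name; substitute any model your setup supports.
evaluator = SummaryEvaluator(model_name="gpt-4o-mini")

summaries = ["The city council approved next year's budget after a short debate."]
sources = ["The city council met on Tuesday and voted 7-2 to approve next year's budget ..."]

scores = evaluator.batched_evaluate(
    aspect="coherence",
    summaries=summaries,
    source_texts=sources,
)

# Assumes one score per summary, aligned with the input list.
for summary, score in zip(summaries, scores):
    print(f"{score}\t{summary}")
```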
To use the evaluator for stories on an aspect in `["coherence", "cohesiveness", "complexity", "empathy", "engagement", "grammaticality", "likability", "relevance", "surprise"]`:
```python
from hypoeval.evaluator import StoryEvaluator

evaluator = StoryEvaluator(model_name=MODEL_NAME, model_path=MODEL_PATH)  # model_path is optional; specify it for local models

evaluated_aspect = "coherence"
story_list = ["...", "..."]
story_prompt_list = ["...", "..."]
evaluation_scores = evaluator.batched_evaluate(aspect=evaluated_aspect, stories=story_list, story_prompts=story_prompt_list)
```
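If you want scores for several aspects of the same batch of stories, one option is to loop over aspects and collect the results. This is only a sketch built on the `batched_evaluate` call shown above; the model name and example stories are placeholders, and it assumes each call returns one score per story:

```python
from hypoeval.evaluator import StoryEvaluator

evaluator = StoryEvaluator(model_name="gpt-4o-mini")  # assumed model name

stories = ["Once upon a time ...", "The ship drifted into the storm ..."]
prompts = ["Write a fairy tale.", "Write a story about a sea voyage."]

# Score the same batch on multiple aspects and keep results per aspect.
results = {}
for aspect in ["coherence", "engagement", "surprise"]:
    results[aspect] = evaluator.batched_evaluate(
        aspect=aspect, stories=stories, story_prompts=prompts
    )

for aspect, scores in results.items():
    print(aspect, scores)
```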
Please consider citing our work if it contributes to your research:
```bibtex
@misc{li2025hypoevalhypothesisguidedevaluationnatural,
  title={HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation},
  author={Mingxuan Li and Hanchen Li and Chenhao Tan},
  year={2025},
  eprint={2504.07174},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.07174},
}
```