reliable-evaluation

Two public repositories match this topic:

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

[ACL 2025] GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
