[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
Updated Feb 26, 2025 - Python
[ACL 2025] GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning