[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
Updated Aug 5, 2025 - Python
Ranking Large Language Models using the Principle of Least Action! Built during my time at Knit Space, Hubbali, under the guidance of Prof. Prakash Hegade.