# LLM Reliability & Consistency

πŸ“š A curated collection of foundational papers on Large Language Model (LLM) reliability and consistency, maintained by the ILP Lab at UVA.

## Foundational Papers

Reliability Consistency Model Data Type

### Reliability Papers

| Paper | Tags | Venue/Source | Year | Code | Description |
| --- | --- | --- | --- | --- | --- |
| Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity | `bias` `robustness` `reliability` `toxicity` | arXiv | 2023 | N/A | A qualitative approach to red-teaming the ethical risks of ChatGPT. |
| TrustLLM: Trustworthiness in Large Language Models | `truthfulness` `safety` `fairness` `robustness` `privacy` `machine ethics` | ICML | 2024 | Official \| HuggingFace | TrustLLM is a comprehensive framework for studying the trustworthiness of LLMs. |
| LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users | `bias` `alignment` `robustness` | NeurIPS | 2024 | Poster | Shows that LLM performance degrades depending on user traits such as education level, English proficiency, and country of origin. |
| When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour | `robustness` | arXiv | 2025 | N/A | Investigates sycophancy, the tendency of LLMs to generate responses that align with a user's viewpoint even when that viewpoint is factually incorrect. |

### Consistency Papers

| Paper | Tags | Venue/Source | Year | Code | Description |
| --- | --- | --- | --- | --- | --- |
| Measuring and Improving Consistency in Pretrained Language Models | `factual-knowledge` `robustness` `consistency` | TACL | 2021 | N/A | Pretrained language models are factually inconsistent, giving different answers to paraphrased questions; this paper proposes a method to make their factual knowledge more robust. |
| Locating and Editing Factual Associations in GPT | `model-editing` `knowledge-localization` | NeurIPS | 2022 | Official \| Datasets | Factual knowledge in transformers is stored in localized, mid-layer feed-forward modules; the paper introduces ROME, a method to surgically update these facts directly within the model. |
| Self-Consistency Improves Chain of Thought Reasoning in Language Models | `chain-of-thought` `reasoning` `ensembling` | ICLR | 2023 | Slides | Introduces self-consistency, a decoding strategy that boosts chain-of-thought performance by sampling multiple reasoning paths and choosing the most consistent final answer (a minimal sketch follows the table). |
| Large Language Models Are Human-Level Prompt Engineers | `prompt-engineering` `instruction-generation` | arXiv | 2023 | Official | Introduces Automatic Prompt Engineer (APE), a method that uses an LLM to automatically generate and select instructions that often outperform human-crafted prompts. |
| Evaluating the Moral Beliefs Encoded in LLMs | `alignment` `model-probing` | NeurIPS | 2023 | Official | A survey-based statistical method to probe the moral beliefs of LLMs, revealing that while models handle clear-cut cases well, they are often uncertain or inconsistent in ambiguous scenarios. |
| Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting | `prompt-engineering` `sensitivity` `robustness` | ICML | 2024 | N/A | Minor prompt formatting changes cause drastic, unpredictable performance swings in LLMs, undermining the reliability of comparisons made with a single fixed format. |
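
The self-consistency idea referenced above is simple to sketch: sample several chain-of-thought completions at non-zero temperature, extract each final answer, and return the majority answer. The snippet below is a minimal illustration under stated assumptions, not the paper's reference implementation; `sample_completion` and the answer-extraction regex are placeholders for whatever LLM client and answer format you actually use.

```python
import re
from collections import Counter
from typing import Callable

def self_consistency_answer(
    prompt: str,
    sample_completion: Callable[[str], str],  # assumed: calls your LLM with temperature > 0
    num_samples: int = 10,
) -> str:
    """Sample several chain-of-thought completions and majority-vote their final answers."""
    answers = []
    for _ in range(num_samples):
        completion = sample_completion(prompt)
        # Assumed answer format: the completion ends with "... the answer is <X>."
        match = re.search(r"answer is\s*(.+?)\.?\s*$", completion, flags=re.IGNORECASE)
        if match:
            answers.append(match.group(1).strip())
    if not answers:
        raise ValueError("No parsable answers among the sampled completions.")
    # The most frequent final answer is taken as the self-consistent prediction.
    return Counter(answers).most_common(1)[0][0]

# Example usage with a hypothetical client:
# answer = self_consistency_answer(cot_prompt, lambda p: client.generate(p, temperature=0.7))
```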

**Curated by:** Dane Williamson & Karolina Naranjo

**Lab:** UVA Information and Language Processing Lab
