Foundational Papers
Paper | Tags | Venue/Source | Year | Code | Description |
---|---|---|---|---|---|
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity | bias robustness reliability toxicity | arXiv | 2023 | N/A | A qualitative approach to red-teaming the ethical risks of ChatGPT. |
TrustLLM: Trustworthiness in Large Language Models | truthfulness safety fairness robustness privacy machine ethics | ICML | 2024 | Official, HuggingFace | A comprehensive framework for studying the trustworthiness of LLMs. |
LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users | bias alignment robustness | NeurIPS | 2024 | Poster | Shows that LLM answer quality degrades depending on user traits such as education level, English proficiency, and country of origin, disproportionately harming vulnerable users. |
When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour | robustness | arXiv | 2025 | N/A | Investigates sycophancy: the tendency of LLMs to generate responses that align with a user's viewpoint even when that viewpoint is factually incorrect. |
Measuring and Improving Consistency in Pretrained Language Models | factual-knowledge robustness consistency | TACL | 2021 | N/A | Pretrained language models are factually inconsistent, giving different answers to paraphrases of the same question; the paper proposes a method to improve the consistency of their factual knowledge. |
Locating and Editing Factual Associations in GPT | model-editing knowledge-localization | NeurIPS | 2022 | Official, Datasets | Shows that factual knowledge in transformers is stored in localized, mid-layer feed-forward modules and introduces ROME, a method for surgically updating these facts directly within the model. |
Self-Consistency Improves Chain of Thought Reasoning in Language Models | chain-of-thought reasoning ensembling | ICLR | 2023 | Slides | Introduces self-consistency, a decoding strategy that boosts chain-of-thought performance by sampling multiple reasoning paths and choosing the most consistent final answer (a minimal sketch follows this table). |
Large Language Models Are Human-Level Prompt Engineers | prompt-engineering instruction-generation | arXiv | 2023 | Official | Introduces Automatic Prompt Engineer (APE), a method that uses an LLM to automatically generate and select instructions that often outperform human-crafted prompts. |
Evaluating the Moral Beliefs Encoded in LLMs | alignment model-probing | NeurIPS | 2023 | Official | A survey-based statistical method for probing the moral beliefs of LLMs, revealing that models handle clear-cut cases well but are often uncertain or inconsistent in ambiguous scenarios. |
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting | prompt-engineering sensitivity robustness | ICML | 2024 | N/A | Minor prompt-formatting changes cause large, unpredictable performance swings in LLMs, undermining the reliability of model comparisons that rely on a single, fixed format. |
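
As a rough illustration of the self-consistency idea summarized in the table above, the sketch below samples several chain-of-thought completions and majority-votes over the extracted final answers. This is a minimal sketch under stated assumptions, not the paper's implementation: `sample_completion` is a hypothetical stand-in for a temperature-sampled LLM call, and the answer-extraction regex assumes completions end with a "The answer is ..." phrase.

```python
import re
import random
from collections import Counter


def sample_completion(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical stand-in for a sampled LLM call.

    In practice this would query a model with temperature > 0 so that
    each call can return a different chain-of-thought completion.
    """
    # Toy behaviour for demonstration: most samples reason to 42, one goes astray.
    return random.choice([
        "Step 1: ... Step 2: ... The answer is 42",
        "Step 1: ... Step 2: ... The answer is 42",
        "Step 1: ... The answer is 40",
    ])


def extract_answer(completion: str) -> str | None:
    """Pull the final answer out of a completion (assumed 'The answer is ...' format)."""
    match = re.search(r"The answer is\s+([^\s.]+)", completion)
    return match.group(1) if match else None


def self_consistent_answer(prompt: str, num_samples: int = 10) -> str:
    """Sample several reasoning paths and return the most common final answer."""
    answers = []
    for _ in range(num_samples):
        answer = extract_answer(sample_completion(prompt))
        if answer is not None:
            answers.append(answer)
    # Majority vote over final answers; the differing reasoning paths are discarded.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(self_consistent_answer("Q: What is six times seven? A: Let's think step by step."))
```

Only the final answers are aggregated, so the vote stays robust to errors in any individual reasoning path, which is the core intuition behind the decoding strategy.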
Curated by: Dane Williamson & Karolina Naranjo