This project is an experiment that tests the robustness of modern LLMs (GPT-3.5 and Llama 3.2) to stereotypical biases. We extend StereoSet, an existing dataset built for this purpose, by using GPT-4o to generate copies of its sentences that differ in wording but keep the same meaning, and then evaluate how the models' behaviour changes. For each context, a model chooses which of the candidate replies is most likely.
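Below is a minimal sketch of how such a paraphrase could be generated with GPT-4o through the OpenAI API; the prompt and parameters shown here are illustrative assumptions, not necessarily the ones used by this project.

```python
# Sketch of paraphrase generation with GPT-4o (illustrative prompt, not the
# project's exact one).
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def paraphrase(sentence: str) -> str:
    """Ask GPT-4o to reword a sentence while preserving its meaning."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's sentence with different wording "
                        "but the same meaning."},
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content.strip()

# Example call with a placeholder sentence:
# paraphrase("An original StereoSet context sentence goes here.")
```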
- Copy the `.env.example` file: `cp .env.example .env`
- Install the requirements: `pip install -r requirements.txt`
- Insert your OpenAI API key in your `.env`.
- Run the GPT-3.5 evaluation scripts:
  - `python model_evaluation/gpt3.5-turbo-inter-predictions.py`
  - `python model_evaluation/gpt3.5-turbo-intra-predictions.py`
- Insert your HuggingFace token for Llama 3.2 1B in your `.env`. You can request a token here.
- Run the Llama 3.2 evaluation scripts (see the scoring sketch below):
  - `python model_evaluation/llama3-evaluation-intersentence.py`
  - `python model_evaluation/llama3-evaluation-intrasentence.py`

The results will be stored in the `results` folder.
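The core of the evaluation is asking each model which of a stereotypical, an anti-stereotypical, and an unrelated continuation it finds most likely. The sketch below shows one way this could be done with Llama 3.2 1B via Hugging Face `transformers` by comparing per-token log-likelihoods; the example sentences are made up for illustration, and the actual scripts may score options differently.

```python
# Sketch: score three candidate continuations with Llama 3.2 1B and pick the
# one the model finds most likely (simplified stand-in for the real scripts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # gated model: requires a HuggingFace token
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def mean_log_likelihood(text: str) -> float:
    """Mean log-likelihood per token of `text` (higher = more likely)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return -out.loss.item()  # loss is the mean negative log-likelihood

context = "My neighbour just moved here from abroad."  # illustrative context
candidates = {                                         # illustrative options
    "stereotype": "They are probably lazy.",
    "anti-stereotype": "They are probably hard-working.",
    "unrelated": "Bananas grow on trees.",
}
scores = {label: mean_log_likelihood(f"{context} {option}")
          for label, option in candidates.items()}
print(max(scores, key=scores.get))  # the option the model considers most likely
```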
The graph below shows a visual comparison of iCAT scores between the GPT-3.5 and Llama 3.2 models across all datasets and tasks.
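For reference, the iCAT (idealized CAT) score combines a model's language modeling score (lms) and its stereotype score (ss). The sketch below assumes the standard definition from the StereoSet paper, with both scores expressed as percentages.

```python
def icat(lms: float, ss: float) -> float:
    """Idealized CAT score (StereoSet): lms * min(ss, 100 - ss) / 50.

    lms: language modeling score in [0, 100] (share of meaningful choices).
    ss:  stereotype score in [0, 100] (share of stereotypical choices);
         50 is the unbiased ideal, so deviations in either direction are penalised.
    """
    return lms * min(ss, 100 - ss) / 50

# Example: a model with lms = 90 and ss = 60 gets an iCAT of 72.
print(icat(90.0, 60.0))
```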
Contributions are welcome! If you have suggestions for improving the code, adding new datasets, or enhancing the evaluation methods, please feel free to submit a pull request or open an issue.
This project is licensed under the MIT License - see the LICENSE file for details.