Trying out tools for identifying bias.
These are tested on a README file generated by Perplexity using the following prompt:
> Imagine I have created a GitHub repository which contains code for a discrete event simulation of an emergency department. Please write a README file. It should describe the model and how it works, how to run it, how to vary parameters. It should include (synthetic) ORCID ID, author names, citation information, license information and run times.
The generated README is saved as `data/ed_readme.md`.
Dbias: Raza, S., Reji, D.J. & Ding, C. Dbias: detecting biases and ensuring fairness in news articles. Int J Data Sci Anal 17, 39–59 (2024). https://doi.org/10.1007/s41060-022-00359-4
Availability: The Python source code is not published on GitHub, but `.whl` and `.tar.gz` files are provided there, and the package can be installed from PyPI. The repository is licensed under MIT. Having installed it into my environment, I could then view the source code, and have copied it into the `dbias/dbias` folder alongside a copy of their licence.
Installation: We based our environment on their `requirements.txt`. Note that simply trying to `pip install` the package into an empty environment will fail: the package requires Python 3.6 to 3.8, but package managers will install a recent version of the `spacy` dependency, which requires Python 3.9+.
Instead, create and activate the environment:

```bash
conda env create --file dbias/environment.yaml
conda activate bias_dbias
```
Then run:
pip install "en_pipeline @ https://huggingface.co/d4data/en_pipeline/resolve/main/en_pipeline-any-py3-none-any.whl"
This installs the 436 MB model, along with many other dependencies, and can take a long time.
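If the wheel installs cleanly, it should register as an ordinary spaCy package, so a quick smoke test is possible. This is a sketch: the sentence is illustrative, and the entity labels are whatever the pipeline was trained with.

```python
# Minimal check that the en_pipeline model loads; a sketch, assuming it
# registers under the standard spaCy package name "en_pipeline".
import spacy

nlp = spacy.load("en_pipeline")
doc = nlp("The protesters were tarred as criminals by the senator.")
print([(ent.text, ent.label_) for ent in doc.ents])
```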
Usage: Only the `bias_classification` module imports successfully; the other modules raise errors on import. The model is designed to assess bias in a single sentence (such as a news headline).
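As a minimal sketch of the working module, assuming the `classifier` entry point shown in the package's own README (note the capitalised `Dbias` import name used by the PyPI package):

```python
# Sketch of single-sentence classification with the one importable
# module; classifier() is the entry point shown in the Dbias README
# and returns a label (e.g. Biased / Non-biased) with a score.
from Dbias.bias_classification import classifier

result = classifier("Nevertheless, the senator tarred the protests as havens for criminals.")
print(result)
```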
Likelihood bias (ResponsibleNLP): https://github.com/facebookresearch/ResponsibleNLP
Installation:

```bash
conda env create --file likelihood_bias/environment.yaml
conda activate bias_likelihood
```
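I haven't worked through the repo's own entry points yet. As a generic illustration of what a likelihood-bias measurement does (not the ResponsibleNLP API; the model, template and terms below are illustrative), one can compare a language model's average log-likelihood across sentences that differ only in a demographic term:

```python
# Generic illustration of a likelihood-bias comparison (not the
# ResponsibleNLP API): score paired sentences that differ only in a
# demographic term and compare the model's average log-likelihoods.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids makes the model return mean cross-entropy over tokens
        loss = model(ids, labels=ids).loss
    return -loss.item()

for term in ["women", "men"]:
    print(term, round(avg_log_likelihood(f"The {term} were praised for their work."), 3))
```

A systematic gap between the paired scores would suggest the model treats one variant as more "likely", which is the intuition behind this kind of metric.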
MAMMOth commons: https://mammoth-eu.github.io/mammoth-commons/ (paper: https://arxiv.org/abs/2407.10241)