humancompatible.detect is an open-source toolkit for detecting bias in AI models and their training data.
In a fairness audit, one generally wants to know whether two distributions are identical. These could be, for example, the distribution of internal private training data and that of publicly accessible nation-wide census data, i.e., a trusted baseline. Alternatively, one can compare the samples classified positively with those classified negatively, to see whether groups are represented equally in each class.
In other words, we ask
Is there some combination of protected attributes (race × age × …) for which people are treated noticeably differently?
The set of samples sharing a given combination of protected attributes is called a subgroup; for example, all samples with Race = Black and Age ≥ 60 form one subgroup.
- Install the library:
python -m pip install git+https://github.com/humancompatible/detect.git
- Compute the bias (MSD in this case):
from humancompatible.detect import detect_bias_csv

# toy example
# (col 1 = Race, col 2 = Age, col 3 = (binary) target)
msd, rule_idx = detect_bias_csv(
    csv_path=csv,
    target="Target",
    protected_list=["Race", "Age"],
    method="MSD",
)
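The call returns the MSD value together with an index of the subgroup rule that attains it. Assuming the call above succeeded, the result can be inspected directly; decoding rule_idx into human-readable conditions is shown in the example notebook linked below:

print(f"Maximum Subgroup Discrepancy: {msd:.4f}")
print(f"Index of the most discrepant subgroup rule: {rule_idx}")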
examples/01_usage.ipynb – a 5-minute notebook reproducing the call above, then translating rule_idx back to human-readable conditions.
Feel free to start with the light notebook, then dive into the experiments with different datasets.
We also provide documentation. For more details on installation, see Installation details.
MSD (Maximum Subgroup Discrepancy) is the maximal difference, over all subgroups, between the probability mass that each of the two distributions assigns to that subgroup; a sketch of the definition follows the list below.
- Naturally, two distributions are considered fair iff all subgroups have similar mass.
- The arg max immediately tells you which group is most disadvantaged, as an interpretable attribute-value combination.
- MSD has linear sample complexity, in stark contrast to the exponential complexity of other distributional distances (Wasserstein, TV, ...).
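In symbols, MSD(P, Q) = max_S |P(S) − Q(S)|, where S ranges over subgroups defined by conjunctions of protected-attribute values. As a concrete (and deliberately naive) illustration of this definition, the sketch below enumerates all such conjunctions on two small made-up samples; it is exponential in the number of attributes and is not how the library computes MSD, which instead solves an exact mixed-integer program.

from itertools import product
import pandas as pd

def msd_brute_force(df_p, df_q, protected):
    """Naive MSD: largest gap in empirical mass over all attribute-value conjunctions."""
    best_gap, best_rule = 0.0, None
    # candidate values per protected column; None means "no condition on this column"
    domains = [[None] + sorted(set(df_p[c]) | set(df_q[c])) for c in protected]
    for values in product(*domains):
        rule = {c: v for c, v in zip(protected, values) if v is not None}
        if not rule:
            continue  # the empty rule selects everyone, so its gap is always 0
        mask_p = pd.Series(True, index=df_p.index)
        mask_q = pd.Series(True, index=df_q.index)
        for c, v in rule.items():
            mask_p &= df_p[c] == v
            mask_q &= df_q[c] == v
        gap = abs(mask_p.mean() - mask_q.mean())  # |P(S) - Q(S)| on the samples
        if gap > best_gap:
            best_gap, best_rule = gap, rule
    return best_gap, best_rule

# made-up toy data with the same column names as the CSV example above
P = pd.DataFrame({"Race": ["A", "A", "B", "B"], "Age": ["young", "old", "young", "old"]})
Q = pd.DataFrame({"Race": ["A", "A", "A", "B"], "Age": ["young", "young", "old", "old"]})
print(msd_brute_force(P, Q, ["Race", "Age"]))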
Requirements are included in the requirements.txt
file. They include:
- Python ≥ 3.10
- A MILP solver (to solve the mixed-integer program in the case of MSD)
- The default solver is HiGHS. This is an open-source solver included in the requirements.
- A faster but proprietary solver, Gurobi, can also be used easily; free academic licences are available. This solver was used in the original paper.
- We use Pyomo for modelling. This allows for multiple solvers; see the lists of solver interfaces and persistent solver interfaces (a minimal generic example follows this list). Note that the implementation sets the graceful time limit only for the solvers Gurobi, Cplex, HiGHS, Xpress, and GLPK.
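For reference, instantiating a solver directly in Pyomo looks like the sketch below; this is generic Pyomo usage, not part of the humancompatible.detect API:

from pyomo.environ import SolverFactory

# HiGHS (open source, installed with the requirements) via Pyomo's APPSI interface
solver = SolverFactory("appsi_highs")

# Gurobi (proprietary, free academic licences) would instead be:
# solver = SolverFactory("gurobi")

print(solver.available())  # check that the chosen solver can actually be used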
# ── Create a virtual environment ────────────────────────────
python -m venv .venv
# ── Activate it ─────────────────────────────────────────────
# Linux / macOS
source .venv/bin/activate
# Windows – cmd.exe
.venv\Scripts\activate.bat
# Windows – PowerShell
.venv\Scripts\Activate.ps1
Until we complete the PyPI release, you can install the latest snapshot straight from GitHub in one line:
python -m pip install git+https://github.com/humancompatible/detect.git
If you prefer an editable (developer) install:
git clone https://github.com/humancompatible/detect.git
cd detect
python -m pip install -r requirements.txt
python -m pip install -e .
python -c "from humancompatible.detect.MSD import compute_MSD; print('MSD imported OK')"
If the import fails you’ll see:
ModuleNotFoundError: No module named 'humancompatible'
| Distance | Needs to look at | Worst-case samples | Drawback / advantage |
|---|---|---|---|
| Wasserstein, Total Variation, MMD, … | full d-dimensional joint | Ω(2^d) | exponential sample cost, no group explanation |
| MSD (ours) | only the protected marginal | O(d) | exact group, human-readable |
MSD’s linear sample complexity is proven in the paper and achieved in practice via an exact Mixed-Integer Optimisation that scans the doubly-exponential search space implicitly, returning both the metric value and the rule that realises it.
If you use MSD in your work, please cite the following paper:
@inproceedings{MSD,
author = {Jiří Němeček and Mark Kozdoba and Illia Kryvoviaz and Tomáš Pevný and Jakub Mareček},
title = {Bias Detection via Maximum Subgroup Discrepancy},
year = {2025},
booktitle = {Proceedings of the 31st ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
series = {KDD '25}
}