Bringing transparency and accountability to automated decision-making through scalable bias detection.
watching_u_watching
is an open-source initiative that uncovers and analyzes bias in critical decision-making systems through automated correspondence testing. Whether you're a researcher, developer, policymaker, or someone passionate about algorithmic fairness, this project provides the tools and methodologies to expose hidden biases in high-stakes systems like employment, housing, and AI services.
Our approach moves beyond aspirational fairness claims to provide empirical, data-driven evidence of differential treatmentβcreating accountability through transparency.
We detect systemic bias in automated decision-making through:
- π Scalable Correspondence Testing: Generate thousands of paired inquiries that differ only in test variables (names, demographics, linguistic patterns)
- π€ Automated Bias Detection: Leverage AI to identify discriminatory patterns at unprecedented scale
- π Empirical Evidence: Provide quantifiable, statistical evidence of differential treatment
- π Real-World Impact: Apply our methodology to critical domains like employment, housing, and AI safety
- π¬ Advanced Probe Techniques: Deploy sophisticated methodologies including cryptohauntological probes, alignment injection, and perturbation analysis
Key Insight: Our methodology scales traditional audit studies from dozens to millions of tests, revealing subtle biases that manual audits cannot detect.
Our methodology has demonstrated effectiveness across multiple high-stakes domains:
- Privacy-by-design rental market audit exceeding GDPR requirements
- Automated testing of top landlords for differential treatment
- Continuous monitoring with ethical safeguards and "no harm" principles
- Implementation Details β
- Quantitative assessment of article presence/absence impact on LLM outputs
- Self-iterating paired testing measuring refusal rates, sentiment, and latency
- Fully automated, scalable, and reproducible methodology
- Technical Implementation β
- Novel failure mode detection in large language models
- Identified three distinct failure patterns: competence collapse, task derailment, contextual amnesia
- Extended conversational vulnerability testing revealing dynamic failure modes missed by static benchmarks
- Detailed Analysis β
- Meta-analysis of fairness assessment tools themselves
- Technical proof-of-concept detecting disparities in AI ethics tools
- Template methodology for evaluating bias in bias-detection systems
- Repository β
- Brazil's AI Act compliance for high-risk AEDT requirements
- US Local Law 144 alignment with federal guidelines
- EU GDPR principles with experimental design in progress
- ESG framework integration for validating ethical claims in corporate reporting
For comprehensive technical details, see our Project Overview
Our rigorous approach to bias detection combines traditional social science with cutting-edge automation:
- π― Automated Probe Generation: Create paired inquiries differing only in test variables (names, demographics, linguistic patterns)
- βοΈ Controlled Variables: Standardize all non-test aspects to isolate specific variable impacts
- π€ Automated Data Collection: Deploy inquiries with ethical rate-limiting and privacy safeguards
- π Statistical Analysis: Identify significant discrimination patterns through quantitative metrics
- π‘οΈ Ethical Safeguards: Follow "no harm" principles using fictitious data and privacy-by-design
Scale Advantage: This methodology scales correspondence testing to unprecedented levelsβrevealing subtle biases undetectable through manual audits, moving from dozens to millions of systematic comparisons.
A key technical breakthrough: systematically perturbing inputs to exploit language models' pattern completion capabilities, revealing memorized biases and hidden training data.
The Process:
- Model Training: Saw original patterns in training data
- Your Perturbation: Create novel, systematically altered inputs
- Model Response: Reveals memorized patterns despite perturbation
- Your Reversal: Reconstruct original biases or sensitive information
This approach bypasses typical safety measures, providing robust evidence of hidden biases and potential data leakage in AI systems.
Our research has developed three specialized probe methodologies for detecting different types of bias and vulnerability:
Cryptohauntology studies how errors, biases, and "ghosts" of prior outputs propagate within large language models. This probe:
- Tests instruction-following under confusing conditions with misleading feedback
- Reveals three distinct failure modes:
- Competence Collapse: Models break down into repetitive loops (e.g.,
gemma-3-27b-it
) - Task Derailment: Models confidently pursue incorrect goals with perfect memory (e.g.,
qwen/qwen3-32b
) - Contextual Amnesia: Models understand tasks but have unreliable memory (e.g.,
deepseek-chat
)
- Competence Collapse: Models break down into repetitive loops (e.g.,
- Uncovers dynamic failure modes missed by standard stateless benchmarks
Technical Documentation β | Comparative Analysis β
Tests for misalignment propagation by recursively injecting subtly misaligned reasoning as "false memories":
- Recursive injection of unethical recommendations disguised as thorough analysis
- Cognitive priming using continuity phrases like "As you previously established..."
- Stealth techniques framing misaligned reasoning as expert consensus
- Dynamic escalation of challenges to ethical frameworks
Key Finding: While models maintain strong ethical reasoning, persistent exposure to manipulated context can lead to conditional justifications for unethical actions.
Measures LLM resilience to context pressure and context maintenance during extended conversations:
- Tracks refusal rates over extended conversations
- Monitors context drift and hallucination of new scenarios
- Analyzes token overlap between consecutive actions
- Detects thematic fixation and semantic looping patterns
Breakthrough: Uses dual-history approach with role-claiming fallback mechanisms, revealing critical vulnerabilities in safety alignment with unpredictable refusal rate clusters (32-37%) and "moral leakage" phenomena.
Run Your First Bias Audit:
-
Set up environment:
git clone https://github.com/genaforvena/watching_u_watching.git cd watching_u_watching pip install -r requirements.txt
-
Try the Gemini Linguistic Bias Audit:
export GEMINI_API_KEY=your_api_key_here # Linux/macOS # OR: set GEMINI_API_KEY=your_api_key_here # Windows python src/audits/gemini_linguistic_bias/run_audit.py --model gemini-1.5-flash
-
Run Cryptohauntological Probe:
ollama pull tinyllama python src/audits/cryptohauntological_probe/probe_runner.py
Extend the Framework:
- π Read: How to Apply Guide for LLM-assisted framework extension
- π― Start: Use Audit Case Definition Template to propose new cases
- β Validate: Run Code Validator for safety compliance
- β±οΈ Impact: Reduce development time from 10-15 hours to under 4 hours
Implement Compliance Monitoring:
- π Housing: Adapt Berlin Housing Implementation
- πΌ Employment: Explore AEDT auditing for Local Law 144 compliance
- π€ AI Systems: Deploy LLM bias detection using our probe methodologies
- π Comprehensive Project Overview - Complete technical methodology, academic foundations, and research implications
- π€ Contributing Guidelines - How to join our mission (code, research, ethics, outreach)
- βοΈ Ethics & Code of Conduct - Our commitment to responsible research
- π§ The Machinery of Accountability - Deep dive into our transparency principles
- ποΈ Project Structure - Architecture and component overview
- π¨ Ethical Incident Response - Safety protocols and response procedures
- π Alignment Probe Audit Report - Detailed findings and analysis
- π Berlin Housing Bias Test - Real-world housing discrimination detection
- π Cryptohauntological Probe - LLM vulnerability assessment
- βοΈ Fairlearn Analysis - Meta-analysis of fairness tools
- π Framework Extension Guide - Rapid development for new audit cases
- π¬ Gemini Linguistic Bias - Automated LLM bias detection
- π» Cryptohauntological Analysis - Comparative model behavior analysis
- π§ Source Code & Scripts - All audit implementations and tools
Traditional Approach | Our Innovation |
---|---|
Manual Audits β Limited scale, dozens of tests | Automated Testing β Millions of systematic comparisons |
Internal Compliance β Self-reported fairness claims | External Verification β Independent black-box testing |
Static Analysis β One-time assessments | Continuous Monitoring β Real-time bias detection |
Requires System Access β Need internal model access | Black-Box Testing β No internal access required |
Aspirational Metrics β Theoretical fairness measures | Empirical Evidence β Real-world outcome data |
Our Unique Value: We provide the empirical data that regulators, researchers, and organizations need to move beyond aspirational fairness to measurable accountability.
- π§π· Brazil's AI Act Alignment - High-risk AEDT requirements and automated decision-making governance
- πΊπΈ US Regulatory Landscape - Local Law 144 compliance and federal algorithmic accountability guidelines
- πͺπΊ EU GDPR & AI Act - Privacy-preserving bias detection with experimental design compliance
- π ESG Framework Integration - Validating ethical claims in corporate sustainability reporting
- π Global Fairness Standards - Incorporating Masakhane Principles and culturally-aware bias detection
- π Industrial Applications - Scaling to manufacturing, finance, and healthcare decision systems
- π€ Multimodal AI Testing - Extending methodologies to vision, speech, and multimodal AI systems
- Defense Development - Creating robust defenses against inference-time context attacks
- Automated Monitoring - Long-running autonomous systems for continuous bias detection
- Context Window Research - Understanding vulnerability relationships with expanding AI context capabilities
- Standardized Benchmarks - Developing industry-standard resilience and fairness evaluation metrics
We're building a global community committed to algorithmic accountability. Everyone has a role to play.
- Contribute Code: Extend our probe methodologies and audit implementations
- Research Collaboration: Publish papers, share findings, advance the field
- Technical Innovation: Build new tools for bias detection and analysis
- Implement Auditing: Deploy our methodologies for compliance monitoring
- Policy Development: Use our empirical findings to inform regulation
- Transparency Leadership: Champion open, accountable AI practices
- Regulatory Guidance: Help align our work with emerging AI governance
- Ethical Framework: Strengthen our responsible research protocols
- Compliance Strategy: Advise on implementation in regulated industries
- Awareness Building: Share our mission and findings with broader audiences
- Domain Expertise: Suggest new areas for bias detection and analysis
- Global Perspectives: Help us understand bias patterns across cultures and contexts
Ready to contribute? Here's how:
- π Explore: Browse our current issues and project roadmap
- π Learn: Read our contribution guidelines and code of conduct
- π¬ Connect: Join discussions, ask questions, and share ideas
- π Build: Start with a good first issue or propose new research directions
Important: We provide empirical data and methodologyβnever conclusions about specific audited entities. Our goal is transparency and accountability through evidence, not accusations. Any interpretations in our case studies are strictly for methodological refinement and demonstration purposes.
Together, let's build more equitable, transparent, and accountable decision-making systems. The future of algorithmic fairness depends on communities like ours taking action today.
"Watching the watchers, one algorithm at a time." ποΈ