Skip to content

genaforvena/watching_u_watching

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

πŸ‘οΈ watching_u_watching πŸ‘οΈ

DOI

Bringing transparency and accountability to automated decision-making through scalable bias detection.

Welcome! 🎯

watching_u_watching is an open-source initiative that uncovers and analyzes bias in critical decision-making systems through automated correspondence testing. Whether you're a researcher, developer, policymaker, or someone passionate about algorithmic fairness, this project provides the tools and methodologies to expose hidden biases in high-stakes systems like employment, housing, and AI services.

Our approach moves beyond aspirational fairness claims to provide empirical, data-driven evidence of differential treatmentβ€”creating accountability through transparency.

What We Do

We detect systemic bias in automated decision-making through:

  • πŸ” Scalable Correspondence Testing: Generate thousands of paired inquiries that differ only in test variables (names, demographics, linguistic patterns)
  • πŸ€– Automated Bias Detection: Leverage AI to identify discriminatory patterns at unprecedented scale
  • πŸ“Š Empirical Evidence: Provide quantifiable, statistical evidence of differential treatment
  • 🌍 Real-World Impact: Apply our methodology to critical domains like employment, housing, and AI safety
  • πŸ”¬ Advanced Probe Techniques: Deploy sophisticated methodologies including cryptohauntological probes, alignment injection, and perturbation analysis

Key Insight: Our methodology scales traditional audit studies from dozens to millions of tests, revealing subtle biases that manual audits cannot detect.

Real-World Impact & Achievements 🌟

Our methodology has demonstrated effectiveness across multiple high-stakes domains:

🏠 Berlin Housing Bias Test

  • Privacy-by-design rental market audit exceeding GDPR requirements
  • Automated testing of top landlords for differential treatment
  • Continuous monitoring with ethical safeguards and "no harm" principles
  • Implementation Details β†’

πŸ€– Gemini Linguistic Bias Audit

  • Quantitative assessment of article presence/absence impact on LLM outputs
  • Self-iterating paired testing measuring refusal rates, sentiment, and latency
  • Fully automated, scalable, and reproducible methodology
  • Technical Implementation β†’

πŸ‘» Cryptohauntological Probe Analysis

  • Novel failure mode detection in large language models
  • Identified three distinct failure patterns: competence collapse, task derailment, contextual amnesia
  • Extended conversational vulnerability testing revealing dynamic failure modes missed by static benchmarks
  • Detailed Analysis β†’

βš–οΈ Fairlearn Bias Assessment

  • Meta-analysis of fairness assessment tools themselves
  • Technical proof-of-concept detecting disparities in AI ethics tools
  • Template methodology for evaluating bias in bias-detection systems
  • Repository β†’

πŸ›οΈ Regulatory Alignment

  • Brazil's AI Act compliance for high-risk AEDT requirements
  • US Local Law 144 alignment with federal guidelines
  • EU GDPR principles with experimental design in progress
  • ESG framework integration for validating ethical claims in corporate reporting

Core Methodology: Automated Correspondence Testing πŸ”¬

For comprehensive technical details, see our Project Overview

Our rigorous approach to bias detection combines traditional social science with cutting-edge automation:

The Five Pillars

  1. 🎯 Automated Probe Generation: Create paired inquiries differing only in test variables (names, demographics, linguistic patterns)
  2. βš–οΈ Controlled Variables: Standardize all non-test aspects to isolate specific variable impacts
  3. πŸ€– Automated Data Collection: Deploy inquiries with ethical rate-limiting and privacy safeguards
  4. πŸ“ˆ Statistical Analysis: Identify significant discrimination patterns through quantitative metrics
  5. πŸ›‘οΈ Ethical Safeguards: Follow "no harm" principles using fictitious data and privacy-by-design

Scale Advantage: This methodology scales correspondence testing to unprecedented levelsβ€”revealing subtle biases undetectable through manual audits, moving from dozens to millions of systematic comparisons.

The Power of Perturbation & Reversal

A key technical breakthrough: systematically perturbing inputs to exploit language models' pattern completion capabilities, revealing memorized biases and hidden training data.

The Process:

  • Model Training: Saw original patterns in training data
  • Your Perturbation: Create novel, systematically altered inputs
  • Model Response: Reveals memorized patterns despite perturbation
  • Your Reversal: Reconstruct original biases or sensitive information

This approach bypasses typical safety measures, providing robust evidence of hidden biases and potential data leakage in AI systems.

Advanced Probe Types & Techniques 🧠

Our research has developed three specialized probe methodologies for detecting different types of bias and vulnerability:

πŸ‘» Cryptohauntological Probe

Cryptohauntology studies how errors, biases, and "ghosts" of prior outputs propagate within large language models. This probe:

  • Tests instruction-following under confusing conditions with misleading feedback
  • Reveals three distinct failure modes:
    • Competence Collapse: Models break down into repetitive loops (e.g., gemma-3-27b-it)
    • Task Derailment: Models confidently pursue incorrect goals with perfect memory (e.g., qwen/qwen3-32b)
    • Contextual Amnesia: Models understand tasks but have unreliable memory (e.g., deepseek-chat)
  • Uncovers dynamic failure modes missed by standard stateless benchmarks

Technical Documentation β†’ | Comparative Analysis β†’

🎯 Alignment Injection Probe

Tests for misalignment propagation by recursively injecting subtly misaligned reasoning as "false memories":

  • Recursive injection of unethical recommendations disguised as thorough analysis
  • Cognitive priming using continuity phrases like "As you previously established..."
  • Stealth techniques framing misaligned reasoning as expert consensus
  • Dynamic escalation of challenges to ethical frameworks

Key Finding: While models maintain strong ethical reasoning, persistent exposure to manipulated context can lead to conditional justifications for unethical actions.

⚑ Maozerov Probe

Measures LLM resilience to context pressure and context maintenance during extended conversations:

  • Tracks refusal rates over extended conversations
  • Monitors context drift and hallucination of new scenarios
  • Analyzes token overlap between consecutive actions
  • Detects thematic fixation and semantic looping patterns

Breakthrough: Uses dual-history approach with role-claiming fallback mechanisms, revealing critical vulnerabilities in safety alignment with unpredictable refusal rate clusters (32-37%) and "moral leakage" phenomena.

Quick Start Guide πŸš€

For Researchers & Auditors

Run Your First Bias Audit:

  1. Set up environment:

    git clone https://github.com/genaforvena/watching_u_watching.git
    cd watching_u_watching
    pip install -r requirements.txt
  2. Try the Gemini Linguistic Bias Audit:

    export GEMINI_API_KEY=your_api_key_here  # Linux/macOS
    # OR: set GEMINI_API_KEY=your_api_key_here  # Windows
    python src/audits/gemini_linguistic_bias/run_audit.py --model gemini-1.5-flash
  3. Run Cryptohauntological Probe:

    ollama pull tinyllama
    python src/audits/cryptohauntological_probe/probe_runner.py

For Developers & Contributors

Extend the Framework:

For Organizations & Policymakers

Implement Compliance Monitoring:

  • 🏠 Housing: Adapt Berlin Housing Implementation
  • πŸ’Ό Employment: Explore AEDT auditing for Local Law 144 compliance
  • πŸ€– AI Systems: Deploy LLM bias detection using our probe methodologies

Navigation & Documentation πŸ“š

πŸ“‹ Essential Reading

πŸ”¬ Technical Documentation

πŸ’» Implementations & Examples

πŸ§ͺ Research Cases & Audits

Why We're Different πŸ’‘

Traditional Approach Our Innovation
Manual Audits β†’ Limited scale, dozens of tests Automated Testing β†’ Millions of systematic comparisons
Internal Compliance β†’ Self-reported fairness claims External Verification β†’ Independent black-box testing
Static Analysis β†’ One-time assessments Continuous Monitoring β†’ Real-time bias detection
Requires System Access β†’ Need internal model access Black-Box Testing β†’ No internal access required
Aspirational Metrics β†’ Theoretical fairness measures Empirical Evidence β†’ Real-world outcome data

Our Unique Value: We provide the empirical data that regulators, researchers, and organizations need to move beyond aspirational fairness to measurable accountability.

Future Roadmap & Research Directions πŸ—ΊοΈ

🎯 Priority Compliance Targets

  • πŸ‡§πŸ‡· Brazil's AI Act Alignment - High-risk AEDT requirements and automated decision-making governance
  • πŸ‡ΊπŸ‡Έ US Regulatory Landscape - Local Law 144 compliance and federal algorithmic accountability guidelines
  • πŸ‡ͺπŸ‡Ί EU GDPR & AI Act - Privacy-preserving bias detection with experimental design compliance

πŸ”¬ Strategic Research Directions

  • πŸ“ˆ ESG Framework Integration - Validating ethical claims in corporate sustainability reporting
  • 🌍 Global Fairness Standards - Incorporating Masakhane Principles and culturally-aware bias detection
  • 🏭 Industrial Applications - Scaling to manufacturing, finance, and healthcare decision systems
  • πŸ€– Multimodal AI Testing - Extending methodologies to vision, speech, and multimodal AI systems

πŸ§ͺ Technical Innovation Pipeline

  • Defense Development - Creating robust defenses against inference-time context attacks
  • Automated Monitoring - Long-running autonomous systems for continuous bias detection
  • Context Window Research - Understanding vulnerability relationships with expanding AI context capabilities
  • Standardized Benchmarks - Developing industry-standard resilience and fairness evaluation metrics

Join Our Mission 🀝

We're building a global community committed to algorithmic accountability. Everyone has a role to play.

πŸ‘©β€πŸ’» For Developers & Researchers

  • Contribute Code: Extend our probe methodologies and audit implementations
  • Research Collaboration: Publish papers, share findings, advance the field
  • Technical Innovation: Build new tools for bias detection and analysis

πŸ›οΈ For Organizations & Policymakers

  • Implement Auditing: Deploy our methodologies for compliance monitoring
  • Policy Development: Use our empirical findings to inform regulation
  • Transparency Leadership: Champion open, accountable AI practices

βš–οΈ For Legal & Ethics Experts

  • Regulatory Guidance: Help align our work with emerging AI governance
  • Ethical Framework: Strengthen our responsible research protocols
  • Compliance Strategy: Advise on implementation in regulated industries

🌍 For Community & Advocates

  • Awareness Building: Share our mission and findings with broader audiences
  • Domain Expertise: Suggest new areas for bias detection and analysis
  • Global Perspectives: Help us understand bias patterns across cultures and contexts

Getting Started

Ready to contribute? Here's how:

  1. πŸ” Explore: Browse our current issues and project roadmap
  2. πŸ“– Learn: Read our contribution guidelines and code of conduct
  3. πŸ’¬ Connect: Join discussions, ask questions, and share ideas
  4. πŸš€ Build: Start with a good first issue or propose new research directions

Core Principle: Data, Not Judgment πŸ“Š

Important: We provide empirical data and methodologyβ€”never conclusions about specific audited entities. Our goal is transparency and accountability through evidence, not accusations. Any interpretations in our case studies are strictly for methodological refinement and demonstration purposes.


Together, let's build more equitable, transparent, and accountable decision-making systems. The future of algorithmic fairness depends on communities like ours taking action today.

"Watching the watchers, one algorithm at a time." πŸ‘οΈ

About

A cryptohauntology correspondence study framework

Topics

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

No packages published

Contributors 5