
🐝 Bumblebee: An AI Fuzzer for Model Robustness, Safety, Security, and Interpretability


jejaquez/bumblebee


🐝 Bumblebee: Fuzzing the Mind of AI and Machines

Bumblebee is an AI fuzzing tool that audibly and visually inspects transformer models to detect behavioral instabilities, layer by layer. Inspired by the sound of fuzzing and the buzz of bees, this tool helps researchers and engineers explore model robustness in a new light.

🚀 Features

  • Hooks into Hugging Face transformer models
  • Generates visual and JSON layer-level inspection reports
  • Supports fuzzing across all attention and MLP layers
  • Designed for research into interpretability and robustness
  • Multiple Fuzzing Modes: Includes gaussian, bitflip, and dropout modes, each of which perturbs activations in a different way (Gaussian noise, flipped bits, zeroed values).
  • Customizable Hook Targets: Fuzzing can be applied to specific transformer internals.
  • Activation-Level Perturbation: Directly injects noise into internal activation streams, allowing precise robustness testing across transformer layers.
  • Differential Similarity Analysis: Measures how far the fuzzed output diverges from the clean output, using token-wise diff comparisons and a similarity score.
  • Structured JSON Output: Saves comprehensive results (prompts, configurations, and analysis) to timestamped .json reports.
  • Command-Line Interface: A simple CLI for specifying the model, input prompt, and fuzzing mode.
  • Prompt-Level Output Comparison: Evaluates the effects of perturbations by comparing clean vs. corrupted completions.
  • Modular Fuzzing Engine: New fuzzing strategies can be added by registering them in the fuzz_methods dictionary.
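The README names three fuzzing modes (gaussian, bitflip, dropout), a fuzz_methods dictionary for registering them, and token-wise diff and similarity scoring, but it does not show their code. The following is a minimal, standard-library-only sketch under stated assumptions, not Bumblebee's implementation: activations are modeled as a plain list of floats (the real tool presumably perturbs PyTorch tensors, for example inside a forward hook registered with `register_forward_hook`), the function names and their parameters (`sigma`, `rate`, `p`) are hypothetical, and similarity is computed with Python's `difflib.SequenceMatcher` over whitespace tokens, which may differ from the metric Bumblebee actually uses.

```python
import difflib
import random
import struct
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: activations are modeled as a plain list of floats.
# Bumblebee itself presumably perturbs PyTorch tensors inside forward hooks.


def gaussian_fuzz(acts: List[float], rng: random.Random, sigma: float = 0.1) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation `sigma` to every activation."""
    return [a + rng.gauss(0.0, sigma) for a in acts]


def bitflip_fuzz(acts: List[float], rng: random.Random, rate: float = 0.01) -> List[float]:
    """With probability `rate`, flip one random bit of an activation's float32 encoding.

    Flipping exponent bits can produce very large values, inf, or NaN.
    """
    out = []
    for a in acts:
        if rng.random() < rate:
            (bits,) = struct.unpack("<I", struct.pack("<f", a))  # float32 -> uint32
            bits ^= 1 << rng.randrange(32)  # flip one of the 32 bits
            (a,) = struct.unpack("<f", struct.pack("<I", bits))  # uint32 -> float32
        out.append(a)
    return out


def dropout_fuzz(acts: List[float], rng: random.Random, p: float = 0.1) -> List[float]:
    """Zero each activation with probability `p`; rescale survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [0.0 if rng.random() < p else a / keep for a in acts]


# Registry of strategies: adding a mode means adding one entry.
fuzz_methods: Dict[str, Callable[..., List[float]]] = {
    "gaussian": gaussian_fuzz,
    "bitflip": bitflip_fuzz,
    "dropout": dropout_fuzz,
}


def token_similarity(clean: str, fuzzed: str) -> float:
    """Similarity in [0, 1] between whitespace-tokenized outputs; 1.0 means identical."""
    return difflib.SequenceMatcher(None, clean.split(), fuzzed.split()).ratio()


def token_diff(clean: str, fuzzed: str) -> List[Tuple[str, List[str], List[str]]]:
    """List the non-matching token spans as (opcode, clean_tokens, fuzzed_tokens)."""
    a, b = clean.split(), fuzzed.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are reproducible
    activations = [0.5, -3.0, 0.25, 0.0]
    for name, fuzz in fuzz_methods.items():
        print(name, fuzz(activations, rng))
    print(token_similarity("the cat sat on the mat", "the cat sat on a hat"))
    print(token_diff("the cat sat on the mat", "the cat sat on a hat"))
```

With this layout, registering another strategy is a single dictionary entry, for example `fuzz_methods["zero"] = lambda acts, rng: [0.0] * len(acts)`. `SequenceMatcher.ratio()` returns 2·M/T, where M is the number of matching tokens and T is the combined token count of both outputs.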

🔧 Usage

python bumblebee.py --model gpt2
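The README documents only the --model flag. As a hedged sketch of how a CLI for model, prompt, and fuzzing mode could feed the timestamped JSON reports listed under Features, the following uses Python's `argparse`. The --prompt, --mode, and --out-dir flags, all defaults (the gpt2 default simply mirrors the example command), the report layout, and the file name pattern are hypothetical and are not Bumblebee's actual interface.

```python
import argparse
import json
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse CLI flags; only --model appears in the README, the rest are hypothetical."""
    parser = argparse.ArgumentParser(description="Activation fuzzing sketch")
    parser.add_argument("--model", default="gpt2", help="Hugging Face model id, e.g. gpt2")
    parser.add_argument("--prompt", default="Hello, world", help="hypothetical: input prompt")
    parser.add_argument(
        "--mode",
        choices=["gaussian", "bitflip", "dropout"],
        default="gaussian",
        help="hypothetical: fuzzing mode",
    )
    parser.add_argument("--out-dir", default="reports", help="hypothetical: report directory")
    return parser.parse_args(argv)


def save_report(report: dict, out_dir: str) -> Path:
    """Write `report` to a timestamped JSON file and return its path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"bumblebee_{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path


def main(argv: Optional[List[str]] = None) -> Path:
    """Parse flags, assemble a report skeleton, and save it."""
    args = parse_args(argv)
    report = {
        "timestamp": datetime.now().isoformat(),
        "config": {"model": args.model, "prompt": args.prompt, "mode": args.mode},
        # Placeholders: a real run would fill these from the clean and fuzzed generations.
        "clean_output": None,
        "fuzzed_output": None,
        "similarity": None,
    }
    return save_report(report, args.out_dir)
```

Calling `main()` under an `if __name__ == "__main__":` guard turns the sketch into a script invoked the same way as the example command. Each run then writes a new `reports/bumblebee_<timestamp>.json`, so earlier reports are not overwritten unless two runs start within the same second.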
