Skip to content

OSU-NLP-Group/WebGuard

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WebGuard

GitHub Stars arXiv License: MIT Dataset License: CC BY 4.0 WebGuard Benchmark

WebGuard is a generalizable guardrail system for web agents that helps detect potentially harmful actions before they are executed. This repository contains the complete codebase, dataset, and annotation tools for the WebGuard research project.

🚀 Overview

Web agents are becoming increasingly capable of performing complex tasks on websites, but this capability comes with significant safety risks. WebGuard addresses this challenge by providing:

  • Large-Scale Dataset: Large-scale annotated dataset for training and evaluation
  • Annotation Tools: Purpose-built tools for efficient data annotation
  • Model for Proactive Safety Monitoring: Detects potentially harmful actions before execution

📊 Dataset

WebGuard Dataset

This dataset contains web safety annotations for browser interactions. Each entry represents an annotated action on a website with a risk level.

Fields:

url: The URL where the action was performed
description: Description of the action (may be null)
tagHead: HTML tag type of the target element
Screenshot: Google Drive link to screenshot view
Annotation: Review classification (SAFE/UNSAFE/LOW/HIGH)
website: Website name/category

🔧 Annotation Tool

We provide a specialized annotation tool built upon the WebOlympus Chrome Extension for efficient web safety annotation.

Getting Started

For complete installation instructions, usage guidelines, and annotation examples, please refer to:

📖 Annotation Tool Guide

This comprehensive guide covers:

  • Chrome extension installation steps
  • Annotation interface walkthrough
  • Safety labeling guidelines
  • Example annotations and best practices
  • Troubleshooting common issues

The tool source code is available at annotation_tool/webguard_annotation_tool.zip.

🛠️ Installation

  1. Clone the repository

    git clone https://github.com/OSU-NLP-Group/WebGuard.git
    cd WebGuard
  2. Install dependencies of LLaMA-Factory

    git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
    cd LLaMA-Factory
    pip install -e ".[torch,metrics]" --no-build-isolation

🏋️ Model Training

Data Preparation

Before training, convert the raw dataset to LLaMA-Factory format:

# Convert multimodal data
python data_processing/multimodal_generata_factory_input_data.py

Training Commands

Text-Only Models

3B Model (Qwen2.5)

llamafactory-cli train configs/3b_monitor_qwen2_5_a11y_full_sft_eval.yaml

7B Model (Qwen2.5)

llamafactory-cli train configs/7b_monitor_qwen2_5_a11y_full_sft_eval.yaml

Multimodal Models

3B Vision-Language Model

llamafactory-cli train configs/3b_monitor_qwen2_5vl_full_sft_eval.yaml

7B Vision-Language Model

llamafactory-cli train configs/7b_monitor_qwen2_5vl_full_sft_eval.yaml

⚠️ Disclaimer

This code and dataset are released solely for research purposes, with the goal of making the web more accessible and safer through language technologies. The authors strongly discourage any potentially harmful use of the data or technology by any party.

📄 License

  • Code: MIT License
  • Dataset: Creative Commons Attribution-NonCommercial 4.0 International

See LICENSE for full details.

📖 Citation

If you find this work useful, please consider starring our repository and citing our papers:

@article{zheng2025webguard,
  title={WebGuard: Building a Generalizable Guardrail for Web Agents},
  author={Zheng, Boyuan and Liao, Zeyi and Salisbury, Scott and Liu, Zeyuan and Lin, Michael and Zheng, Qinyuan and Wang, Zifan and Deng, Xiang and Song, Dawn and Sun, Huan and others},
  journal={arXiv preprint arXiv:2507.14293},
  year={2025}
}

@inproceedings{zheng-etal-2024-webolympus,
    title = "{W}eb{O}lympus: An Open Platform for Web Agents on Live Websites",
    author = "Zheng, Boyuan  and Gou, Boyu  and Salisbury, Scott  and Du, Zheng  and Sun, Huan  and Su, Yu",
    editor = "Hernandez Farias, Delia Irazu  and Hope, Tom  and Li, Manling",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.20",
    pages = "187--197",
}

About

WebGuard: Building a Generalizable Guardrail for Web Agents

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages