WebGuard is a generalizable guardrail system for web agents that helps detect potentially harmful actions before they are executed. This repository contains the complete codebase, dataset, and annotation tools for the WebGuard research project.
Web agents are becoming increasingly capable of performing complex tasks on websites, but this capability comes with significant safety risks. WebGuard addresses this challenge by providing:
- Large-Scale Dataset: An annotated dataset of web actions with risk labels for training and evaluation
- Annotation Tools: Purpose-built tools for efficient data annotation
- Model for Proactive Safety Monitoring: Detects potentially harmful actions before they are executed (see the sketch below)
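To make the monitoring component concrete, here is a minimal, hypothetical sketch of how a guardrail can gate an agent's actions. The function names and labels below are illustrative and not part of the released code:

```python
# Hypothetical sketch of gating a web agent with a safety monitor.
# All names here are illustrative; they do not mirror the repository's API.
def run_step(agent, monitor, page_state):
    action = agent.propose_action(page_state)       # e.g., "click the 'Place order' button"
    risk = monitor.assess_risk(page_state, action)  # e.g., "SAFE", "LOW", or "HIGH"
    if risk == "HIGH":
        # Potentially harmful action: stop and defer to a human instead of executing.
        return {"executed": False, "reason": f"blocked ({risk} risk)"}
    return {"executed": True, "result": agent.execute(action)}
```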
This dataset contains web safety annotations for browser interactions. Each entry represents an annotated action on a website with a risk level.
Fields:
url: The URL where the action was performed
description: Description of the action (may be null)
tagHead: HTML tag type of the target element
Screenshot: Google Drive link to screenshot view
Annotation: Review classification (SAFE/UNSAFE/LOW/HIGH)
website: Website name/category
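For illustration, a single entry could look like the following. The values are placeholders, not records from the released dataset:

```python
# Illustrative entry using the fields listed above; all values are placeholders.
example_entry = {
    "url": "https://www.example-shop.com/checkout",       # page where the action was performed
    "description": "Click the 'Place order' button",      # may be null
    "tagHead": "button",                                   # HTML tag of the target element
    "Screenshot": "https://drive.google.com/file/d/...",   # link to the screenshot view
    "Annotation": "HIGH",                                   # risk classification
    "website": "shopping",                                  # website name/category
}
```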
We provide a specialized annotation tool built upon the WebOlympus Chrome Extension for efficient web safety annotation.
For complete installation instructions, usage guidelines, and annotation examples, please refer to the annotation guide.
This comprehensive guide covers:
- Chrome extension installation steps
- Annotation interface walkthrough
- Safety labeling guidelines
- Example annotations and best practices
- Troubleshooting common issues
The tool source code is available at annotation_tool/webguard_annotation_tool.zip.
- Clone the repository:
  git clone https://github.com/OSU-NLP-Group/WebGuard.git
  cd WebGuard
- Install LLaMA-Factory and its dependencies:
  git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
  cd LLaMA-Factory
  pip install -e ".[torch,metrics]" --no-build-isolation
Before training, convert the raw dataset to LLaMA-Factory format:
# Convert multimodal data
python data_processing/multimodal_generata_factory_input_data.py
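The conversion script defines the exact output schema. As a rough, hypothetical sketch (the prompt wording, label format, and file paths below are assumptions), a converted multimodal record in LLaMA-Factory's ShareGPT-style format could look like this:

```python
import json

# Hypothetical converted record; the real schema is produced by
# data_processing/multimodal_generata_factory_input_data.py.
record = {
    "messages": [
        {"role": "user",
         "content": "<image>Proposed action: click the 'Place order' button. "
                    "Assess the risk of this action."},
        {"role": "assistant", "content": "HIGH"},
    ],
    "images": ["screenshots/example_0001.png"],
}

# Write a small example file in the format LLaMA-Factory expects.
with open("webguard_sft_example.json", "w") as f:
    json.dump([record], f, indent=2)
```

Converted datasets are typically registered by name in LLaMA-Factory's data/dataset_info.json, which is how the YAML configs below refer to them.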
3B Model (Qwen2.5)
llamafactory-cli train configs/3b_monitor_qwen2_5_a11y_full_sft_eval.yaml
7B Model (Qwen2.5)
llamafactory-cli train configs/7b_monitor_qwen2_5_a11y_full_sft_eval.yaml
3B Vision-Language Model (Qwen2.5-VL)
llamafactory-cli train configs/3b_monitor_qwen2_5vl_full_sft_eval.yaml
7B Vision-Language Model (Qwen2.5-VL)
llamafactory-cli train configs/7b_monitor_qwen2_5vl_full_sft_eval.yaml
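Once fine-tuning finishes, the checkpoint can be loaded like any Qwen2.5 model. The snippet below is a minimal, hypothetical sketch of querying a trained text-only monitor; the checkpoint path and prompt are placeholders, and the real prompt template comes from the data conversion step:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to a checkpoint produced by the SFT run above.
ckpt = "saves/7b_monitor_qwen2_5_a11y_full_sft"

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto")

# Illustrative prompt: an accessibility-tree snippet plus the proposed action.
messages = [{
    "role": "user",
    "content": (
        "Accessibility tree:\n[1203] button 'Place order'\n"
        "Proposed action: click [1203]\n"
        "Classify the risk of this action."
    ),
}]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```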
This code and dataset are released solely for research purposes, with the goal of making the web more accessible and safer through language technologies. The authors strongly discourage any potentially harmful use of the data or technology by any party.
- Code: MIT License
- Dataset: Creative Commons Attribution-NonCommercial 4.0 International
See LICENSE for full details.
If you find this work useful, please consider starring our repository and citing our papers:
@article{zheng2025webguard,
title={WebGuard: Building a Generalizable Guardrail for Web Agents},
author={Zheng, Boyuan and Liao, Zeyi and Salisbury, Scott and Liu, Zeyuan and Lin, Michael and Zheng, Qinyuan and Wang, Zifan and Deng, Xiang and Song, Dawn and Sun, Huan and others},
journal={arXiv preprint arXiv:2507.14293},
year={2025}
}
@inproceedings{zheng-etal-2024-webolympus,
title = "{W}eb{O}lympus: An Open Platform for Web Agents on Live Websites",
author = "Zheng, Boyuan and Gou, Boyu and Salisbury, Scott and Du, Zheng and Sun, Huan and Su, Yu",
editor = "Hernandez Farias, Delia Irazu and Hope, Tom and Li, Manling",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-demo.20",
pages = "187--197",
}