diff --git a/site/en/responsible/docs/safeguards/shieldgemma2_on_huggingface.ipynb b/site/en/responsible/docs/safeguards/shieldgemma2_on_huggingface.ipynb new file mode 100644 index 000000000..e7dff4ac6 --- /dev/null +++ b/site/en/responsible/docs/safeguards/shieldgemma2_on_huggingface.ipynb @@ -0,0 +1,243 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "cLCmbOz_5tWH" + }, + "source": [ + "##### Copyright 2025 Google LLC" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "vdPaBz5y5LHW" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3Zd1278P5wt_" + }, + "source": [ + "# Evaluating content safety with ShieldGemma 2 and Hugging Face Transformers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2b40722aa1a9" + }, + "source": [ + "\n", + " \n", + " \n", + " \n", + "
\n", + " View on ai.google.dev\n", + " \n", + " Run in Google Colab\n", + " \n", + " View source on GitHub\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4IlgEYUj7xdW" + }, + "source": [ + "The **ShieldGemma 2** model is trained to detect key harms detailed in the [model card](https://ai.google.dev/gemma/docs/shieldgemma/model_card_2). This guide demonstrates how to use Hugging Face Transformers to build robust data and models.\n", + "\n", + "Note that `ShieldGemma 2` is trained to classify only one harm type at a time, so you will need to make a separate call to `ShieldGemma 2` for each harm type you want to check against. You may have additional that you can use model tuning techniques on `ShieldGemma 2`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RhlnMQoK9fZG" + }, + "source": [ + "# Supported safety checks\n", + "\n", + "**ShieldGemma2** is a model trained on Gemma 3's 4B IT checkpoint and is trained to detect and predict violations of key harm types listed below:\n", + "\n", + "* **Dangerous Content**: The image shall not contain content that facilitates or encourages activities that could cause real-world harm (e.g., building firearms and explosive devices, promotion of terrorism, instructions for suicide).\n", + "\n", + "* **Sexually Explicit**: The image shall not contain content that depicts explicit or graphic sexual acts (e.g., pornography, erotic nudity, depictions of rape or sexual assault).\n", + "\n", + "* **Violence/Gore**: The image shall not contain content that depicts shocking, sensational, or gratuitous violence (e.g., excessive blood and gore, gratuitous violence against animals, extreme injury or moment of death).\n", + "\n", + "This serves as a foundation, but users can provide customized safety policies as input to the model, allowing for fine-grained control and specific use-case requirements." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "t3aq-ToeAmRM" + }, + "source": [ + "# Supported Use Case\n", + "\n", + "ShieldGemma 2 is should be used as an input filter to vision language models or as an output filter of image generation systems or both.** ShieldGemma 2 offers the following key advantages:\n", + "\n", + "* **Policy-Aware Classification**: ShieldGemma 2 accepts both a user-defined safety policy and an image as input, providing classifications for both real and generated images, tailored to the specific policy guidelines.\n", + "* **Probability-Based Output and Thresholding**: ShieldGemma 2 outputs a probability score for its predictions, allowing downstream users to flexibly tune the classification threshold based on their specific use cases and risk tolerance. This enables a more nuanced and adaptable approach to safety classification.\n", + "\n", + "The input/output format are as follows:\n", + "* **Input**: Image + Prompt Instruction with policy definition\n", + "* **Output**: Probability of 'Yes'/'No' tokens, 'Yes' meaning that the image violated the specific policy. The higher the score for the 'Yes' token, the higher the model's confidence that the image violates the specified policy." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0WhRozADVJos" + }, + "source": [ + "# Usage example" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "K_XERopLUZhk" + }, + "outputs": [], + "source": [ + "# @title install Hugging Face Transformers v4.50+\n", + "! 
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "0WhRozADVJos"
+   },
+   "source": [
+    "# Usage example"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "K_XERopLUZhk"
+   },
+   "outputs": [],
+   "source": [
+    "# @title Install Hugging Face Transformers v4.50+\n",
+    "! pip install -q 'transformers>=4.50.0'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "Qg-Hy0ffbwvE"
+   },
+   "outputs": [],
+   "source": [
+    "# @title Authenticate with Hugging Face Hub\n",
+    "# @markdown ShieldGemma is a gated model. To access the weights, you must accept\n",
+    "# @markdown the license on Hugging Face Hub under your account and then provide\n",
+    "# @markdown an [Access Token](https://huggingface.co/docs/hub/en/security-tokens)\n",
+    "# @markdown to authenticate with the Hugging Face Hub API. If using Colab, the\n",
+    "# @markdown easiest way to do this is by creating a read-only token specifically\n",
+    "# @markdown for Colab and setting this as the value of the `HF_TOKEN` secret;\n",
+    "# @markdown this token will then be reusable across all Colab notebooks. Other\n",
+    "# @markdown Python notebook platforms may provide a similar mechanism. For those\n",
+    "# @markdown that do not, uncomment the lines in this cell to install the\n",
+    "# @markdown Hugging Face Hub CLI and log in interactively.\n",
+    "# ! pip install -q 'huggingface_hub[cli]'\n",
+    "# ! huggingface-cli login"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "40Rm46Xt7wqW"
+   },
+   "outputs": [],
+   "source": [
+    "from transformers import AutoProcessor, AutoModelForImageClassification\n",
+    "import torch\n",
+    "\n",
+    "model_id = \"google/shieldgemma-2-4b-it\"\n",
+    "\n",
+    "processor = AutoProcessor.from_pretrained(model_id)\n",
+    "model = AutoModelForImageClassification.from_pretrained(model_id)\n",
+    "model.to(torch.device(\"cuda\"))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "a436de5a4e95"
+   },
+   "outputs": [],
+   "source": [
+    "from PIL import Image\n",
+    "import requests\n",
+    "\n",
+    "# The image included in this Colab is benign and will not violate any of\n",
+    "# ShieldGemma's built-in content policies. Change this URL or otherwise update\n",
+    "# this code to use an image that may be violative.\n",
+    "url = \"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg\"\n",
+    "image = Image.open(requests.get(url, stream=True).raw)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "AK1PrHnYz4fv"
+   },
+   "outputs": [],
+   "source": [
+    "inputs = processor(images=[image], return_tensors=\"pt\").to(torch.device(\"cuda\"))\n",
+    "\n",
+    "with torch.no_grad():\n",
+    "  scores = model(**inputs)\n",
+    "\n",
+    "# `scores` is a `ShieldGemma2ImageClassifierOutputWithNoAttention` instance\n",
+    "# containing the logits and probabilities associated with the model predicting\n",
+    "# the `Yes` or `No` tokens as the response to the prompt batch, captured in the\n",
+    "# following properties.\n",
+    "#\n",
+    "# * `logits` (`torch.Tensor` of shape `(batch_size, 2)`): The first position\n",
+    "#   along dim=1 contains the logits for the `Yes` token and the second position\n",
+    "#   along dim=1 contains the logits for the `No` token.\n",
+    "# * `probabilities` (`torch.Tensor` of shape `(batch_size, 2)`): The first\n",
+    "#   position along dim=1 is the probability of predicting the `Yes` token\n",
+    "#   and the second position along dim=1 is the probability of predicting the\n",
+    "#   `No` token.\n",
+    "#\n",
+    "# When used with the `ShieldGemma2Processor`, the `batch_size` will be equal to\n",
+    "# `len(images) * len(policies)`, and the order within the batch will be\n",
+    "# img1_policy1, ... img1_policyN, ... imgM_policyN.\n",
+    "print(scores.logits)\n",
+    "print(scores.probabilities)\n",
+    "\n",
+    "# ShieldGemma prompts are constructed such that predicting the `Yes` token means\n",
+    "# the content violates the policy. If you are only interested in the violative\n",
+    "# condition, you can extract only that slice from the output tensors.\n",
+    "p_violated = scores.probabilities[:, 0]\n",
+    "print(p_violated)\n"
+   ]
+  },
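+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The example above checks against ShieldGemma 2's built-in policies. As a sketch of the policy-aware classification described earlier, the next cell passes a custom policy through the processor. It assumes the `custom_policies` and `policies` arguments of `ShieldGemma2Processor` in Transformers v4.50+; the `\"weapons\"` policy key and its wording are illustrative, and `\"dangerous\"` is assumed to be a built-in policy key."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# A sketch of checking a custom policy alongside a built-in one. The\n",
+    "# `custom_policies`/`policies` processor arguments and the \"dangerous\"\n",
+    "# built-in key are assumptions based on Transformers v4.50+; the policy\n",
+    "# key and wording below are illustrative.\n",
+    "custom_policies = {\n",
+    "    \"weapons\": \"The image shall not contain realistic depictions of firearms or other weapons.\",\n",
+    "}\n",
+    "\n",
+    "inputs = processor(\n",
+    "    images=[image],\n",
+    "    custom_policies=custom_policies,\n",
+    "    policies=[\"dangerous\", \"weapons\"],\n",
+    "    return_tensors=\"pt\",\n",
+    ").to(torch.device(\"cuda\"))\n",
+    "\n",
+    "with torch.no_grad():\n",
+    "  scores = model(**inputs)\n",
+    "\n",
+    "# One image x two policies -> batch of 2, ordered img1_policy1, img1_policy2:\n",
+    "# row 0 is 'dangerous' and row 1 is 'weapons'.\n",
+    "print(scores.probabilities[:, 0])\n"
+   ]
+  }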
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "name": "shieldgemma2_on_huggingface.ipynb",
+   "toc_visible": true
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}