diff --git a/site/en/gemma/docs/core/huggingface_inference.ipynb b/site/en/gemma/docs/core/huggingface_inference.ipynb
new file mode 100644
index 000000000..72dcb4f3b
--- /dev/null
+++ b/site/en/gemma/docs/core/huggingface_inference.ipynb
@@ -0,0 +1,423 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "G3MMAcssHTML"
+ },
+ "source": [
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Tce3stUlHN0L"
+ },
+ "source": [
+ "##### Copyright 2025 Google LLC."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "cellView": "form",
+ "id": "tuOe1ymfHZPu"
+ },
+ "outputs": [],
+ "source": [
+ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "f9673bd6"
+ },
+ "source": [
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qlsJnFBYSrkZ"
+ },
+ "source": [
+ "# Run Gemma with Hugging Face Transformers\n",
+ "\n",
+ "Generating text, summarizing, and analysing content are just some of the tasks you can accomplish with Gemma open models. This tutorial shows you how to get started running Gemma using Hugging Face Transformers using both text and image input to generate text content. The Transformers Python library provides a API for accessing pre-trained generative AI models, including Gemma. For more information, see the [Transformers](https://huggingface.co/docs/transformers/en/index) documentation.\n",
+ "\n",
+ "Note: Gemma 3 and later models support input of both images and text. Earlier versions of Gemma only support text input, except for some variants, including [PaliGemma](https://ai.google.dev/gemma/docs/setup).\n",
+ "\n",
+ "## Setup\n",
+ "\n",
+ "Before starting this tutorial, complete the following steps:\n",
+ "\n",
+ "* Get access to Gemma by logging into [Hugging Face](https://huggingface.co/google/gemma-3-4b-pt) and selecting **Acknowledge license** for a Gemma model.\n",
+ "* Select a Colab runtime with sufficient resources to run\n",
+ " the Gemma model size you want to run. [Learn more](https://ai.google.dev/gemma/docs/core#sizes).\n",
+ "* Generate a Hugging Face [Access Token](https://huggingface.co/docs/hub/en/security-tokens#how-to-manage-user-access-token) and add it to your Colab environment.\n",
+ "\n",
+ "### Configure Access Token\n",
+ "\n",
+ "Add your access token to Colab to enable downloading of Gemma models from the Hugging Face web site. Use the Colab **Secrets** feature to securely save your token without adding it to your working code.\n",
+ "\n",
+ "To add your Hugging Face Access Token as a Secret:\n",
+ "\n",
+ "1. Open the secrets tab by selecting the key icon on left side of the interface, or select **Tools > Command pallete**, type `secrets`, and press **Enter**.\n",
+ "2. Select **Add new secret** to add a new secret entry.\n",
+ "3. In the **Name** field, enter `HF_TOKEN`.\n",
+ "4. In the **Value** field, enter the text of your Hugging Face Access Token.\n",
+ "5. In the **Notebook access** field, select the switch to enable access.\n",
+ "\n",
+ "Once you have entered your Access Token as `HF_TOKEN` and value, you can access and set it within your Colab notebook environment using the following code:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ogSB3peYP1b4"
+ },
+ "outputs": [],
+ "source": [
+ "from google.colab import userdata\n",
+ "from huggingface_hub import login\n",
+ "\n",
+ "# Login into Hugging Face Hub\n",
+ "hf_token = userdata.get('HF_TOKEN') # If you are running inside a Google Colab\n",
+ "login(hf_token)"
+ ]
+ },
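+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you are not running in Colab, a minimal alternative sketch is to read the token from an `HF_TOKEN` environment variable that you set in your shell:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from huggingface_hub import login\n",
+ "\n",
+ "# Read the token from the environment instead of Colab Secrets.\n",
+ "login(os.environ['HF_TOKEN'])"
+ ]
+ },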
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2RgrZH4cP1b4"
+ },
+ "source": [
+ "### Install Python packages\n",
+ "\n",
+ "Install the Hugging Face libraries required for running the Gemma model and making requests."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ty1cuOpoP1b4"
+ },
+ "outputs": [],
+ "source": [
+ "# Install Pytorch & other libraries\n",
+ "%pip install \"torch>=2.4.0\"\n",
+ "\n",
+ "# Install a transformers version that supports Gemma 3 (>= 4.51.3)\n",
+ "%pip install \"transformers>=4.51.3\""
+ ]
+ },
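+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Optionally, confirm that the installed version supports Gemma 3; this quick check is a minimal sketch and is not required for the rest of the tutorial:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import transformers\n",
+ "\n",
+ "# Gemma 3 support requires transformers 4.51.3 or later.\n",
+ "print(transformers.__version__)"
+ ]
+ },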
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "C_NRd79RP1b4"
+ },
+ "source": [
+ "## Generate text from text\n",
+ "\n",
+ "Prompting a Gemma model with text to get a text response is the simplest way to use Gemma and works with nearly all Gemma variants. This section shows how to use the Hugging Face Transformers library load and configure a Gemma model for text to text generation."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "NQ06l3iGP1b4"
+ },
+ "source": [
+ "### Load model\n",
+ "\n",
+ "Use the `torch` and `transformers` libraries to create an instance of a model execution `pipeline` class with Gemma. When using a model for generating output or following directions, select an instruction tuned (IT) model, which typically has `it` in the model ID string. Using the `pipeline` object, you specify the Gemma variant you want to use, the type of task you want to perform, specifically `\"text-generation\"` for text-to-text generation, as shown in the following code example:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "JoZ2ilfiP1b4"
+ },
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "from transformers import pipeline\n",
+ "\n",
+ "pipeline = pipeline(\n",
+ " task=\"text-generation\",\n",
+ " model=\"google/gemma-3-4b-it\",\n",
+ " device=0, # \"cuda\" for Colab, \"msu\" for iOS devices\n",
+ " torch_dtype=torch.bfloat16\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "eqtJZPHPP1b5"
+ },
+ "source": [
+ "Gemma supports only a few `task` settings for generation. For more information on the available `task` settings, see the Hugging Face Pipelines [task()](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.pipeline.task) documentation. Use the torch data type `torch.bfloat16` to reduce the precision of the model and compute resources needed, without significantly impacting the output quality of the model. For the `device` setting, you can use `\"cuda\"` for Colab, or `\"msu\"` for iOS devices, or just set this to `0` (zero) to specify the first GPU on your system. For more information about using the Pipeline class, see the Hugging Face [Pipelines](https://huggingface.co/docs/transformers/main/en/main_classes/) documentation."
+ ]
+ },
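+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you are not sure which `device` value applies to your runtime, the following minimal sketch, assuming a standard PyTorch installation, picks the best available option:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "\n",
+ "# Prefer a CUDA GPU, then Apple silicon (\"mps\"), then fall back to CPU.\n",
+ "if torch.cuda.is_available():\n",
+ "    device = \"cuda\"\n",
+ "elif torch.backends.mps.is_available():\n",
+ "    device = \"mps\"\n",
+ "else:\n",
+ "    device = \"cpu\"\n",
+ "print(device)"
+ ]
+ },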
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "W8pRmlIqHvh2"
+ },
+ "source": [
+ "### Run text generation\n",
+ "\n",
+ "Once you have the Gemma model loaded and configured in a `pipeline` object, you can send prompts to the model. The following example code shows a basic request using the `text_inputs` parameter:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "MFXVitgPHvh2"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[{'generated_text': 'roses are red, violets are blue, \\ni love you more than you ever knew.\\n\\n**Explanation'}]"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pipeline(text_inputs=\"roses are red\")"
+ ]
+ },
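+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The response in the output above is likely cut short by the default token limit. To control the length of the response, pass the `max_new_tokens` parameter:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Allow up to 50 new tokens for a longer completion.\n",
+ "pipeline(text_inputs=\"roses are red\", max_new_tokens=50)"
+ ]
+ },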
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Giv5MIymHvh2"
+ },
+ "source": [
+ "### Use a prompt template\n",
+ "\n",
+ "When generating content with more complex prompting, use a prompt template to structure. A prompt template allows you to specify input from specific roles, such as `user` or `model`, and is a required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to constuct a prompt template for Gemma:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "FijxO7MsHvh2"
+ },
+ "outputs": [],
+ "source": [
+ "messages = [\n",
+ " [\n",
+ " {\n",
+ " \"role\": \"system\",\n",
+ " \"content\": [{\"type\": \"text\", \"text\": \"You are a helpful assistant.\"},]\n",
+ " },\n",
+ " {\n",
+ " \"role\": \"user\",\n",
+ " \"content\": [{\"type\": \"text\", \"text\": \"Roses are red...\"},]\n",
+ " },\n",
+ " ],\n",
+ "]\n",
+ "\n",
+ "pipeline(messages, max_new_tokens=50)"
+ ]
+ },
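+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "By default, the pipeline returns the full conversation, including your input messages. As a minimal sketch, assuming the chat output format of recent `text-generation` pipelines, you can extract just the model reply from the last message:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "outputs = pipeline(messages, max_new_tokens=50)\n",
+ "\n",
+ "# outputs is nested because messages is a batch of one conversation,\n",
+ "# and the returned conversation ends with the model reply.\n",
+ "print(outputs[0][0][\"generated_text\"][-1][\"content\"])"
+ ]
+ },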
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "lKkJwXWkHvh2"
+ },
+ "source": [
+ "## Generate text from image data\n",
+ "\n",
+ "Starting with Gemma 3, for model sizes 4B and higher, you can use image data as part of your prompt. This section shows how to use the Transformers library to load and configure a Gemma model to use image data and text input to generate text output."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6H7d78KqHvh3"
+ },
+ "source": [
+ "### Load model\n",
+ "\n",
+ "When loading a Gemma model for use with image data, you configure the Transformer `pipeline` instance specifically for use with images. In particular, you must select a pipeline configuration that can handle visual data by setting the `task` parameter to `\"image-text-to-text\"`, as shown in the following code example:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Cy6EKYRYHvh3"
+ },
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "from transformers import pipeline\n",
+ "\n",
+ "pipeline = pipeline(\n",
+ " task=\"image-text-to-text\", # required for image input\n",
+ " model=\"google/gemma-3-4b-it\",\n",
+ " device=0,\n",
+ " torch_dtype=torch.bfloat16\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "f1YLpv-THvh3"
+ },
+ "source": [
+ "### Run text generation\n",
+ "\n",
+ "Once you have the Gemma model configured to handle image input with a `pipeline` instance, you can send prompts with images to the model. Use the `` token to add the image to the text of your prompt. The following example code shows a basic request using the `pipeline` parameter:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "LYHcWCEOHvh3"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[{'input_text': ' What is shown in this image?',\n",
+ " 'generated_text': ' What is shown in this image?\\n\\nThis image showcases a traditional Indian Thali. A Thali is a platter that contains a variety'}]"
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pipeline(\n",
+ " \"https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg\",\n",
+ " text=\" What is shown in this image?\"\n",
+ ")"
+ ]
+ },
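+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The pipeline also accepts in-memory image data. The following sketch, which assumes the `Pillow` and `requests` packages available in Colab, loads the same image as a PIL image object before passing it to the pipeline:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "from PIL import Image\n",
+ "\n",
+ "# Download the image and open it as a PIL image object.\n",
+ "url = \"https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg\"\n",
+ "image = Image.open(requests.get(url, stream=True).raw)\n",
+ "\n",
+ "pipeline(image, text=\"<start_of_image> What is shown in this image?\")"
+ ]
+ },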
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VYP5Lr9SUVi5"
+ },
+ "source": [
+ "### Use a prompt template\n",
+ "\n",
+ "When generating content with more complex prompting, use a prompt template to structure. A prompt template allows you to specify input from specific roles, such as `user` or `model`, and is a required format for managing multi-turn chat interactions with Gemma models. The following example code shows how to constuct a prompt template for Gemma:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "yVaqQ1IGU-FD"
+ },
+ "outputs": [],
+ "source": [
+ "messages = [\n",
+ " {\n",
+ " \"role\": \"user\",\n",
+ " \"content\": [\n",
+ " {\"type\": \"image\", \"url\": \"https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg\"},\n",
+ " {\"type\": \"text\", \"text\": \"What is shown in this image?\"},\n",
+ " ]\n",
+ " },\n",
+ " {\n",
+ " \"role\": \"assistant\",\n",
+ " \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"This image shows\"},\n",
+ " ],\n",
+ " },\n",
+ "]\n",
+ "\n",
+ "pipeline(text=messages, max_new_tokens=50, return_full_text=False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "wMgXjj0XRHdQ"
+ },
+ "source": [
+ "You can include multiple images in your prompt by including additional `\"type\": \"image\",` entries in the `content` list.\n",
+ "\n",
+ "Note: Do not use `` tokens in text portion of a prompt template as this approach creates redundant tokens and processing errors."
+ ]
+ },
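+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For example, the following sketch extends the template to two images; the second image URL is a hypothetical placeholder to replace with your own:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "messages = [\n",
+ "    {\n",
+ "        \"role\": \"user\",\n",
+ "        \"content\": [\n",
+ "            {\"type\": \"image\", \"url\": \"https://ai.google.dev/static/gemma/docs/images/thali-indian-plate.jpg\"},\n",
+ "            {\"type\": \"image\", \"url\": \"https://example.com/your-second-image.jpg\"},  # hypothetical placeholder\n",
+ "            {\"type\": \"text\", \"text\": \"What do these two images have in common?\"},\n",
+ "        ]\n",
+ "    },\n",
+ "]\n",
+ "\n",
+ "pipeline(text=messages, max_new_tokens=50, return_full_text=False)"
+ ]
+ },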
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6f8ff452"
+ },
+ "source": [
+ "## Next steps\n",
+ "\n",
+ "Build and explore more with Gemma models:\n",
+ "\n",
+ "* [Fine-tune Gemma for text tasks using Hugging Face Transformers](https://ai.google.dev/gemma/docs/core/huggingface_text_finetune_qlora)\n",
+ "* [Fine-tune Gemma for vision tasks using Hugging Face Transformers](https://ai.google.dev/gemma/docs/core/huggingface_vision_finetune_qlora)\n",
+ "* [Perform distributed fine-tuning and inference on Gemma models](https://ai.google.dev/gemma/docs/core/distributed_tuning)\n",
+ "* [Use Gemma open models with Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/open-models/use-gemma)\n",
+ "* [Fine-tune Gemma using Keras and deploy to Vertex AI](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gemma_kerasnlp_to_vertexai.ipynb)"
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "name": "huggingface_inference.ipynb",
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}