Description
The documentation for Gemma 3 (https://huggingface.co/google/gemma-3-1b-it, for example) uses a prompt structure that seems to be at odds with Google's Gemma documentation.
Here is the example from the Hugging Face documentation:
# Imports added for completeness; the rest is verbatim from the model card.
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig, Gemma3ForCausalLM

model_id = "google/gemma-3-1b-it"
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = Gemma3ForCausalLM.from_pretrained(
    model_id, quantization_config=quantization_config
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Write a poem on Hugging Face, the company"}],
        },
    ],
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device).to(torch.bfloat16)
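One way to check what the Transformers chat template actually does with that system message is to render it as text instead of token IDs. A minimal sketch, assuming the snippet above has already run:

# Render the template to a string so the handling of the system
# message is visible; reuses tokenizer and messages from above.
rendered = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered[0])  # messages is a batch of one conversation

Whether the output contains a separate system turn or the system text merged into the first user turn shows how the template reconciles the two conventions.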
However, from Google's documentation (https://ai.google.dev/gemma/docs/core/prompt-structure):
"""
System instructions
Gemma's instruction-tuned models are designed to work with only two roles: user and model. Therefore, the system role or a system turn is not supported.
Instead of using a separate system role, provide system-level instructions directly within the initial user prompt. The model instruction following capabilities allow Gemma to interpret the instructions effectively. For example:
<start_of_turn>user
Only reply like a pirate.
What is the answer to life the universe and everything?<end_of_turn>
<start_of_turn>model
Arrr, 'tis 42,<end_of_turn>
"""
It appears that Gemma models, at least according to Google's documentation (as well as Phil Schmid's blog post from DeepMind, https://www.philschmid.de/gemma-function-calling), do not use an actual structured system prompt. The system instructions for the instruction-tuned models are instead placed at the beginning of the first user message; in fact, Google's documentation states that a system role is not supported at all, yet the Hugging Face documentation uses one here.