Description of the bug:
Summary:
The Gemini API (accessed via Google AI Studio paid tier) is exhibiting non-deterministic behavior for the gemini-2.5-pro model. It is producing different outputs for identical requests, even when a fixed seed is provided along with a constant temperature. This behavior has been reliably reproduced and violates the API's core contract for deterministic generation, making it unreliable for production use.
Steps to Reproduce:
- API Call: Make an API call using the Gemini API.
- Model: gemini-2.5-pro
- Generation Config:
  - temperature: 0.1
  - thinking_budget: 256
  - seed: 42
  - response_mime_type: "application/json"
  - response_schema: list[str]
- Contents:
  - Prompt: The full prompt text is provided below.
  - Image: The image file is attached as IMG_701015.JPG.
- First Execution: Execute the API call. The request successfully returns the expected, accurate JSON output ([]).
- Second Execution: Execute the exact same API call again with no changes.
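The steps above could be scripted roughly as follows. This is only a sketch: it assumes the `google-genai` Python SDK (the report does not name the client library) and an API key in a `GEMINI_API_KEY` environment variable. The names `GENERATION_PARAMS`, `call_gemini`, and `collect_outputs` are illustrative and do not come from the original report.

```python
import os
from typing import Callable

MODEL = "gemini-2.5-pro"
THINKING_BUDGET = 256

# Generation settings exactly as listed in the report.
GENERATION_PARAMS = {
    "temperature": 0.1,
    "seed": 42,
    "response_mime_type": "application/json",
    "response_schema": list[str],
}


def call_gemini(prompt: str, image_path: str) -> str:
    """Send one request and return the raw response text.

    Assumption: the google-genai SDK is installed. It is imported lazily so
    the rest of this sketch can be used without it.
    """
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    with open(image_path, "rb") as f:
        image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")
    config = types.GenerateContentConfig(
        **GENERATION_PARAMS,
        thinking_config=types.ThinkingConfig(thinking_budget=THINKING_BUDGET),
    )
    response = client.models.generate_content(
        model=MODEL, contents=[image, prompt], config=config
    )
    return response.text


def collect_outputs(call: Callable[[], str], runs: int = 2) -> list[str]:
    """Invoke the same zero-argument request `runs` times, keeping every raw output."""
    return [call() for _ in range(runs)]
```

With the prompt text stored in a variable such as `PROMPT`, the two executions would then be `collect_outputs(lambda: call_gemini(PROMPT, "IMG_701015.JPG"))`.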
Observed Result:
The second execution produces a different, incorrect JSON output (["11"]).
Expected Result:
The output of the first and second executions must be absolutely identical. The seed parameter must ensure a fully deterministic and repeatable outcome. The correct output for this specific image and prompt is [].
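One way to check this expectation over several runs is to normalize each JSON response and count the distinct results. The helper names `canonical` and `summarize_runs` below are hypothetical, and the sample inputs are simply the two outputs quoted in this report.

```python
import json
from collections import Counter


def canonical(raw: str) -> str:
    """Re-serialize a JSON response so whitespace differences are ignored."""
    return json.dumps(json.loads(raw), sort_keys=True, separators=(",", ":"))


def summarize_runs(raw_outputs: list[str]) -> dict:
    """Count distinct canonical outputs across repeated identical requests."""
    counts = Counter(canonical(raw) for raw in raw_outputs)
    return {
        "distinct": len(counts),
        "identical": len(counts) == 1,
        "counts": dict(counts),
    }


# The first and second responses described in this report.
report = summarize_runs(["[]", '["11"]'])
print(report)
```

Comparing canonical forms rather than raw strings avoids flagging runs that differ only in formatting, so any reported mismatch reflects a real difference in content.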
Full Prompt Text:
You are a hyper-precise visual analysis system with a single function: to return a JSON array of motorcycle racing numbers that meet a strict, non-negotiable standard of quality.
To ensure 100% accuracy, you must follow a new, two-stage protocol. This protocol is absolute.
INTERNAL PROTOCOL (DO NOT OUTPUT)
STAGE 1: FORENSIC QUALITY VERDICT (Prerequisite Stage)
This is your first and most important task. For every potential number candidate on a validly oriented motorcycle, you must render a binary verdict.
- Isolate the Candidate Area: Look ONLY at the front number plate area.
- Ask the Critical Question: "Is there a numerical figure in this area that is perfectly sharp, with clear, unambiguous edges, free of significant motion blur or compression artifacts?"
- Render the Verdict: Based on the question above, your internal verdict for the candidate MUST be one of two options:
  - VERDICT: PASS (The number is of forensic quality, 100% readable without guessing).
  - VERDICT: FAIL (The number is blurry, indistinct, artifacted, or in any way ambiguous. Any doubt whatsoever means it is a FAIL).
This stage is absolute. If the verdict for a candidate is FAIL, it is immediately and permanently rejected. You will not proceed to Stage 2 for that candidate.
STAGE 2: DIGIT EXTRACTION (Conditional Stage)
You will only ever perform this stage if a candidate received a VERDICT: PASS in Stage 1.
- Extract Digits: For the candidate that passed, identify and record the digits.
- Final Check: Ensure the extracted digits are consistent with the high-quality image that was approved.
FINAL OUTPUT REQUIREMENT
Your entire output must be a single, valid JSON array of strings. It will contain ONLY the numbers from candidates that received a VERDICT: PASS in Stage 1 and were successfully extracted in Stage 2. If no candidates pass Stage 1, return an empty array []. Do not include any explanatory text, markdown, or any characters outside of the final JSON object.
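The output requirement in the prompt can be checked mechanically on the client side. The sketch below uses a hypothetical helper, `parse_racing_numbers`, and makes one assumption beyond the prompt: that every racing number consists only of ASCII digits.

```python
import json


def parse_racing_numbers(raw: str) -> list[str]:
    """Parse a response and enforce the 'JSON array of strings' contract."""
    data = json.loads(raw)  # raises json.JSONDecodeError on extra text or markdown
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    for item in data:
        # Assumption: racing numbers are digit-only strings such as "11".
        if not (isinstance(item, str) and item.isascii() and item.isdigit()):
            raise ValueError(f"expected a string of digits, got {item!r}")
    return data


print(parse_racing_numbers("[]"))
print(parse_racing_numbers('["11"]'))
```

Both outputs quoted in this report pass the check, which shows the two runs differ in content rather than in format.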
Actual vs expected behavior:
No response
Any other information you'd like to share?
No response