
Critical Determinism Failure in Gemini API (gemini-2.5-pro) with Fixed seed, temperature, and thinking budget #745

@shuknk8s

Description of the bug:

Summary:
The Gemini API (accessed via the Google AI Studio paid tier) behaves non-deterministically for the gemini-2.5-pro model: identical requests produce different outputs, even when a fixed seed and a constant temperature are supplied. The behavior reproduces reliably and violates the API's core contract for deterministic generation, which makes the model unreliable for production use.

Steps to Reproduce:

  1. API Call: Make an API call using the Gemini API.
  2. Model: gemini-2.5-pro
  3. Generation Config:
    • temperature: 0.1
    • thinking_budget: 256
    • seed: 42
    • response_mime_type: "application/json"
    • response_schema: list[str]
  4. Contents:
    • Prompt: The full prompt text is provided below.
    • Image: The image file is attached as IMG_701015.JPG.
  5. First Execution: Execute the API call. The request successfully returns the expected, accurate JSON output ([]).
  6. Second Execution: Execute the exact same API call again with no changes.
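The report does not say which client library was used, so the following is only a sketch of how the configuration in step 3 might be expressed as a request body for the REST `generateContent` endpoint. The camelCase field names are an assumption based on the public REST API and should be checked against the current reference; `build_request` is a hypothetical helper, not code from the original report.

```python
import base64
import json

MODEL = "gemini-2.5-pro"  # model named in step 2


def build_request(prompt: str, image_bytes: bytes) -> dict:
    """Assemble a generateContent-style body from the settings in step 3.

    Hypothetical helper: field names are assumed from the public REST API.
    """
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inlineData": {
                    "mimeType": "image/jpeg",
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ],
        }],
        "generationConfig": {
            "temperature": 0.1,
            "seed": 42,
            "responseMimeType": "application/json",
            # REST-schema equivalent of the SDK shorthand list[str]
            "responseSchema": {"type": "ARRAY", "items": {"type": "STRING"}},
            "thinkingConfig": {"thinkingBudget": 256},
        },
    }


if __name__ == "__main__":
    # Placeholders only; substitute the real prompt and image bytes.
    body = build_request("<full prompt text>", b"<bytes of IMG_701015.JPG>")
    print(json.dumps(body["generationConfig"], indent=2))
```

Serializing the body once and resending the same bytes for both executions helps rule out client-side differences between the two requests.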

Observed Result:
The second execution produces a different, incorrect JSON output (["11"]).

Expected Result:
The output of the first and second executions must be absolutely identical. The seed parameter must ensure a fully deterministic and repeatable outcome. The correct output for this specific image and prompt is [].
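The comparison between executions can be made mechanical. The sketch below is a minimal check, not the harness used for this report: `distinct_outputs` is a hypothetical helper, and `generate` stands in for whatever function performs the actual API call and returns the raw response text. The demo replays the two outputs reported above rather than contacting the API.

```python
import json
from typing import Callable, List


def distinct_outputs(generate: Callable[[], str], runs: int = 2) -> List[str]:
    """Call `generate` `runs` times and return the distinct JSON outputs seen.

    `generate` is a caller-supplied stand-in for the real API call; it must
    return the raw response text. Outputs are parsed and re-serialized so that
    whitespace differences do not count as a mismatch.
    """
    seen: List[str] = []
    for _ in range(runs):
        canonical = json.dumps(json.loads(generate()), sort_keys=True)
        if canonical not in seen:
            seen.append(canonical)
    return seen


if __name__ == "__main__":
    # Replays the two responses reported in this issue instead of calling the API.
    reported = iter(["[]", '["11"]'])
    outputs = distinct_outputs(lambda: next(reported))
    print("deterministic" if len(outputs) == 1 else f"mismatch: {outputs}")
```

A result with more than one entry means the runs disagreed; raising `runs` gives a rough sense of how often the divergent output appears.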

Full Prompt Text:
You are a hyper-precise visual analysis system with a single function: to return a JSON array of motorcycle racing numbers that meet a strict, non-negotiable standard of quality.

To ensure 100% accuracy, you must follow a new, two-stage protocol. This protocol is absolute.

INTERNAL PROTOCOL (DO NOT OUTPUT)


STAGE 1: FORENSIC QUALITY VERDICT (Prerequisite Stage)

This is your first and most important task. For every potential number candidate on a validly oriented motorcycle, you must render a binary verdict.

  1. Isolate the Candidate Area: Look ONLY at the front number plate area.
  2. Ask the Critical Question: "Is there a numerical figure in this area that is perfectly sharp, with clear, unambiguous edges, free of significant motion blur or compression artifacts?"
  3. Render the Verdict: Based on the question above, your internal verdict for the candidate MUST be one of two options:
    • VERDICT: PASS (The number is of forensic quality, 100% readable without guessing).
    • VERDICT: FAIL (The number is blurry, indistinct, artifacted, or in any way ambiguous. Any doubt whatsoever means it is a FAIL).

This stage is absolute. If the verdict for a candidate is FAIL, it is immediately and permanently rejected. You will not proceed to Stage 2 for that candidate.


STAGE 2: DIGIT EXTRACTION (Conditional Stage)

You will only ever perform this stage if a candidate received a VERDICT: PASS in Stage 1.

  1. Extract Digits: For the candidate that passed, identify and record the digits.
  2. Final Check: Ensure the extracted digits are consistent with the high-quality image that was approved.

FINAL OUTPUT REQUIREMENT

Your entire output must be a single, valid JSON array of strings. It will contain ONLY the numbers from candidates that received a VERDICT: PASS in Stage 1 and were successfully extracted in Stage 2. If no candidates pass Stage 1, return an empty array []. Do not include any explanatory text, markdown, or any characters outside of the final JSON object.
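(End of prompt text.) On the client side, the output requirement above can be checked before two runs are compared, so that a malformed response is not mistaken for a determinism failure. This is a minimal sketch; `parse_number_array` is a hypothetical helper and is not part of the original report.

```python
import json
from typing import List


def parse_number_array(text: str) -> List[str]:
    """Return the parsed array, or raise ValueError if the text is not
    exactly one JSON array of strings."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(value, list):
        raise ValueError("top-level value is not an array")
    if not all(isinstance(item, str) for item in value):
        raise ValueError("array contains non-string items")
    return value


if __name__ == "__main__":
    print(parse_number_array("[]"))
    print(parse_number_array('["11"]'))
```

Both outputs reported in this issue pass this check, so the discrepancy is in content rather than format.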

