Poor quality results with reasoning models and structured output #15670
matthew-at-qamcom
announced in
Q&A
Replies: 1 comment
-
Great news! The JSON component of structured output doesn't suffer the same issues. Here are some results from different approaches I tried. Guided decoding by Regex (like in the example)Paris: 58 Guided decoding by Regex (like in the example) but this time with temperature set to zeroParis: 48 Turning off the guided Regex and going back to a temperature of a oneParis: 100 (I'm just looking to see if "Paris" or "London" appears in the response.) Trying a one-shot prompt with guided regexI modified the prompt to be Paris: 6 Wow. Same one-shot prompt without guided regexParis: 100 Testing out guided JSON
Paris: 100 Much better :o) Guided JSON that's closer to the original problem
Paris: 100 So my solution will be to use guided JSON and not Regex. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
I'm using QwQ-32B [1] with structured outputs. I'm finding the results are of very low quality. Am I doing something wrong?
I have modified some of the example code that asks if the capital of France is either Paris or London.
With a temperate of 1, it picks London in 54 out of 100 attempts!
Changing the temperature to 0 has no real impact (44 out of 100 runs picked London).
This is less than ideal. Without structured output, I doubt QwQ would ever get the answer wrong.
Any suggestions?
Thanks in advance.
I'm running the model using:
My modifications to the example code:
[1] Specifically, I'm using ospatch/QwQ-32B-INT8-W8A8
Beta Was this translation helpful? Give feedback.
All reactions