What impact does the JSON decoding strategy potentially have on constraining/influencing the component response contents #8008
I apologize if this question is not quite clear. I'm making extensive use of the JSON and GBNF decoding strategies to ensure that outputs are always generated in valid JSON format. For the most part this works great; however, I've noticed that sometimes, especially when the requested response object (i.e., the JSON schema) is complex, the quality of component elements like a summary can be degraded.
I'll typically build a request as a combination of a free-text prompt that describes the information I want to extract or generate and a brief description of the components, then supply a corresponding JSON schema that reflects that. For example, something like:
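(The field names below are illustrative placeholders rather than my actual schema; the shape is what matters:)

```python
# Hypothetical request shape; field names are placeholders.
request = {
    # Free-text instructions, filled in below.
    "prompt": "...",
    # The JSON schema the decoder is constrained against.
    "schema": {
        "type": "object",
        "properties": {
            "summary":    {"type": "string"},   # e.g. 500-800 tokens
            "key_points": {"type": "array", "items": {"type": "string"}},
            "sentiment":  {"type": "string",
                           "enum": ["positive", "neutral", "negative"]},
        },
        "required": ["summary", "key_points", "sentiment"],
    },
}
```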
Then we fill in the "prompt" with a text description that supports the schema, e.g.:
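(Again purely illustrative, continuing the example above:)

```python
# Prompt text describing the same components the schema constrains.
request["prompt"] = (
    "Read the conversation below and extract the key decisions. "
    "Provide a summary of between 500 and 800 tokens, the key points "
    "as a list, and the overall sentiment."
)
```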
What I am noticing is that elements like the summary can suffer degraded quality when the overall schema is too large or complex. This made me wonder whether the strategy itself might be playing a part, as opposed to just the complexity of the prompts or the size of the interactions.
I can imagine that forcing the search through a more complex overall grammar causes the model to move through lower-likelihood states than it would when responding to a single, standalone request with fewer constraints, e.g. "summary, between 500-800 tokens". If this is true and I understand it correctly (a big if!), I wonder whether there might be an alternative approach that is better in some cases: cache the constant context, in this case the 'conversation', and instead of generating a complete response for the grammar as a whole, iteratively answer each element "as we parse".
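To make that concrete, here is a minimal sketch of what I have in mind, assuming a hypothetical `generate(prompt, schema=...)` helper that runs one constrained completion over the shared, ideally KV-cached, context; the helper and the field names are placeholders, not an existing API:

```python
import json

# Per-element constraints; in practice these would be the individual
# pieces of the full response schema.
FIELD_SCHEMAS = {
    "summary":    {"type": "string"},  # "summary, between 500-800 tokens"
    "key_points": {"type": "array", "items": {"type": "string"}},
}

def answer_elementwise(conversation: str, generate) -> str:
    """Instead of one pass constrained by the composite grammar, reuse
    the constant context and answer each element under its own small
    constraint, assembling the object 'as we parse'."""
    result = {}
    for name, schema in FIELD_SCHEMAS.items():
        # Each call shares the cached conversation and sees what has been
        # produced so far, but decoding only ever threads the small
        # per-element grammar, never the whole composite one.
        prompt = (conversation
                  + "\nProduced so far: " + json.dumps(result)
                  + f'\nNow produce only the value for "{name}".')
        result[name] = generate(prompt, schema=schema)
    return json.dumps(result, indent=2)
```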
Just to be clear, this is not a question about 'correctness', as I don't see any issues with compliance with the provided grammars or schemas; it's only about the potential impact on overall quality and the influence the strategy might have on it.
Any pointers to help me better understand what I might be missing here, or thoughts on this in general, would be welcome.
Replies: 1 comment

I have been thinking about this and am experimenting with a multistep approach. The first step is to have the model create metadata for the document along with a refinement: don't ask for a summary or a structure, just ask it to condense the document to its primary actionable parts and leave a reference to the complete document in the metadata. Then, if needed, run a pass to group documents together using embedding similarity, or just a numbered metric of some kind. Then ingest again with the goal of producing the structured JSON outputs. I don't think it is necessarily the grammar that is constraining its ability; I think it is that you are asking it to pay attention to too many specific points at once. By stepping it through finer and finer structures you may be able to see where it is having problems and work around that.
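A rough sketch of that pipeline, with hypothetical `generate()` and `embed()` helpers standing in for whatever backend is already in use, and a crude greedy grouping pass purely for illustration:

```python
def cosine(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def condense(doc: str, generate) -> dict:
    """Step 1: metadata plus a condensed core -- not a summary or a
    structure, just the primary actionable parts, keeping a reference
    to the complete document in the metadata."""
    return {
        "source_ref": hash(doc),  # stand-in reference to the full document
        "condensed": generate(
            "Condense the following to its primary actionable parts; "
            "do not summarize or restructure it:\n\n" + doc),
    }

def group_similar(items: list, embed, threshold: float = 0.8) -> list:
    """Step 2 (optional): group documents by embedding similarity, or
    by some numbered metric instead."""
    groups: list = []
    for item in items:
        vec = embed(item["condensed"])
        for g in groups:
            if cosine(vec, g["vec"]) >= threshold:
                g["members"].append(item)
                break
        else:
            groups.append({"vec": vec, "members": [item]})
    return groups

def ingest_structured(group: dict, generate, schema: dict) -> str:
    """Step 3: only now run the constrained JSON pass, over condensed
    text, so the model has fewer specific points to attend to at once."""
    text = "\n\n".join(m["condensed"] for m in group["members"])
    return generate("Extract according to the schema:\n\n" + text,
                    schema=schema)
```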