examples : predicted output for text generation #14739
This adds an example that lets the user specify a predicted (expected) output, which is then used as a draft to speed up generation. The most prominent use case is making changes to an existing code block. OpenAI currently offers a similar feature, Predicted Outputs.
This obviously has a lot of overlap with speculative decoding and lookup decoding. I think the main difference is that it gives the user more direct control over the expected output. It will also try to pick the draft back up after a divergence once a few consecutive tokens match again, so in that sense it brings in some of the benefits of lookup decoding.
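The accept/resync behavior can be sketched roughly as follows. This is a simplified simulation, not the PR's actual implementation: the `generate_with_prediction` helper and the `resync` threshold are illustrative names, and the `target` list stands in for tokens the model would actually produce (in the real example, draft tokens are verified by the model in a batch).

```python
def generate_with_prediction(target, predicted, resync=3):
    """Simulate generation that uses a user-supplied prediction as a draft.

    Returns (output, accepted), where `accepted` counts tokens taken
    straight from the draft instead of being generated one at a time.
    """
    out, accepted = [], 0
    j = 0  # cursor into the predicted draft; None means the draft was lost
    for tok in target:
        if j is not None and j < len(predicted) and predicted[j] == tok:
            # Draft token verified: accept it and advance the cursor.
            out.append(tok)
            accepted += 1
            j += 1
            continue
        # Divergence: emit the token normally, then check whether the last
        # `resync` output tokens reappear contiguously in the draft, so the
        # draft can be picked back up from that point.
        out.append(tok)
        j = None
        tail = out[-resync:]
        if len(tail) == resync:
            for k in range(len(predicted) - resync + 1):
                if predicted[k:k + resync] == tail:
                    j = k + resync  # resume drafting just past the match
                    break
    return out, accepted

# Editing one region of a "file": most of the prediction is still usable.
predicted = ["a", "b", "c", "d", "e", "f", "g", "h"]
target    = ["a", "b", "X", "Y", "e", "f", "g", "h"]
out, accepted = generate_with_prediction(target, predicted, resync=2)
# out == target; 4 of the 8 tokens came straight from the prediction
```

With a pure speculative draft, the edit in the middle would end the usefulness of the prediction; the resync step is what lets the unchanged tail of the file still be accepted.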
I added some example scripts for testing code modification; these can compare the predicted-output approach against speculative and lookup decoding. I also included a script that uses the Osmosis-Apply-1.7B model, which is targeted directly at code patching.