Open
Labels
performance, refactoring, roadmap
Description
There have been a few reports that grammar sampling can significantly degrade performance.
It would be nice to profile and optimize the implementation; there should be room for improvement.
Already ongoing efforts:
- `reserve` space in `decode_utf8` #4210
- Allow reusing results from `llama_token_to_piece` when sampling grammars #4213
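To make the two ideas above concrete, here is a minimal sketch of what they could look like in isolation. It is not the code from #4210 or #4213: `decode_utf8_sketch` and `piece_cache` are hypothetical names, the decoder assumes valid UTF-8 input, and the `to_piece` callback only stands in for a call such as `llama_token_to_piece`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified decoder (assumes valid UTF-8 input).
// The point is the single reserve() call: a string of N bytes holds at most
// N code points, so the vector never reallocates while it is being filled.
std::vector<uint32_t> decode_utf8_sketch(const std::string & src) {
    std::vector<uint32_t> code_points;
    code_points.reserve(src.size());
    size_t i = 0;
    while (i < src.size()) {
        const uint8_t first = static_cast<uint8_t>(src[i]);
        size_t   len   = 1;
        uint32_t value = first;
        if      ((first & 0xE0) == 0xC0) { len = 2; value = first & 0x1F; }
        else if ((first & 0xF0) == 0xE0) { len = 3; value = first & 0x0F; }
        else if ((first & 0xF8) == 0xF0) { len = 4; value = first & 0x07; }
        // Fold in the low 6 bits of each continuation byte
        for (size_t k = 1; k < len && i + k < src.size(); ++k) {
            value = (value << 6) | (static_cast<uint8_t>(src[i + k]) & 0x3F);
        }
        code_points.push_back(value);
        i += len;
    }
    return code_points;
}

// Hypothetical lazy cache: each token id is converted to its piece at most once,
// so repeated grammar checks on the same token reuse the stored string.
struct piece_cache {
    std::vector<std::string> pieces;
    std::vector<bool>        filled;
    size_t                   misses = 0; // number of times to_piece was actually called

    explicit piece_cache(size_t n_vocab) : pieces(n_vocab), filled(n_vocab, false) {}

    template <typename F>
    const std::string & get(size_t token, F && to_piece) {
        assert(token < pieces.size());
        if (!filled[token]) {
            pieces[token] = to_piece(token); // stand-in for the real conversion
            filled[token] = true;
            ++misses;
        }
        return pieces[token];
    }
};
```

Whether a per-vocabulary cache like this pays off depends on how often the same tokens are converted again, which is something profiling would have to confirm.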
It is probably worth looking into multi-threading the implementation as well.
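One possible shape for a multi-threaded version, sketched under assumptions: the candidate pieces are split into contiguous chunks and each worker thread checks its own chunk against the grammar. `filter_candidates_parallel` and the `accepts` predicate are hypothetical; the predicate stands in for the real grammar check and is assumed to be safe to call concurrently.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: returns 1 for each candidate piece the grammar accepts, else 0.
// std::vector<char> is used instead of std::vector<bool> so that threads writing
// to neighbouring elements do not touch the same underlying byte.
std::vector<char> filter_candidates_parallel(
        const std::vector<std::string> & pieces,
        const std::function<bool(const std::string &)> & accepts,
        size_t n_threads) {
    std::vector<char> allowed(pieces.size(), 0);

    // Use at least one thread and never more threads than candidates
    n_threads = std::max<size_t>(1, std::min(n_threads, pieces.size()));
    const size_t chunk = (pieces.size() + n_threads - 1) / n_threads;

    std::vector<std::thread> workers;
    for (size_t t = 0; t < n_threads; ++t) {
        const size_t begin = t * chunk;
        const size_t end   = std::min(pieces.size(), begin + chunk);
        if (begin >= end) {
            break; // nothing left to hand out
        }
        // Each worker writes only to its own [begin, end) slice of `allowed`
        workers.emplace_back([&, begin, end] {
            for (size_t i = begin; i < end; ++i) {
                allowed[i] = accepts(pieces[i]) ? 1 : 0;
            }
        });
    }
    for (auto & w : workers) {
        w.join();
    }
    return allowed;
}
```

Spawning threads for every sampled token has its own cost, so a real implementation would more likely reuse a thread pool; this sketch only shows how the work could be partitioned.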