Streaming-LLM? #1348
regularfry started this conversation in Ideas
Replies: 0 comments
https://github.com/mit-han-lab/streaming-llm is a neat-looking approach for extending the apparent context length of an LLM, but the way it does it made me think it might be a better approach to streaming whisper than what's currently in `examples/stream`. It's a moderately invasive update to the model itself, but it looks fairly formulaic.

Has anyone got any intuition (or, better, any code) that might inform whether a) it could work for whisper.cpp, and b) it might reduce latency overall?