This repository was archived by the owner on Jun 24, 2024. It is now read-only.
Replies: 3 comments
- You can't easily get this with the repl command.
- These two examples both demonstrate piping some inference stats to the terminal: inference.rs & vicuna-chat.rs (a sketch of that pattern follows these replies).
- Perfect, thanks!
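Roughly, what those examples amount to is timing the generation loop and dividing the number of generated tokens by the elapsed wall-clock time. Below is a minimal, self-contained sketch of that pattern; the `generate_next_token` stub is a hypothetical stand-in for the model's decoding step and is not part of the `llm` crate's API:

```rust
use std::io::Write;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one decoding step. In the real examples the
/// token would come from the model's inference callback instead.
fn generate_next_token(step: usize) -> Option<String> {
    if step < 16 {
        std::thread::sleep(Duration::from_millis(50)); // simulate decode latency
        Some(format!("tok{step} "))
    } else {
        None // pretend the model stops after 16 tokens
    }
}

fn main() {
    let start = Instant::now();
    let mut generated_tokens = 0usize;

    // Stream tokens to the terminal and count them as they arrive.
    let mut step = 0;
    while let Some(token) = generate_next_token(step) {
        print!("{token}");
        std::io::stdout().flush().ok();
        generated_tokens += 1;
        step += 1;
    }
    println!();

    // Report a llama-cpp-python-style summary line.
    let elapsed = start.elapsed().as_secs_f64();
    let tokens_per_second = generated_tokens as f64 / elapsed;
    println!(
        "Output generated in {elapsed:.2} seconds ({tokens_per_second:.2} tokens/s, {generated_tokens} tokens)"
    );
}
```

In the real examples the token stream and counts come from the library's inference call rather than a stub, but the tokens/s arithmetic is the same.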
Hi! Thanks for building this awesome library. I'm trying to figure out how many tokens per second it generates so that I can compare its performance with other libraries like https://github.com/abetlen/llama-cpp-python, which gives you a debug output like the following:

Output generated in 266.13 seconds (1.50 tokens/s, 398 tokens, context 627)

Is there any way to get a similar output when running the repl command? Thanks!
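For reference, the tokens/s figure in that debug line is just the generated-token count divided by the elapsed wall-clock time: 398 tokens / 266.13 s ≈ 1.50 tokens/s, so reproducing it only requires the token count and a timer around the generation.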