I'm interested in potentially replacing llama-cpp-python with easy-llama in my project, and have some questions about feature parity:
1. Are all parameters in the `params` dict below available? (A sketch of the kind of loading parameters I mean follows this list.)
2. Is it possible to get the logits after a certain input? As done here:
3. Similar to 2. but more nuanced: is there a way to get the logits for every token position in an input at once? In llama-cpp-python, this is done by passing `logits_all=True` while loading the model, which reduces performance but makes all logits available as a matrix when you get them with `model.eval_logits` (sketched after this list). I have used this feature to measure the perplexity of llama.cpp quants a while ago using the code here:
4. I have a llamacpp_HF wrapper that connects llama.cpp to HF text generation functions; at its core, all it does is update `model.n_tokens` to do prefix matching and evaluate new tokens by calling `model.eval` with a list containing only the new tokens (sketched after this list). Can that be done with easy-llama? See:
5. Is speculative decoding implemented? There is a PR here https://github.com/oobabooga/text-generation-webui/pull/6669/files to add it, and having it in easy-llama would be great, especially if it could be done in a simple way by just passing new kwargs to its model loading and/or generation functions. I believe doing that for my llamacpp_HF wrapper would be very hard, so that's not something I have high hopes for.
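For question 1, this is only an illustrative llama-cpp-python loading call with common parameters, not the actual `params` dict from my repository; the question is whether easy-llama exposes equivalents for parameters like these:

```python
# Illustrative llama-cpp-python loading call (placeholder values, not the
# exact params dict from the repository).
from llama_cpp import Llama

model = Llama(
    model_path="model.gguf",   # path to a GGUF model
    n_ctx=4096,                # context length
    n_gpu_layers=33,           # layers offloaded to the GPU
    n_batch=512,               # prompt-processing batch size
    n_threads=8,               # CPU threads
    rope_freq_base=10000.0,    # RoPE frequency base
    logits_all=False,
    verbose=False,
)
```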
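To make questions 2 and 3 concrete, this is roughly what I do today with llama-cpp-python (a simplified sketch, with placeholder paths and text): with `logits_all=True`, `model.scores` holds one row of logits per evaluated position, which is enough to read the logits at any position and to compute perplexity over the input.

```python
# Sketch: read back per-position logits in llama-cpp-python and compute
# perplexity over the input (questions 2 and 3).
import numpy as np
from llama_cpp import Llama

model = Llama(model_path="model.gguf", n_ctx=2048, logits_all=True, verbose=False)

tokens = model.tokenize(b"The quick brown fox jumps over the lazy dog.")
model.eval(tokens)

# One row of logits per evaluated position: shape (n_tokens, n_vocab).
logits = np.array(model.scores[: model.n_tokens])

# Logits for the last position only (question 2).
last_logits = logits[-1]

# Perplexity (question 3): average negative log-probability of each actual
# next token under the model's prediction at the previous position.
shifted = logits - logits.max(axis=-1, keepdims=True)               # stable log-softmax
log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
nll = -np.mean([log_probs[i, tokens[i + 1]] for i in range(len(tokens) - 1)])
print(f"perplexity: {np.exp(nll):.2f}")
```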
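And for question 4, the core of the llamacpp_HF wrapper boils down to something like the following (a simplified sketch of the llama-cpp-python side, not the exact text-generation-webui code):

```python
# Sketch of the prefix-matching trick: keep the KV cache for the shared
# prefix and only evaluate the new tokens. `model` is a llama_cpp.Llama
# instance; `past_tokens` is the token list from the previous call.
def eval_with_prefix_reuse(model, past_tokens, new_tokens):
    # Length of the longest common prefix between the old and new sequences.
    longest_prefix = 0
    for old, new in zip(past_tokens, new_tokens):
        if old != new:
            break
        longest_prefix += 1

    if longest_prefix == len(new_tokens):
        # The whole sequence matched; step back one token so there is
        # something to evaluate and the last-position logits are fresh.
        longest_prefix -= 1

    # Rewind the cache to the shared prefix, then evaluate only the new tokens.
    model.n_tokens = longest_prefix
    model.eval(new_tokens[longest_prefix:])
    return model.scores[model.n_tokens - 1]  # logits for the last position
```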
If you are interested, a PR changing llama-cpp-python to easy-llama in my repository would be highly welcome once wheels are available. It would be a way to test the library as well. But I can also try to do the change myself.