[Feature]: Add support for Llamafile provider #3225

rupurt opened this issue Apr 22, 2024 · 6 comments · May be fixed by #10203
Labels: enhancement (New feature or request)

Comments


rupurt commented Apr 22, 2024

The Feature

Support connecting to Llamafile models: https://github.com/Mozilla-Ocho/llamafile

Motivation, pitch

It's a self-contained format for running models as a single binary.

Twitter / LinkedIn details

https://twitter.com/rupurt

rupurt added the enhancement (New feature or request) label Apr 22, 2024
krrishdholakia (Contributor) commented

@rupurt they're OpenAI-compatible: https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#json-api-quickstart

You can already call them with litellm - https://docs.litellm.ai/docs/providers/openai_compatible

Let me know if there's anything I'm missing.
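
A minimal sketch of that route, assuming a llamafile server on its default local port 8080; the model name and placeholder key below are illustrative, not anything llamafile requires:

```python
from litellm import completion

# Route the request through litellm's generic OpenAI-compatible provider
# by prefixing the model name with "openai/".
response = completion(
    model="openai/mistral-7b-instruct",   # illustrative model name
    api_base="http://127.0.0.1:8080/v1",  # llamafile's default local server
    api_key="sk-no-key-required",         # llamafile ignores the key, but the OpenAI client expects one
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
)

print(response["choices"][0]["message"]["content"])
```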


peteski22 commented Mar 18, 2025

The problem with just using openai as the provider is that it means asking LiteLLM to use the OpenAI client from the OpenAI Python API library.

The OpenAI client requires an API key, either passed as an argument (api_key) or read from the environment variable (OPENAI_API_KEY).

If neither is present (i.e. the key resolves to None), the client errors.

Given that Llamafile doesn't need an API key, I'd suggest the hosted_vllm provider route ('for OpenAI compatible server'). It seems preferable to having to make sure you send a fake API key with every request (hosted_vllm does that for you).

See: https://docs.litellm.ai/docs/providers/vllm

e.g. start the llamafile server:

```shell
./mistral-7b-instruct-v0.2.Q4_0.llamafile --server
```

```python
from litellm import completion

msg = "What is the meaning of life?"
messages = [{"content": msg, "role": "user"}]

# hosted_vllm treats the endpoint as a generic OpenAI-compatible server
# and supplies the dummy API key for you.
response = completion(
    model="hosted_vllm/Mistral-7B-Instruct-v0.2",
    base_url="http://127.0.0.1:8080/v1",
    messages=messages,
)

reply = response["choices"][0]["message"]["content"]

print("Completion Result:\n")
print(f"User: {msg}\n\nAssistant: {reply}\n{'-' * 40}")
```

krrishdholakia (Contributor) commented

Hey @peteski22 that's fair - would you be able to contribute a PR for this?

peteski22 commented

> Hey @peteski22 that's fair - would you be able to contribute a PR for this?

Thanks @krrishdholakia, I'll take a look into it and see if I can raise a PR (it might not be immediate though) 😄.

peteski22 linked pull request #10203 on Apr 22, 2025 that will close this issue
peteski22 commented

> Hey @peteski22 that's fair - would you be able to contribute a PR for this?
>
> Thanks @krrishdholakia, I'll take a look into it and see if I can raise a PR (it might not be immediate though) 😄.

Hey again @krrishdholakia, sorry for the delay. I've created #10203.

If you have any time to review it, I'd be glad to get your feedback and see if anything needs to be done to move it along/get it merged.

Thanks 😄
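
If the eventual provider follows LiteLLM's usual provider-prefix convention, usage could end up looking roughly like the sketch below; the llamafile/ prefix and model name are assumptions for illustration, not confirmed details of #10203:

```python
from litellm import completion

# Speculative sketch: assumes the new provider exposes a "llamafile/" model prefix
# and, like hosted_vllm, handles the dummy API key for the keyless local server.
response = completion(
    model="llamafile/mistral-7b-instruct-v0.2",  # hypothetical provider prefix + model name
    api_base="http://127.0.0.1:8080/v1",         # llamafile's default local endpoint
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
)

print(response["choices"][0]["message"]["content"])
```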

krrishdholakia (Contributor) commented

Left comments there
