
Python - Pre-compiled CFFI module for CPU and CUDA #8633


Open
mtasic85 wants to merge 1 commit into master

Conversation

mtasic85
Contributor

Pre-compiled Python binding for llama.cpp using cffi. Supports CPU and CUDA 12.5 execution. It is a binary distribution that does not require complex compilation steps.

Installation is as simple and fast as pip install llama-cpp-cffi.

@slaren
Member

slaren commented Jul 22, 2024

Considering that it does not expose the llama.cpp API at all, and only provides a very light interface to the main example, I wouldn't call this bindings. Normally we have been very lax about the quality requirements of the linked projects, but putting this on the same level as the excellent Python bindings provided by @abetlen does not seem right to me.

@mtasic85
Contributor Author

I believe that the beauty and elegance of the main/llama-cli example is the reason why most people who use llama.cpp are here. One command-line invocation in the terminal and you get what you need.

My goal is not to cover every function call from ggml/llama.cpp, but just enough to replicate the experience of main/llama-cli for developers. If a developer can recognize and use any command-line argument, and then know that with exactly the same function parameter/argument they can call into the Python library and get the same result with minimal friction, then we have achieved our goal.
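To illustrate that intended mapping, here is a minimal sketch; the module path, function name, and parameter names are hypothetical placeholders for illustration only, not necessarily the actual llama-cpp-cffi API:

# Hypothetical sketch only: llama_generate and its parameters are assumed
# for illustration and may not match the real llama-cpp-cffi API.
from llama_cpp_cffi import llama_generate

# Mirrors: llama-cli -m ./models/7B/llama-model.gguf -p "Q: Name the planets in the solar system? A: " -n 32
output = llama_generate(
    model="./models/7B/llama-model.gguf",                   # -m / --model
    prompt="Q: Name the planets in the solar system? A: ",  # -p / --prompt
    n_predict=32,                                            # -n / --n-predict
)
print(output)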

Another problem is that requiring build toolchains just to install a Python library is already a huge issue. I believe that most people stick to (or at least begin with) the CPU-only version because setting up a build environment is just not easy. I enjoy doing that, but I am not sure about others.

My next goal is to support more pre-compiled backends for more operating systems and platforms. Beyond that, the goal is not to increase the complexity of the library by wrapping more function calls and then manually replicating the already great main/llama-cli example.

I hope these arguments are enough to consider llama-cpp-cffi as another Python option.

@ngxson
Collaborator

ngxson commented Jul 24, 2024

Please @mtasic85, listen to the collaborators and the project owner instead of spamming with new PRs.

This PR and #8339 (and also tangledgroup/llama-cpp-wasm) are hacky solutions and they should not exist at all. We never wanted llama-cli to be a final product, but rather a way to show developers how to use the APIs provided by llama.cpp.

If a developer can recognize and use any command-line argument, and then know that with exactly the same function parameter/argument they can call into the Python library and get the same result with minimal friction, then we have achieved our goal.

I don't get the point. Then, what's wrong with llama-cpp-python? Fewer than 10 lines of code:

from llama_cpp import Llama

llm = Llama(
      model_path="./models/7B/llama-model.gguf",
)
output = llm(
      "Q: Name the planets in the solar system? A: ", # Prompt
      max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window
      stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
      echo=True # Echo the prompt back in the output
) # Generate a completion, can also call create_completion
print(output)

Another problem is that requiring build toolchains just to install a Python library is already a huge issue. I believe that most people stick to (or at least begin with) the CPU-only version because setting up a build environment is just not easy. I enjoy doing that, but I am not sure about others.

FYI, if you browse through some issues, most people use GPUs, not CPU-only as you assume.

I hope these arguments are enough to consider llama-cpp-cffi as another Python option.

If you want to make good contributions on behalf of Tangled Group, please consider looking at llama.cpp's roadmap. There are many more important things worth spending time on.

@mtasic85
Contributor Author

Please @mtasic85, listen to the collaborators and the project owner instead of spamming with new PRs.

Hey @ngxson, we are just exploring ways to use llama.cpp; we do not need to agree on everything. I respect different opinions and suggestions. I don't agree that I spammed anyone. We are exchanging arguments, that is all.

This PR and #8339 (and also tangledgroup/llama-cpp-wasm) are hacky solutions and they should not exist at all. We never wanted llama-cli to be a final product, but rather a way to show developers how to use the APIs provided by llama.cpp.

I agree that it was a hacky solution, but then you were inspired by that project and created the even better https://github.com/ngxson/wllama.

I don't get the point. Then, what's wrong with llama-cpp-python? Fewer than 10 lines of code:

Nothing is wrong; this is just a different approach. llama-cpp-python is a well-established project, but we do not use it in production. What we have used in production for about a year is compiled llama.cpp (main/llama-cli) running as a subprocess controlled by a Python main process. For our scenarios it has been great. After any issue with llama.cpp or the model, we just kill the subprocess, release the memory, restart the process, and keep going.
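A minimal sketch of that subprocess pattern, assuming a llama-cli binary, model path, and flags that are placeholders rather than our actual setup:

import subprocess

# Minimal sketch of the subprocess pattern described above; the binary path,
# model path, and flags are placeholders, not the actual production setup.
proc = subprocess.Popen(
    [
        "./llama-cli",
        "-m", "./models/7B/llama-model.gguf",
        "-p", "Q: Name the planets in the solar system? A: ",
        "-n", "32",
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,
    text=True,
)
try:
    output, _ = proc.communicate(timeout=120)  # wait for generation to finish
    print(output)
except subprocess.TimeoutExpired:
    proc.kill()  # on any issue, kill the subprocess and release its memory
    proc.wait()  # then the supervising process restarts it and keeps going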

FYI, if you browse through some issues, most people use GPUs, not CPU-only as you assume.

GPUs are favored, but from a convenience perspective, I see far more CPU usage in my circles. Anyway, these would be valuable statistics to know.

If you want to make good contributions on behalf of Tangled Group, please consider looking at llama.cpp's roadmap. There are many more important things worth spending time on.

I check it often, but I really need to test llama.cpp in production environments. I like the project and its evolution very much, so I can only commit to testing it.

@ngxson
Collaborator

ngxson commented Jul 24, 2024

I think there are just too many red flags for me not to call this spam. Let me walk through them one by one:

  1. Why must the project be under https://github.com/tangledgroup and not under your own account?
  2. In "build example/main.cpp as shared library and intercept token printing using FFI" #8339 you proposed exactly the same thing, and @ggerganov already said that this is not an intended usage of llama-cli. Why did you still open a new PR with exactly the same idea?
  3. I had to create my own wasm binding because your tangledgroup/llama-cpp-wasm is both hacky and unmaintained. Why don't you spend time maintaining it? What will prevent you from stopping to maintain llama-cpp-cffi in the next 3 months? This is basically a trust issue for me.

Then, to address your arguments about production-ready stuff:

I agree that it was a hacky solution

What we have used in production for about a year is compiled llama.cpp (main/llama-cli) running as a subprocess controlled by a Python main process.

Firstly, llama-cli is not production-ready. It is literally an example, so your usage is not expected.

Why did you agree that this is a hacky solution but then proceed to use it in production?

Your usage of llama-cli is extremely inefficient, because each time you run llama-cli the model needs to be reloaded, memory is reallocated, and so on. On the other hand, llama-cpp-python is already production-ready, well-maintained, and much more efficient than your hack.

I check it often, but I really need to test llama.cpp in production environments.

Again: llama-cpp-python is production-ready. Why don't you just use that?

so I can only commit to testing it.

Why? You can literally compile the project or try using llama-cpp-python on your computer to play around and build on top of it. Why can you "only commit to testing it"?

@mofosyne added the Review Complexity : Low label on Aug 1, 2024