-
IMO I don't see any benefit of using FastCGI. This proposal sounds like marketing to me. The frontend code that you mention is there for convenience. Our goal is to have one single binary that contains everything, so the HTML code is built into the C++ code.
In reality, the frontend code is a non-essential part of llama-server. Have you ever looked at the code of the other handlers, for example …
-
Using FastCGI is not about marketing at all. FastCGI is a widely used, open protocol that allows separation between the app (llama-server in this case) and the web server. You can read more about it here: https://en.wikipedia.org/wiki/FastCGI The benefit of the FastCGI approach is that it keeps application logic clean and separate from web serving: instead of bundling everything (HTML, logic, server handling) into a single binary, llama-server would act as a FastCGI backend behind whatever web server the user already runs. Moreover, httplib lacks the scalability and security required for deployment on public endpoints, so it is better to rely on mature, secure web servers while letting llama-server focus on its core API functions.
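To make the separation concrete, here is a minimal sketch of what the FastCGI side could look like, assuming the libfcgi library (fcgiapp.h); the port and the JSON body are placeholders for illustration, not a proposal for the actual llama-server routes:

```cpp
// Minimal FastCGI responder sketch (assumes libfcgi; link with -lfcgi).
#include <fcgiapp.h>

int main() {
    FCGX_Init();

    // Listen on a TCP port that the front-end web server forwards FastCGI
    // requests to; ":9000" is an arbitrary choice for this sketch.
    int sock = FCGX_OpenSocket(":9000", 128);

    FCGX_Request request;
    FCGX_InitRequest(&request, sock, 0);

    // The web server owns TLS, access control, load balancing and static files;
    // this process only ever sees application requests.
    while (FCGX_Accept_r(&request) == 0) {
        const char * uri = FCGX_GetParam("REQUEST_URI", request.envp);
        FCGX_FPrintF(request.out,
                     "Content-Type: application/json\r\n\r\n"
                     "{\"route\": \"%s\"}\n", uri ? uri : "");
        FCGX_Finish_r(&request);
    }
    return 0;
}
```

The front-end web server is configured to forward matching requests to that socket and handles TLS, authentication, and static files itself.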
That is exactly why maintaining non-essential code in llama-server unnecessarily complicates things. This isn't about adding complexity but about future-proofing the design and keeping the core logic focused on the API.
Hardcoding filenames like "index.js" is certainly not a good design practice. It’s unclear how this is "convenient." A much clearer approach is to let users decide their front-end implementation while using FastCGI to interface with the API. Given the prominence of llama.cpp and its growing user base, adopting such a flexible approach is crucial to accommodate a wider audience (hardcoding is definitely not convenient).
Yes, we're very familiar with the code. We've even implemented loading and unloading multiple models simultaneously for RAG, so we understand the setup well.
-
Currently, llama-server is implemented using httplib. While it functions well, we believe that transitioning to a FastCGI server would be a more effective solution than continuing with a web server implementation. This change would simplify the codebase, allowing llama.cpp to focus solely on the API implementation without needing to manage the UI (HTML code) or edge functions like TLS, access control, load balancing, security, etc. Additionally, it would enable users to configure their preferred web server while llama-server serves as the FastCGI backend. That way, llama.cpp does not need to worry about web security issues when hosted on a public endpoint.
This simplifies the code as well. For example, fixed mappings like the following would be removed, as llama.cpp would no longer need to handle the frontend:
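(The snippet that originally followed isn't reproduced here; as a rough illustration, the kind of mapping meant looks like this, where `svr`, `index_html`, `index_js`, and the `*_len` symbols are placeholders for the httplib server instance and for byte arrays embedded at build time, not the actual identifiers:)

```cpp
// Illustrative only: hard-coded frontend routes of this shape could be dropped.
svr.Get("/", [](const httplib::Request &, httplib::Response & res) {
    res.set_content(reinterpret_cast<const char *>(index_html), index_html_len,
                    "text/html; charset=utf-8");
});
svr.Get("/index.js", [](const httplib::Request &, httplib::Response & res) {
    res.set_content(reinterpret_cast<const char *>(index_js), index_js_len,
                    "text/javascript; charset=utf-8");
});
```

With a FastCGI setup, the web server would serve those files directly from disk instead, and llama-server would only answer API requests.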
We have taken the same approach for the mesibo on-premise server, and it has proven successful.
Thank you for considering!