Llama.cpp product roadmap medium term? #1933
Replies: 3 comments 1 reply
-
*The Cognitive Revolution podcast [Episode 36](https://www.cognitiverevolution.ai/e36-your-model-your-weights-with-mosaicmls-abhi-venigalla-and-jonathan-frankle/) with MosaicML's Abhi Venigalla and Jonathan Frankle
-
I have an idea of trying to make an open-source GitHub Copilot alternative running locally, using the knowledge we have accumulated from
-
#3 Further, I'm not convinced that keyword/BM25 search, or even the vector databases that seem to be the norm right now, will be the best solution in the medium and long term. I'm tending towards "semantic summary pyramids", with indexes and keywords.

TL;DR: we need to be able to search primarily on meaning, not just keywords. Vector databases are cheap and easy (ish)... but IMO we'll need LLMs to summarise and store data into large semantic databases. Another use case/business case for ggml tech IMO 😀
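To make the "search on meaning" point concrete, here is a toy sketch of embedding-based retrieval. The hand-made 3-d vectors stand in for real LLM embeddings, and the entries stand in for LLM-written summaries (one level of a hypothetical "summary pyramid"); nothing here is from llama.cpp itself.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# (summary text, toy embedding) pairs. In a real system the embeddings would
# come from an LLM, and each summary would point down to the full chunk.
index = [
    ("how to quantize a model to 4-bit", [0.9, 0.1, 0.0]),
    ("benchmarking token generation speed", [0.1, 0.9, 0.1]),
    ("building on Apple Silicon with Metal", [0.0, 0.2, 0.9]),
]

def search(query_vec, k=1):
    """Return the k summaries whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda e: cosine(query_vec, e[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query about "shrinking model size" shares no keywords with the first
# entry, so BM25 would miss it, but its (toy) embedding is close, so
# semantic search still ranks it first.
print(search([0.8, 0.2, 0.1]))  # -> ['how to quantize a model to 4-bit']
```

The contrast with keyword search is the whole point: BM25 only scores documents that share terms with the query, while similarity over embeddings can match a paraphrase with zero term overlap.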
-
Very curious and excited to see how llama.cpp unfolds medium term!
I love the code-centric DNA of the project. I love "Inference at the edge"... with an ethos of "simplicity and efficiency... performance is essential".
I get that the "goal is to prototype and not waste time in polishing products"... "have fun in the process". I'm curious as to how things will unfold in the medium term - especially given the funding and formation of ggml.ai. Having fun should be eternal... but at some stage the team will likely need to look up from the code and chart a course in one direction or another. So many options!
We only have a handful of LLM "families" at the moment. In a year that could be 50... will llama.cpp & ggml.ai be aiming to support them all?
The folks at MosaicML* are finding that 80-90% of their corporate client demand is for extraction and summarization, with the balance being chat and other use cases (e.g. Replit's CodeAnything). So should we expect to see some GGML magic happen around extraction and summarization? Support for larger context windows, faster tokenisation, different tokenisation methods, and vector or semantic storage formats or databases?
Exciting times!!!