Llama.cpp product roadmap medium term? #1933
Replies: 3 comments 1 reply
-
*The Cognitive Revolution podcast [Episode 36](https://www.cognitiverevolution.ai/e36-your-model-your-weights-with-mosaicmls-abhi-venigalla-and-jonathan-frankle/) with MosaicML's Abhi Venigalla and Jonathan Frankle
-
I have an idea of trying to make an open-source GitHub Copilot alternative running locally, using the knowledge we have accumulated from
-
#3 Further, I'm not convinced that keyword/BM25 search, or even the vector databases that seem to be the norm right now, will be the best solution in the medium and long term. I'm tending towards "semantic summary pyramids", with indexes and keywords.

TL;DR: we need to be able to search primarily on meaning, not just keywords. Vector databases are cheap and easy (ish)... but IMO we'll need LLMs to summarise and store data into large semantic databases. Another use case/business case for ggml tech IMO 😀
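To make the "search on meaning" point concrete, here is a toy sketch of embedding-based retrieval. The hand-made 3-d vectors stand in for real LLM embeddings, and the entries stand in for LLM-written summaries (one level of a hypothetical "summary pyramid"); nothing here is from llama.cpp itself.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# (summary text, toy embedding) pairs. In a real system the embeddings would
# come from an LLM, and each summary would point down to the full chunk.
index = [
    ("how to quantize a model to 4-bit", [0.9, 0.1, 0.0]),
    ("benchmarking token generation speed", [0.1, 0.9, 0.1]),
    ("building on Apple Silicon with Metal", [0.0, 0.2, 0.9]),
]

def search(query_vec, k=1):
    """Return the k summaries whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda e: cosine(query_vec, e[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query about "shrinking model size" shares no keywords with the first
# entry, so BM25 would miss it, but its (toy) embedding is close, so
# semantic search still ranks it first.
print(search([0.8, 0.2, 0.1]))  # -> ['how to quantize a model to 4-bit']
```

The contrast with keyword search is the whole point: BM25 only scores documents that share terms with the query, while similarity over embeddings can match a paraphrase with zero term overlap.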
-
Very curious and excited to see how llama.cpp unfolds medium term!
I love the code-centric DNA of the project. I love "Inference at the edge"... with an ethos of "simplicity and efficiency... performance is essential".
I get that the "goal is to prototype and not waste time in polishing products"... "have fun in the process". I'm curious as to how things will unfold in the medium term - especially given the funding and formation of ggml.ai. Having fun should be eternal... but at some stage the team will likely need to look up from the code and chart a course in one direction or another. So many options!
We only have a handful of LLM "families" at the moment. In a year that could be 50... will llama.cpp & ggml.ai be aiming to support them all?
The folks at MosaicML* are finding that 80-90% of their corporate client demand is for extraction and summarization, with the balance being chat and other use cases (e.g. Replit's CodeAnything). So should we expect to see some GGML magic happen around extraction and summarization? Support for larger context windows, faster tokenisation, different tokenisation methods, and vector or semantic storage formats or databases?
Exciting times!!!