-
I just discovered this project after coming back to LLMs after a long time away, and I was amazed by it. If you haven't already tried it, I'd recommend checking out LangChain4J (https://github.com/langchain4j/langchain4j), as they have demos of RAG in Java, though it does require Ollama. This llama3.java implementation doesn't appear to use Ollama at all, which is very impressive. Having RAG would be amazing, but it's understandable if that's something further down the road in development.
-
I implemented a low-rent RAG-lite with Jsoup: detect a URL typed on the command line with an XPath directive at the end, grab the content with Jsoup, apply the XPath, replace the command-line text with the result, and voilà! I also integrated my DB to store dialog and reload context from the command line: https://github.com/neocoretechs/Llama4j
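For anyone curious, here's roughly what that trick looks like. This is a minimal sketch, not the actual Llama4j code, and the ` ::` directive syntax is invented here purely for illustration:

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class UrlRag {

    /** If the prompt line starts with a URL, replace it with the page text. */
    static String expandUrl(String line) throws IOException {
        if (!line.startsWith("http://") && !line.startsWith("https://")) {
            return line; // not a URL, pass the prompt through untouched
        }
        String url = line;
        String xpath = "//body"; // default: whole page text
        int sep = line.indexOf(" ::");
        if (sep >= 0) { // optional XPath directive at the end of the line
            url = line.substring(0, sep);
            xpath = line.substring(sep + 3).trim();
        }
        Document doc = Jsoup.connect(url).get(); // fetch and parse the page
        return doc.selectXpath(xpath).text();    // keep only the selected nodes' text
    }

    public static void main(String[] args) throws IOException {
        // e.g. pull just the paragraphs of a page into the context window
        System.out.println(expandUrl("https://example.com :://p"));
    }
}
```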
-
Why does the Llama model use a GPT-2 tokenizer, but the Gemma model a Llama tokenizer? P.S. I also notice the code seems to support FP16 and BF16, but only the Q4 and Q8 quantizations are recommended? Is that the case, and is it still the case with the new version?
-
There are two tokenizer flavors. I can already run the latest versions of Llama (Meta), Mistral, Gemma (Google), Qwen (Alibaba), Phi (Microsoft), Granite (IBM), SmolLM (Hugging Face), the DeepSeek R1 distills ... I added support for F16 and BF16 just to compare against the quantizations. My original goal was to implement and release all the building blocks needed to consume/run local models using Java, e.g. a fast inference engine, GGUF/Safetensors model format parsers, tokenizers, a tiny tensor library ... but I only managed to complete and release the GGUF library. I also wrote a fast tokenizer library (faster than OpenAI's tiktoken) but couldn't release it ... I got in trouble with my employer over this, so I've completely stopped working on it. I'll keep my weekend projects private for now. Here's a rough draft of the original project goals: ... Here's an old implementation of Mistral.java with the ...
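To answer the tokenizer question above: GGUF files declare their tokenizer flavor in the metadata under the `tokenizer.ggml.model` key, where `"gpt2"` means byte-level BPE (Llama 3, Qwen, Phi lineage) and `"llama"` means SentencePiece-style BPE (Gemma, Mistral, Llama 1/2 lineage), so a loader just dispatches on it. A minimal sketch of that dispatch; the interface and stub implementations are invented here, this isn't llama3.java's actual layout:

```java
import java.util.Map;

interface Tokenizer {
    int[] encode(String text);
}

final class TokenizerFactory {
    static Tokenizer fromGguf(Map<String, Object> metadata) {
        String flavor = (String) metadata.get("tokenizer.ggml.model");
        return switch (flavor) {
            // byte-level BPE over raw UTF-8 bytes (GPT-2 lineage)
            case "gpt2"  -> text -> new int[0]; // stub; a real one merges byte pairs
            // SentencePiece-style BPE over unicode text (Llama lineage)
            case "llama" -> text -> new int[0]; // stub
            default -> throw new IllegalArgumentException("Unknown tokenizer: " + flavor);
        };
    }
}
```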
-
@mukel @neocoretechs regarding GPU support for Llama3.java: we plan to release, as soon as next week, a repo based on Llama3.java that uses TornadoVM to offload the whole transformer architecture for inference on GPUs through Java. For v1.0, all Llama3 models that @mukel provided will be supported for Q4 and Q8. Initial support will include optimized OpenCL and PTX backends for Nvidia GPUs, and OpenCL support for Apple M-series silicon.
-
We are excited to share https://github.com/beehive-lab/GPULlama3.java, which builds on @mukel's Llama3.java and enables GPU acceleration through TornadoVM.
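For a feel of the programming model, here's a minimal sketch (not GPULlama3.java's actual kernels, and API roughly as in TornadoVM 1.x): you write a kernel as plain Java with a `@Parallel` loop, then build a task graph that moves data and runs it on the GPU. Shown for a matrix-vector multiply, the workhorse of transformer inference:

```java
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

public class MatVecGpu {

    // Plain Java kernel: TornadoVM JIT-compiles the @Parallel loop to OpenCL/PTX.
    public static void matVec(FloatArray w, FloatArray x, FloatArray out, int rows, int cols) {
        for (@Parallel int i = 0; i < rows; i++) {
            float sum = 0f;
            for (int j = 0; j < cols; j++) {
                sum += w.get(i * cols + j) * x.get(j);
            }
            out.set(i, sum);
        }
    }

    public static void main(String[] args) {
        int rows = 4096, cols = 4096;
        FloatArray w = new FloatArray(rows * cols);
        FloatArray x = new FloatArray(cols);
        FloatArray out = new FloatArray(rows);
        w.init(0.01f); // fill with dummy weights
        x.init(1.0f);

        TaskGraph graph = new TaskGraph("layer")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, w, x) // weights stay on-device
                .task("matvec", MatVecGpu::matVec, w, x, out, rows, cols)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, out);   // copy result back each run

        TornadoExecutionPlan plan = new TornadoExecutionPlan(graph.snapshot());
        plan.execute();
        System.out.println(out.get(0));
    }
}
```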
-
Any idea how I can start implementing RAG keeping this as a base? Full Java from scratch, using some vector DB.
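Not a full answer, but maybe a starting point: the retrieval core is small enough to write from scratch, no libraries. A minimal sketch below; the `embed()` stub is hypothetical and stands in for a real embedding model (this repo only does generation, so that part is external), and a real vector DB would replace the linear scan with an ANN index:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MiniVectorStore {
    record Chunk(String text, float[] vector) {}

    private final List<Chunk> chunks = new ArrayList<>();

    void add(String text) {
        chunks.add(new Chunk(text, embed(text)));
    }

    /** Return the k chunks most similar to the query, by cosine similarity. */
    List<String> topK(String query, int k) {
        float[] q = embed(query);
        return chunks.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> -cosine(q, c.vector())))
                .limit(k)
                .map(Chunk::text)
                .toList();
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-9);
    }

    // Hypothetical placeholder: a toy character-histogram "embedding".
    // Swap in a call to a real embedding model here.
    static float[] embed(String text) {
        float[] v = new float[64];
        for (int i = 0; i < text.length(); i++) v[text.charAt(i) % 64] += 1f;
        return v;
    }
}
```

The strings returned by `topK()` then just get concatenated into the prompt before handing it to the model.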