<strong>Llama3</strong> models written in <strong>native Java</strong> automatically accelerated on GPUs with <strong>TornadoVM</strong>.
This project allows you to run Llama3 inference efficiently, leveraging TornadoVM's parallel computing features for enhanced performance.
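To give a flavour of how TornadoVM accelerates plain Java, the sketch below shows a hypothetical matrix-vector kernel (the dominant operation in transformer inference) offloaded through TornadoVM's TaskGraph API. It is a minimal illustration, not code from this repository, and the exact data types (e.g., TornadoVM's off-heap arrays vs. plain Java arrays) vary between TornadoVM versions.

```java
import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;

public class MatVecSketch {

    // Matrix-vector multiply; @Parallel lets TornadoVM map the outer loop to GPU threads.
    public static void matVec(float[] matrix, float[] vector, float[] out, int rows, int cols) {
        for (@Parallel int i = 0; i < rows; i++) {
            float sum = 0.0f;
            for (int j = 0; j < cols; j++) {
                sum += matrix[i * cols + j] * vector[j];
            }
            out[i] = sum;
        }
    }

    public static void main(String[] args) {
        int rows = 4096, cols = 4096;
        float[] matrix = new float[rows * cols];
        float[] vector = new float[cols];
        float[] out = new float[rows];

        // Wire the kernel into a TaskGraph: copy inputs to the device, run the task,
        // and copy the result back to the host after every execution.
        TaskGraph graph = new TaskGraph("llama")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, matrix, vector)
                .task("matvec", MatVecSketch::matVec, matrix, vector, out, rows, cols)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, out);

        ImmutableTaskGraph snapshot = graph.snapshot();
        // JIT-compiles the task to OpenCL/PTX/SPIR-V and runs it on the selected accelerator.
        new TornadoExecutionPlan(snapshot).execute();
    }
}
```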
<br><br>
Builds on <a href="https://github.com/mukel/llama3.java">Llama3.java</a>, based on the original <a href="https://github.com/meta-llama/llama3">Llama 3</a>, <a href="https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1">3.1</a>, and <a href="https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/">3.2</a> models, with TornadoVM support for parallelism and hardware acceleration.
<br><br>
Thanks to <a href="https://github.com/mukel">Alfonso² Peterssen</a> for the original implementation of Llama3.java.
<br><br>
The previous integration of TornadoVM and Llama2 can be found in <a href="https:
### TornadoVM-Accelerated Inference Performance and Optimization Status
We are at the early stages of Java entering the AI world, with features being added to the JVM that enable faster execution, such as GPU acceleration, vector acceleration, and high-performance access to off-heap memory, among others.
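As one example of those JVM-level features, the sketch below uses the incubating JDK Vector API to compute a SIMD dot product, the kind of kernel that speeds up CPU-side inference. It is illustrative only and not taken from this repository; running it requires `--add-modules jdk.incubator.vector`.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class VectorDotProduct {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // SIMD dot product: processes SPECIES.length() floats per iteration, then a scalar tail.
    public static float dot(float[] a, float[] b) {
        float sum = 0.0f;
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            sum += va.mul(vb).reduceLanes(VectorOperators.ADD);
        }
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```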
<br><br>This repository provides the first Java-native implementation of Llama3 that automatically compiles and executes Java code on GPUs via TornadoVM.
The baseline numbers presented below provide a solid starting point for achieving more competitive performance compared to llama.cpp or native CUDA implementations.
[Our roadmap](https://github.com/beehive-lab/GPULlama3.java/blob/main/docs/GPULlama3_ROADMAP.md) outlines the upcoming features that will dramatically improve the numbers below, with the clear target of achieving performance parity with the fastest implementations.
<br><br>
If you obtain additional performance data points (e.g., on new hardware or platforms), please let us know so we can add them below.
<br><br>
In addition, if you are interested in learning more about the challenges of managed programming languages and GPU acceleration, you can read [our book](https://link.springer.com/book/10.1007/978-3-031-49559-5) or consult the [TornadoVM educational pages](https://www.tornadovm.org/resources).