
Commit b255843

Update README.md
1 parent 51cc2ad commit b255843

File tree

1 file changed: +14 −4 lines changed

README.md

Lines changed: 14 additions & 4 deletions

@@ -17,10 +17,12 @@
 <td style="width: 40%; vertical-align: middle; border: none;">
 <img src="docs/java-tornado-gpu.jpg" width="100%">
 </td>
-<td style="vertical-align: middle; padding-left: 20px; border: none;">
-Integration of <strong>Llama3 models</strong> with <strong>TornadoVM</strong> to enable accelerated inference on Java using GPUs and CPUs. This project allows you to run Llama3 inference efficiently, leveraging TornadoVM's parallel computing features for enhanced performance.
+<td style="vertical-align: middle; padding-left: 20px; border: none;">
+<strong>Llama3</strong> models written in <strong>native Java</strong> automatically accelerated on GPUs with <strong>TornadoVM</strong>.
+This project allows you to run Llama3 inference efficiently, leveraging TornadoVM's parallel computing features for enhanced performance.
+
 <br><br>
-This project builds on <a href="https://github.com/mukel/llama3.java">Llama3.java</a>, based on the original <a href="https://github.com/meta-llama/llama3">Llama 3</a>, <a href="https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1">3.1</a>, and <a href="https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/">3.2</a> models, with TornadoVM support for parallelism and hardware acceleration.
+Builds on <a href="https://github.com/mukel/llama3.java">Llama3.java</a>, based on the original <a href="https://github.com/meta-llama/llama3">Llama 3</a>, <a href="https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1">3.1</a>, and <a href="https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/">3.2</a> models, with TornadoVM support for parallelism and hardware acceleration.
 <br><br>
 Thanks to <a href="https://github.com/mukel">Alfonso² Peterssen</a> for the original implementation of Llama3.java.
 <br><br>

@@ -41,7 +43,15 @@ Previous intergration of TornadoVM and Llama2 it can be found in <a href="https:

 ### TornadoVM-Accelerated Inference Performance and Optimization Status

-This table shows inference performance across different hardware and quantization options.
+We are at the early stages of Java entering the AI world with features added to the JVM that enable faster execution such as GPU acceleration, Vector acceleration, high-performance access to off-heap memory and others.
+<br><br>This repository provides the first Java-native implementation of Llama3 that automatically compiles and executes Java code on GPUs via TornadoVM.
+The baseline numbers presented below provide a solid starting point for achieving more competitive performance compared to llama.cpp or native CUDA implementations.
+[Our roadmap](https://github.com/beehive-lab/GPULlama3.java/blob/main/docs/GPULlama3_ROADMAP.md) provides the upcoming set of features that will dramatically improve the numbers below with the clear target being to achieve performance parity with the fastest implementations.
+<br><br>
+If you achieve additional performance data points (e.g. new hardware or platforms) please let us know to add them below.
+<br><br>
+In addition, if you are interested to learn more about the challenges of managed programming languages and GPU acceleration, you can read [our book](https://link.springer.com/book/10.1007/978-3-031-49559-5) or consult the [TornadoVM educational pages](https://www.tornadovm.org/resources).
+

 | Vendor / Backend | Hardware | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct | Optimizations |
 |:----------------------------:|:------------:|:---------------------:|:---------------------:|:-------------:|
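For context on the acceleration model the updated README describes, the sketch below shows the general TornadoVM programming pattern: a plain Java kernel whose outer loop is marked with `@Parallel`, wired into a `TaskGraph` and launched through a `TornadoExecutionPlan`, which TornadoVM JIT-compiles to OpenCL, PTX, or SPIR-V at run time. This is a minimal illustrative sketch, not code from GPULlama3.java and not part of this commit; the class and method names (`MatVecExample`, `matVec`) are invented for the example, and exact API details may differ across TornadoVM versions.

```java
// Illustrative TornadoVM sketch (not from GPULlama3.java): a matrix-vector
// product, the kind of kernel that dominates transformer decoding, offloaded
// to a GPU via a TaskGraph.
import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

public class MatVecExample {

    // Each output row is independent, so the outer loop is marked @Parallel
    // and can be mapped to GPU threads by TornadoVM.
    public static void matVec(FloatArray matrix, FloatArray x, FloatArray y, int rows, int cols) {
        for (@Parallel int i = 0; i < rows; i++) {
            float sum = 0f;
            for (int j = 0; j < cols; j++) {
                sum += matrix.get(i * cols + j) * x.get(j);
            }
            y.set(i, sum);
        }
    }

    public static void main(String[] args) throws Exception {
        int rows = 1024, cols = 1024;
        FloatArray matrix = new FloatArray(rows * cols);
        FloatArray x = new FloatArray(cols);
        FloatArray y = new FloatArray(rows);
        for (int i = 0; i < rows * cols; i++) matrix.set(i, 0.01f);
        for (int j = 0; j < cols; j++) x.set(j, 1.0f);

        // Build a task graph: copy inputs to the device once, run the kernel,
        // and copy the result back after every execution.
        TaskGraph graph = new TaskGraph("sketch")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, matrix, x)
                .task("matvec", MatVecExample::matVec, matrix, x, y, rows, cols)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, y);

        // TornadoVM JIT-compiles the task and runs it on the selected accelerator.
        ImmutableTaskGraph snapshot = graph.snapshot();
        new TornadoExecutionPlan(snapshot).execute();

        System.out.println("y[0] = " + y.get(0));
    }
}
```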
