Commit 34e2be1

Merge branch 'main' of github.com:beehive-lab/GPULlama3.java
2 parents 833e9ea + cd3fab0 commit 34e2be1

1 file changed: +20 -2 lines changed

README.md

Lines changed: 20 additions & 2 deletions
@@ -47,14 +47,21 @@ This table shows inference performance across different hardware and quantizatio
 |:----------------------------:|:------------:|:---------------------:|:---------------------:|:---------------------:|:-------------:|
 | | | **Q8_0** | **Q4_0** | **Q4_0** | **Support** |
 | **NVIDIA / OpenCL-PTX** | RTX 3070 | 52 tokens/s | 50.56 tokens/s | 22.96 tokens/s ||
-| | RTX 4090 | 66.07 tokens/s | 65.81 tokens/s | 35.51 tokens/s ||
+| | RTX 4090 | 66.07 tokens/s | 65.81 tokens/s | 35.51 tokens/s ||
 | | RTX 5090 | 96.65 tokens/s | 94.71 tokens/s | 47.68 tokens/s ||
-| | H100 | XXXX tokens/s | XXXX tokens/s | XXXX tokens/s ||
+| | L4 Tensor | 52.96 tokens/s | 52.92 tokens/s | 22.68 tokens/s ||
 | **Intel / OpenCL** | Arc A770 | 15.65 tokens/s | 15.09 tokens/s | 7.02 tokens/s | (WIP) |
 | **Apple Silicon / OpenCL** | M3 Pro | 14.04 tokens/s | 13.83 tokens/s | 6.78 tokens/s | (WIP) |
 | | M4 Pro | 16.77 tokens/s | 16.67 tokens/s | 8.56 tokens/s | (WIP) |
 | **AMD / OpenCL** | Radeon RX | (WIP) | (WIP) | (WIP) | (WIP) |

+##### ⚠️ Note on Apple Silicon Performance
+
+TornadoVM currently runs on Apple Silicon via [OpenCL](https://developer.apple.com/opencl/), which has been officially deprecated since macOS 10.14.
+
+Despite being deprecated, OpenCL still runs on Apple Silicon, albeit with older drivers that do not support all of TornadoVM's optimizations. Performance is therefore not optimal, since TornadoVM does not yet have a Metal backend (it currently has OpenCL, PTX, and SPIR-V backends). For the time being, we recommend using Macs for development and measuring performance on OpenCL/PTX-compatible NVIDIA GPUs, until a Metal backend is added to TornadoVM and optimized.
+
+
 -----------

 ## Setup & Configuration
@@ -388,6 +395,17 @@ Click [here](https://github.com/beehive-lab/GPULlama3.java/tree/main/docs/GPULla

 -----------

+## Acknowledgments
+
+This work is partially funded by the following EU & UKRI grants (most recent first):
+
+- EU Horizon Europe & UKRI [AERO 101092850](https://aero-project.eu/).
+- EU Horizon Europe & UKRI [P2CODE 101093069](https://p2code-project.eu/).
+- EU Horizon Europe & UKRI [ENCRYPT 101070670](https://encrypt-project.eu).
+- EU Horizon Europe & UKRI [TANGO 101070052](https://tango-project.eu).
+
+-----------
+
 ## License

 MIT