
v0.1.0-beta

Released by @mikepapadim on 30 May 07:01 (commit 0c9a05a)
  • Llama 3 model compatibility - Full support for Llama 3.0, 3.1, and 3.2 models
  • GGUF format support - Native handling of GGUF model files
  • FP16 model support - reduced memory usage and faster computation
  • GPU acceleration on NVIDIA GPUs - uses both the OpenCL and PTX backends
  • [Experimental] Apple Silicon (M1/M2/M3) support via OpenCL, subject to hardware/compiler limitations
  • [Experimental] Initial support for Q8 and Q4 quantized models, using runtime dequantization to FP16 (see the sketch after this list)
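
For the experimental quantized paths, the notes mention runtime dequantization to FP16. Below is a minimal host-side sketch of what dequantizing a single GGUF Q8_0 block involves, assuming the standard Q8_0 layout (one 16-bit float scale followed by 32 signed 8-bit quants per block). The class and method names are illustrative, not the project's actual API, and the example writes 32-bit floats for readability where the real kernels would target half-precision buffers on the GPU.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Sketch of runtime dequantization for a GGUF Q8_0 block.
 * Assumes the standard Q8_0 layout: an FP16 scale followed by 32 signed
 * 8-bit quants per block. Names are hypothetical, not the release's API.
 */
public class Q8DequantSketch {

    static final int BLOCK_SIZE = 32;                   // quants per Q8_0 block
    static final int BYTES_PER_BLOCK = 2 + BLOCK_SIZE;  // FP16 scale + 32 int8

    /** Dequantizes one Q8_0 block starting at `offset` into `out`. */
    static void dequantizeQ8Block(ByteBuffer buf, int offset, float[] out, int outOffset) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        // Float.float16ToFloat requires Java 20+.
        float scale = Float.float16ToFloat(buf.getShort(offset));
        for (int i = 0; i < BLOCK_SIZE; i++) {
            byte q = buf.get(offset + 2 + i);
            out[outOffset + i] = scale * q;              // x = d * q
        }
    }

    public static void main(String[] args) {
        // Build one synthetic block: scale = 0.5, quants = -16..15.
        ByteBuffer block = ByteBuffer.allocate(BYTES_PER_BLOCK).order(ByteOrder.LITTLE_ENDIAN);
        block.putShort(Float.floatToFloat16(0.5f));
        for (int i = 0; i < BLOCK_SIZE; i++) {
            block.put((byte) (i - 16));
        }

        float[] out = new float[BLOCK_SIZE];
        dequantizeQ8Block(block, 0, out, 0);
        System.out.println("x[0] = " + out[0] + ", x[31] = " + out[31]); // -8.0 and 7.5
    }
}
```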