
v0.1.0-beta

Released by @mikepapadim on 30 May 07:01 (commit 0c9a05a)
  • Llama 3 model compatibility - Full support for Llama 3.0, 3.1, and 3.2 models
  • GGUF format support - Native handling of GGUF model files
  • FP16 model support - reduced memory usage and faster computation
  • GPU acceleration on NVIDIA GPUs - uses both the OpenCL and PTX backends
  • [Experimental] Apple Silicon (M1/M2/M3) support via OpenCL, subject to hardware/compiler limitations
  • [Experimental] Initial support for Q8 and Q4 quantized models, using runtime dequantization to FP16 (see the sketch after this list)
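
For the experimental quantized paths, the notes mention runtime dequantization to FP16. Below is a minimal host-side sketch of what dequantizing a single GGUF Q8_0 block involves, assuming the standard Q8_0 layout (one 16-bit float scale followed by 32 signed 8-bit quants per block). The class and method names are illustrative, not the project's actual API, and the example writes 32-bit floats for readability where the real kernels would target half-precision buffers on the GPU.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Sketch of runtime dequantization for a GGUF Q8_0 block.
 * Assumes the standard Q8_0 layout: an FP16 scale followed by 32 signed
 * 8-bit quants per block. Names are hypothetical, not the release's API.
 */
public class Q8DequantSketch {

    static final int BLOCK_SIZE = 32;                   // quants per Q8_0 block
    static final int BYTES_PER_BLOCK = 2 + BLOCK_SIZE;  // FP16 scale + 32 int8

    /** Dequantizes one Q8_0 block starting at `offset` into `out`. */
    static void dequantizeQ8Block(ByteBuffer buf, int offset, float[] out, int outOffset) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        // Float.float16ToFloat requires Java 20+.
        float scale = Float.float16ToFloat(buf.getShort(offset));
        for (int i = 0; i < BLOCK_SIZE; i++) {
            byte q = buf.get(offset + 2 + i);
            out[outOffset + i] = scale * q;              // x = d * q
        }
    }

    public static void main(String[] args) {
        // Build one synthetic block: scale = 0.5, quants = -16..15.
        ByteBuffer block = ByteBuffer.allocate(BYTES_PER_BLOCK).order(ByteOrder.LITTLE_ENDIAN);
        block.putShort(Float.floatToFloat16(0.5f));
        for (int i = 0; i < BLOCK_SIZE; i++) {
            block.put((byte) (i - 16));
        }

        float[] out = new float[BLOCK_SIZE];
        dequantizeQ8Block(block, 0, out, 0);
        System.out.println("x[0] = " + out[0] + ", x[31] = " + out[31]); // -8.0 and 7.5
    }
}
```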