elphinkuo/fgml

fgml

Acceleration library for Machine Learning, especially for large language models.

  • Uniform quantization of the Llama2 model, without block grouping.
  • Uniform quantization of the Llama2 model, with support for 64 × 64 block grouping.
  • Non-uniform dense-and-sparse quantization of Llama2 (3-bit, 4-bit), based on Hessian information.
  • Inference with dense-and-sparse 3-bit and 4-bit Llama2-7B.
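
The difference between the first two features can be illustrated with a minimal NumPy sketch. It is not taken from this library: the function names (`quantize_uniform`, `dequantize_uniform`, `quantize_blockwise`), the asymmetric min/max scheme, and the toy weight matrix are all assumptions made for illustration. It compares one scale for the whole matrix against one scale per 64 × 64 block.

```python
import numpy as np


def quantize_uniform(w, bits=4):
    """Asymmetric uniform quantization with a single scale and offset.

    Hypothetical helper for illustration; not this library's API.
    """
    qmax = 2**bits - 1
    lo, hi = float(w.min()), float(w.max())
    # Guard against a constant array, where the range would be zero.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((w - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo


def dequantize_uniform(q, scale, lo):
    """Map integer codes back to approximate float values."""
    return q.astype(np.float32) * scale + lo


def quantize_blockwise(w, bits=4, block=64):
    """Quantize each block x block tile with its own scale and offset.

    Returns the dequantized matrix so the error can be compared directly.
    """
    rows, cols = w.shape
    if rows % block or cols % block:
        raise ValueError("matrix dimensions must be multiples of the block size")
    out = np.empty(w.shape, dtype=np.float32)
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = w[r:r + block, c:c + block]
            q, scale, lo = quantize_uniform(tile, bits)
            out[r:r + block, c:c + block] = dequantize_uniform(q, scale, lo)
    return out


# Toy weights: small Gaussian values plus one large outlier.
rng = np.random.default_rng(0)
w = (rng.standard_normal((128, 128)) * 0.02).astype(np.float32)
w[0, 0] = 1.0

# Without block grouping: the outlier stretches the single global range.
q, scale, lo = quantize_uniform(w, bits=4)
w_tensor = dequantize_uniform(q, scale, lo)

# With 64 x 64 block grouping: only the outlier's own block is affected.
w_block = quantize_blockwise(w, bits=4, block=64)

mse_tensor = float(np.mean((w - w_tensor) ** 2))
mse_block = float(np.mean((w - w_block) ** 2))
print(f"per-tensor MSE: {mse_tensor:.3e}, blockwise MSE: {mse_block:.3e}")
```

Block grouping stores one extra scale and offset per tile, in exchange for confining the effect of an outlier to the tile that contains it.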
