fmgl

An acceleration library for machine learning, especially for large language models.

Uniform quantization of the Llama 2 model, without block grouping.
Uniform quantization of the Llama 2 model, with support for 64 × 64 block grouping.
Non-uniform dense-and-sparse quantization of Llama 2 (3-bit and 4-bit), based on Hessian information.
Inference with dense-and-sparse 3-bit and 4-bit Llama-2-7B.
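
To illustrate the difference between the first two items, here is a minimal NumPy sketch of uniform quantization applied once to a whole weight matrix and then independently to each 64 × 64 block. It is not taken from this repository: the function names (`quantize_uniform`, `dequantize_uniform`, `quantize_dequantize_blocked`), the asymmetric min/max scheme, and the random test matrix are all assumptions made for illustration.

```python
import numpy as np

def quantize_uniform(w, bits=4):
    """Asymmetric uniform quantization with a single scale and offset for `w`."""
    levels = 2 ** bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    # Guard against a constant array, where the range would be zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = np.clip(np.round((w - w_min) / scale), 0, levels).astype(np.uint8)
    return codes, scale, w_min

def dequantize_uniform(codes, scale, w_min):
    """Map integer codes back to approximate floating-point values."""
    return codes.astype(np.float32) * scale + w_min

def quantize_dequantize_blocked(w, bits=4, block=64):
    """Quantize each block x block tile with its own scale; return the reconstruction."""
    rows, cols = w.shape
    out = np.empty(w.shape, dtype=np.float32)
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = w[r:r + block, c:c + block]
            codes, scale, w_min = quantize_uniform(tile, bits)
            out[r:r + block, c:c + block] = dequantize_uniform(codes, scale, w_min)
    return out

# Hypothetical weight matrix with one outlier that stretches the global range.
rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
w[0, 0] = 20.0

codes, scale, w_min = quantize_uniform(w, bits=4)
err_global = float(np.abs(dequantize_uniform(codes, scale, w_min) - w).mean())
err_block = float(np.abs(quantize_dequantize_blocked(w, bits=4, block=64) - w).mean())
print(f"mean abs error, no grouping: {err_global:.4f}")
print(f"mean abs error, 64 x 64 blocks: {err_block:.4f}")
```

In this sketch the outlier widens the quantization step for the whole matrix when no grouping is used, whereas with block grouping it only affects the tile that contains it. The price is one extra scale and offset per block.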

