


ZJUSCT/IPEX_CPU_W4A8_Linear


IPEX_CPU_W4A8_Linear

An AMX matrix multiplication implementation for 4-bit quantized weights that outperforms libxsmm when batch_size exceeds 16.

I rewrote the AMX portion of the 4-bit quantization code in the WOQ (weight-only quantization) linear section of IPEX (Intel Extension for PyTorch). The model used for testing is DeepSeek-R1-Distill-Qwen-32B.
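To make the arithmetic concrete, here is a minimal pure-Python sketch of what a W4A8 linear layer computes: 4-bit weights are unpacked to small signed integers, multiplied against 8-bit activations with integer accumulation, and the sum is rescaled to floating point. This is not the kernel in this repository. The packing layout (two values per byte, low nibble first), the symmetric signed range of -8 to 7, the per-output-channel weight scale, and all function names are assumptions made for illustration.

```python
def pack_int4(values):
    """Pack signed 4-bit ints (-8..7) two per byte, low nibble first (assumed layout)."""
    if len(values) % 2:
        raise ValueError("need an even number of values")
    if any(v < -8 or v > 7 for v in values):
        raise ValueError("values must fit in signed 4 bits")
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed 4-bit values."""
    values = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibbles 8..15 represent -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


def w4a8_linear(x_q, x_scale, w_packed, w_scales):
    """Reference W4A8 matmul.

    x_q      -- rows of int8 activations (one row per batch element)
    x_scale  -- float scale shared by all activations (assumption)
    w_packed -- one packed byte string per output channel
    w_scales -- one float scale per output channel (assumption)
    """
    out = []
    for row in x_q:
        out_row = []
        for packed, w_scale in zip(w_packed, w_scales):
            weights = unpack_int4(packed)
            # Integer dot product, as an int8 x int8 -> int32 tile op would do.
            acc = sum(a * w for a, w in zip(row, weights))
            out_row.append(acc * x_scale * w_scale)
        out.append(out_row)
    return out
```

A scalar reference like this is mainly useful for checking the output of an optimized kernel on small inputs. It says nothing about performance.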

In my performance tests, the rewritten code showed a 20% improvement over the original code. The result chart is as follows.

[Image: performance result chart]

This repository provides only the modified code, at its location in the original source tree. To use it, replace the corresponding original files and compile with the official script.
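The replacement step can be sketched as a small helper that copies each file from a checkout of this repository over the file at the same relative path in the original source tree, keeping a backup first. This is only an illustration: the function name `overlay_files`, the `.orig` backup suffix, and the choice to skip `README.md` and `.git` are assumptions, and the directory paths are whatever you pass in. The build itself is still done with the official script and is not shown.

```python
import shutil
from pathlib import Path


def overlay_files(patch_root, source_root, backup_suffix=".orig",
                  skip_names=("README.md",)):
    """Replace files in source_root with same-path files from patch_root.

    Only files that already exist in source_root are replaced, and each
    original is first copied to <name><backup_suffix>. Returns the list of
    replaced paths, relative to source_root.
    """
    patch_root, source_root = Path(patch_root), Path(source_root)
    replaced = []
    for src in sorted(patch_root.rglob("*")):
        rel = src.relative_to(patch_root)
        if not src.is_file() or ".git" in rel.parts or src.name in skip_names:
            continue
        dst = source_root / rel
        if not dst.is_file():
            continue  # nothing to replace at this path
        shutil.copy2(dst, dst.with_name(dst.name + backup_suffix))
        shutil.copy2(src, dst)
        replaced.append(rel.as_posix())
    return replaced
```

Calling `overlay_files(path_to_this_repo, path_to_source_tree)` and printing the returned list is a quick way to confirm which files were replaced before compiling.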
