Replies: 5 comments 2 replies
-
To use AMX, the weights need to be stored in an AMX buffer type. If you are experimenting, I suggest making … The more portable way to do this would be to use …
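For anyone landing here later, a minimal sketch of one way this can look (not from the thread, and assuming `ggml_backend_amx_buffer_type()` is visible outside the CPU backend, which is exactly what the next reply addresses): create the weight tensor in a `no_alloc` context, then allocate every tensor in that context from the AMX buffer type.

```c
// Sketch only: allocate a weight tensor from the AMX buffer type instead of
// the default CPU buffer type. Assumes ggml_backend_amx_buffer_type() has
// been made visible to the caller (it is internal to the CPU backend).
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

static ggml_backend_buffer_t alloc_weights_in_amx(struct ggml_context ** out_ctx) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 8,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,   // tensor data will live in the backend buffer
    };
    struct ggml_context * ctx = ggml_init(params);

    // 64x64 quantized weight; the AMX path only handles quantized src0
    struct ggml_tensor * w = ggml_new_tensor_2d(ctx, GGML_TYPE_Q8_0, 64, 64);
    ggml_set_name(w, "weight");

    // allocate every tensor in this context from the AMX buffer type
    ggml_backend_buffer_t buf =
        ggml_backend_alloc_ctx_tensors_from_buft(ctx, ggml_backend_amx_buffer_type());

    *out_ctx = ctx;
    return buf;
}
```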
-
Ok, I've made that change. Specifically, I changed the line
…
to
…
and, in order to make `ggml_backend_amx_buffer_type()` "public", I added …
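For reference, a sketch of what exposing it might look like; the header and export macro here are assumptions, only the shape of the signature follows the existing buffer-type getters such as `ggml_backend_cpu_buffer_type()`:

```c
// Hypothetical: declaration added to a header the application can include
// (e.g. ggml-cpu.h) so the AMX buffer type can be requested directly; the
// exact export macro may differ between ggml versions.
GGML_API ggml_backend_buffer_type_t ggml_backend_amx_buffer_type(void);
```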
-
Ok, I've taken the weight tensor, converted it to int8, and updated the tensor loading code accordingly. My input vector is still fp32. Now, when I call … I tried using int8 for the input tensor as well, but that causes an assertion failure in … What would be the easiest way to debug this?
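As an aside, a small sketch of the type combination that path expects (the names are illustrative, not from the original program): quantized weights as src0 and an fp32 input as src1, which is why making the input int8 as well trips a type assertion.

```c
#include "ggml.h"

// Illustrative sketch: quantized weight (src0) times F32 input (src1).
// The CPU mul_mat path converts the F32 src1 to the kernel's vec_dot type
// internally, so passing an int8 input tensor directly is what asserts.
static struct ggml_tensor * build_mul_mat(struct ggml_context * ctx) {
    struct ggml_tensor * w = ggml_new_tensor_2d(ctx, GGML_TYPE_Q8_0, 64, 64); // quantized weights
    struct ggml_tensor * x = ggml_new_tensor_2d(ctx, GGML_TYPE_F32,  64, 64); // fp32 input
    return ggml_mul_mat(ctx, w, x);                                           // fp32 result
}
```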
-
Alright, I've found where the segfault is occurring. It had nothing to do with allocating AMX tensors, as I reverted my code to the original, which simply calls … So, to be clear, I am simply trying to matrix-multiply a 64x64 int8 ("weight") matrix with a 64x64 fp32 ("input") matrix for now. I debugged the segfault down to
…
and gdb says that …
-
Alright, I think I figured everything out. I'll put my observations here in case this is useful to anyone later on. The first issue was this line:
…
This creates a total of 3 tensors: the output of …, but the two intermediate tensors are just the standard CPU backend type. That means that, internally, the AMX matmul wasn't running, since its weight tensor was not AMX type; instead, the standard CPU backend matmul was running. The CPU matmul uses a function pointer … I still ran into this error: … Before I close this out, I just had a couple more questions.
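Putting the observations above together, a hedged sketch of the arrangement being described: the weight is allocated from the AMX buffer type in its own context, while the input and the graph's intermediate tensors stay in a plain CPU buffer; only then does the CPU backend dispatch the AMX mul_mat kernel. The sizes, names, and the Q8_0 weight type are assumptions for illustration, not the thread's actual code.

```c
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-cpu.h"   // ggml_backend_cpu_init() in recent ggml trees

static void run_amx_mul_mat(void) {
    // context for the weight, allocated from the AMX buffer type
    // (assumes ggml_backend_amx_buffer_type() is exposed, as above)
    struct ggml_init_params wp = { ggml_tensor_overhead() * 4, NULL, /*no_alloc=*/ true };
    struct ggml_context * wctx = ggml_init(wp);
    struct ggml_tensor  * w    = ggml_new_tensor_2d(wctx, GGML_TYPE_Q8_0, 64, 64);
    ggml_backend_alloc_ctx_tensors_from_buft(wctx, ggml_backend_amx_buffer_type());

    // context for the input, output, and graph, allocated from the CPU buffer type
    struct ggml_init_params gp = { ggml_tensor_overhead() * 16 + ggml_graph_overhead(), NULL, true };
    struct ggml_context * gctx = ggml_init(gp);
    struct ggml_tensor  * x    = ggml_new_tensor_2d(gctx, GGML_TYPE_F32, 64, 64);
    struct ggml_tensor  * y    = ggml_mul_mat(gctx, w, x);

    struct ggml_cgraph * gf = ggml_new_graph(gctx);
    ggml_build_forward_expand(gf, y);
    ggml_backend_alloc_ctx_tensors_from_buft(gctx, ggml_backend_cpu_buffer_type());

    // ... load the quantized weight data and the fp32 input here ...

    ggml_backend_t cpu = ggml_backend_cpu_init();
    ggml_backend_graph_compute(cpu, gf);
    ggml_backend_free(cpu);
}
```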
-
I am trying to execute a CNN explicitly using Intel AMX as much as possible for some performance evaluation. I already have the entire CNN implemented in ggml and some model parameters trained with a pytorch program and exported to the ggml format, but all the weights and biases are FP32s.
I looked through the codebase and it looks like all AMX operations in ggml are in the `tinygemm_kernel_amx` function, which is called by the `ggml_backend_amx_mul_mat` function located in `src/ggml-cpu/amx/mmq.cpp`, and that only signed-signed int8 operations (`_tile_dpbssd`) are supported by it. This is fine; I am able to quantize my model down to int8.

To simplify the problem for now, I've created a simple ggml program that does a single 64x64 by 64x64 matrix multiplication and added some print statements to `ggml_backend_amx_mul_mat` to check if it's being called, but it looks like that function is never called. The note above the definition of `ggml_backend_amx_mul_mat` says that src0 must be quantized in some way (I just did fp16, which according to my understanding of the code will execute AVX512 and not AMX, but I'm just trying to figure out how to get this function to execute for now), src1 must be fp32, and the destination must be fp32. Despite all this, the function never executes, but the program successfully does the matrix multiplication.

I've confirmed that ggml was built with AMX and AVX512 support enabled and that the `ggml_cpu_has_amx_int8`, `ggml_cpu_has_avx512`, `ggml_cpu_has_avx512_vnni`, `ggml_cpu_has_avx512_vbmi`, and `ggml_cpu_has_avx512_bf16` functions all return true. I also explicitly use the CPU backend with the line `model.backend = ggml_backend_cpu_init();`.
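The check amounts to something like this sketch (the header these predicates are declared in varies between ggml versions):

```c
#include <stdio.h>
#include "ggml-cpu.h"   // CPU feature predicates (declared in ggml.h in older trees)

int main(void) {
    printf("amx_int8:    %d\n", ggml_cpu_has_amx_int8());
    printf("avx512:      %d\n", ggml_cpu_has_avx512());
    printf("avx512_vnni: %d\n", ggml_cpu_has_avx512_vnni());
    printf("avx512_vbmi: %d\n", ggml_cpu_has_avx512_vbmi());
    printf("avx512_bf16: %d\n", ggml_cpu_has_avx512_bf16());
    return 0;
}
```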
For reference, here is the entire code. It is based off of this tutorial: https://balisujohn.github.io/converting-pytorch-to-ggml/