how to dump tensor data with data type GGML_TYPE_Q8_0? #7767

jeffzhou2000 · 2024-06-05T10:18:04Z

Dear GGML community,

I'm new to quantization.

Could anyone (a community developer or AI expert) explain how to dump the data in a 2D ggml tensor with data type GGML_TYPE_Q8_0?

I'm not sure whether my implementation below is correct.

Any help would be appreciated. Thanks so much.

#define QK8_0 32
typedef struct {
    uint16_t d;       // delta
    int8_t  qs[QK8_0]; // quants
} block_q8_0;
static inline float ggml_compute_fp16_to_fp32(uint16_t  h) {
    __fp16 tmp;
    memcpy(&tmp, &h, sizeof(uint16_t));
    return (float)tmp;
}
#define GGML_FP16_TO_FP32(x) ggml_compute_fp16_to_fp32(x)


if (tensor->type == GGML_TYPE_Q8_0) {
    // is this right?
    block_q8_0 * tmp = ((block_q8_0 *)tensor->data);
    for (int j = 0; j < tensor->ne[1]; j++) {
        const float d = GGML_FP16_TO_FP32(tmp[j].d);
        for (int k = 0; k < QK8_0; k++) {
            float tmpvalue = tmp[j].qs[k] * d;
            // dump tmpvalue
        }
    }
}

[tensor_dump, 315]: dump ggml tensor src0(tensor_0)
[tensor_dump, 320]:            src0: type = 8 ( q8_0) ne =    32 x     4 x     1, nb = (   34,    34,   136)
[tensor_dump_elements, 303]: 
   -0.37     0.83     0.00     0.22     0.48    -0.80    -0.03     0.97     0.59    -0.49     0.06     0.95    -0.20    -0.29     0.92     0.12     0.33    -0.29     0.49     0.50    -0.73    -0.57    -0.98     0.19    -0.63    -0.71     0.56    -0.88     0.83    -0.15     0.61     0.32 
[tensor_dump_elements, 303]: 
   -0.58     0.43     0.72     0.79    -0.93    -0.23    -0.72    -0.88     0.41     0.34    -0.21     0.55    -0.58     0.43     0.30    -0.06     0.13    -0.89     0.37    -0.97     0.02     0.81     0.72    -0.07     0.80     0.14     0.87    -0.16    -0.89     0.41    -0.01    -0.11 
[tensor_dump_elements, 303]: 
    0.10    -0.64     0.96     0.42     0.79    -0.80     0.90     0.36    -0.83     0.26     0.87     0.88    -0.88     0.49    -0.63    -0.49     0.34     0.35     0.93     0.38     0.62    -0.45     0.08    -0.66     0.24     0.20    -0.49     0.35    -0.06    -0.30     0.50    -0.65 
[tensor_dump_elements, 303]: 
    0.82    -0.09    -0.38     0.78    -0.24    -0.22     0.48    -0.78    -0.74    -0.04     0.15    -0.46    -0.05     0.97    -0.03    -0.38    -0.80     0.41     0.45    -0.76    -0.32    -0.95     0.47     0.52     0.57    -0.72     0.96     0.36    -0.47    -0.34    -0.70     0.03 
[tensor_dump_elements, 310]: 
[tensor_dump, 315]: dump ggml tensor src1(tensor_1)
[tensor_dump, 320]:            src1: type = 0 (  f32) ne =    32 x     4 x     1, nb = (    4,   128,   512)
[tensor_dump_elements, 261]: 
   -1.00     0.46    -0.08     0.47    -0.46    -0.16    -0.24     0.63     0.27     0.29     0.20    -0.37     0.07     0.52    -0.24    -0.53     0.21     0.20     0.33     0.07     0.64    -0.01     0.08     0.78     0.10     0.81    -0.24    -0.57    -0.97    -0.22    -0.89     0.93 
[tensor_dump_elements, 261]: 
   -0.55    -0.04     0.19     0.30    -0.51    -0.95     0.04     0.87     0.98    -0.68    -0.17    -0.56    -0.52    -0.26    -0.04    -0.47     0.66    -0.20     0.65    -0.50     0.15    -0.10    -0.36     0.24     0.49     0.96     0.02    -0.61    -0.69    -0.75    -0.66     0.17 
[tensor_dump_elements, 261]: 
   -0.98     0.32     0.44     0.64     0.60    -0.71    -0.18     0.09    -0.27    -0.83     0.81     0.45     0.43     0.32    -0.60    -0.73    -0.83     0.58     0.81     0.54     0.34     0.79     0.69    -0.47     0.00    -0.66    -0.98    -0.85     0.57    -0.84    -0.51    -0.07 
[tensor_dump_elements, 261]: 
    0.37     0.75     0.76    -0.96    -0.45     0.13     0.95     0.95     0.31    -0.59     0.59    -0.53     0.30     0.45    -0.79     0.97     0.60    -0.50     0.18    -0.33     0.88     0.35    -0.39    -0.45     0.45     0.67     0.18    -0.62    -0.13    -0.45     0.93    -0.24 
[tensor_dump_elements, 310]: 
[tensor_dump, 315]: dump ggml tensor dst(tensor_2)
[tensor_dump, 320]:             dst: type = 8 ( q8_0) ne =    32 x     4 x     1, nb = (   34,    34,   136)
[tensor_dump_elements, 303]: 
   -1.36     1.30    -0.08     0.68     0.01    -0.96    -0.26     1.60     0.86    -0.21     0.26     0.58    -0.13     0.23     0.68    -0.42     0.54    -0.09     0.82     0.58    -0.09    -0.58    -0.91     0.96    -0.53     0.10     0.32    -1.45    -0.14    -0.38    -0.28     1.25 
[tensor_dump_elements, 303]: 
   -1.13     0.39     0.91     1.10    -1.43    -1.18    -0.67    -0.01     1.39    -0.34    -0.37    -0.01    -1.10     0.17     0.26    -0.54     0.80    -1.08     1.02    -1.47     0.16     0.71     0.36     0.17     1.29     1.11     0.88    -0.77    -1.58    -0.34    -0.66     0.05 
[tensor_dump_elements, 303]: 
   -0.89    -0.33     1.40     1.05     1.40    -1.51     0.73     0.45    -1.11    -0.58     1.68     1.34    -0.45     0.81    -1.23    -1.22    -0.49     0.93     1.74     0.92     0.96     0.34     0.77    -1.14     0.25    -0.47    -1.48    -0.51     0.51    -1.14    -0.01    -0.71 
[tensor_dump_elements, 303]: 
    1.19     0.66     0.38    -0.18    -0.69    -0.09     1.43     0.17    -0.44    -0.63     0.74    -0.99     0.26     1.42    -0.82     0.58    -0.20    -0.09     0.63    -1.09     0.56    -0.60     0.08     0.07     1.01    -0.04     1.15    -0.26    -0.61    -0.79     0.24    -0.21

Answered by slaren

slaren · 2024-06-05T12:43:54Z

Maintainer

This function in ggml-quants.c shows how to convert the data of a Q8_0 tensor to float:
https://github.com/ggerganov/llama.cpp/blob/2b3389677a833cee0880226533a1768b1a9508d2/ggml-quants.c#L1609-L1623
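
For readers who cannot follow the link, here is a minimal sketch of what such a dequantization routine looks like. It is not a copy of the ggml source: the name `dequantize_row_q8_0_sketch` and the software `fp16_to_fp32` helper are assumptions made for illustration, and the block layout is taken from the struct posted in the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QK8_0 32

typedef struct {
    uint16_t d;          // per-block scale, stored as IEEE-754 half-precision bits
    int8_t   qs[QK8_0];  // 32 signed 8-bit quantized values
} block_q8_0;

// Portable half -> float conversion (software fallback, for illustration only).
static float fp16_to_fp32(uint16_t h) {
    const uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    const uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                                   // +/- zero
        } else {
            int e = -1;                                    // subnormal half: renormalize
            do { mant <<= 1; e++; } while ((mant & 0x400u) == 0);
            mant &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);          // inf / NaN
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

// Expand k quantized values into floats; k must be a multiple of QK8_0.
// Hypothetical name; the logic is "scale times quant" for every element.
static void dequantize_row_q8_0_sketch(const block_q8_0 *x, float *y, int64_t k) {
    assert(k % QK8_0 == 0);
    const int64_t nblocks = k / QK8_0;       // number of blocks in this row
    for (int64_t i = 0; i < nblocks; i++) {
        const float d = fp16_to_fp32(x[i].d);
        for (int j = 0; j < QK8_0; j++) {
            y[i * QK8_0 + j] = x[i].qs[j] * d;   // block i, element j
        }
    }
}
```

Each block carries one scale `d` and 32 small integers, so a row of `k` values occupies `k / 32` blocks of 34 bytes each.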

0 replies

jeffzhou2000 · 2024-06-05T12:46:04Z

Author

This is my first time working with quantization. Thank you very much.

Could you help confirm whether my implementation is correct? The code follows the function you pointed to, but I don't really understand what "y[i*qk + j] = x[i].qs[j]*d;" means.

2 replies

slaren Jun 5, 2024
Maintainer

Your code is correct if ne[0] == 32, but larger tensors have multiple blocks in a row.
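
To illustrate the multi-block case, here is a sketch of a dump loop that should also work when ne[0] is larger than 32. It assumes ne[0] is a multiple of 32 and that nb[1] is the row stride in bytes, as in the log above (nb[1] = 34 for one block per row). The `q8_0_view` struct and `dump_q8_0_2d` are hypothetical stand-ins for the few `ggml_tensor` fields involved, so the example compiles without ggml.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QK8_0 32

typedef struct { uint16_t d; int8_t qs[QK8_0]; } block_q8_0;

// Hypothetical stand-in for the ggml_tensor fields used here.
typedef struct {
    int64_t     ne[2];  // ne[0] = elements per row, ne[1] = number of rows
    size_t      nb[2];  // nb[0] = bytes per block,  nb[1] = bytes per row
    const void *data;
} q8_0_view;

// Software half -> float conversion (illustrative fallback).
static float fp16_to_fp32(uint16_t h) {
    const uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    const uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0 && mant == 0) {
        bits = sign;
    } else if (exp == 0) {
        int e = -1;
        do { mant <<= 1; e++; } while ((mant & 0x400u) == 0);
        bits = sign | ((uint32_t)(127 - 15 - e) << 23) | ((mant & 0x3FFu) << 13);
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

// Dequantize the whole 2D tensor into out[row * ne[0] + col].
static void dump_q8_0_2d(const q8_0_view *t, float *out) {
    const int64_t blocks_per_row = t->ne[0] / QK8_0;
    for (int64_t r = 0; r < t->ne[1]; r++) {
        // Step to the start of row r using the byte stride, not the block index.
        const block_q8_0 *row =
            (const block_q8_0 *)((const char *)t->data + (size_t)r * t->nb[1]);
        for (int64_t b = 0; b < blocks_per_row; b++) {
            const float d = fp16_to_fp32(row[b].d);
            for (int k = 0; k < QK8_0; k++) {
                out[r * t->ne[0] + b * QK8_0 + k] = row[b].qs[k] * d;
            }
        }
    }
}
```

With ne[0] == 32 this reduces to the loop in the question, because row r then starts exactly at block r. With ne[0] == 64, indexing blocks by row number alone would read the two blocks of row 0 instead of the first block of each row.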

jeffzhou2000 Jun 5, 2024
Author

Thanks so much!

I hardcoded ne[0] to 32 while studying the code you pointed to.
I'll look into the case you mentioned (multiple blocks in a row).

Thanks again!

Update: I understand what you meant now. Thanks for your help.
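
For anyone else puzzled by the expression "y[i*qk + j] = x[i].qs[j]*d", it may help to see where d and qs come from in the first place. The sketch below is a toy version of the idea, not ggml's code: `toy_block_q8`, `toy_quantize_block` and `toy_dequantize` are hypothetical names, and the scale is kept as a plain float for readability, whereas the real block_q8_0 stores it as a 16-bit half. It assumes the scale is chosen as the largest absolute value in the block divided by 127, which is my understanding of the Q8_0 reference quantizer.

```c
#include <stdint.h>

#define QK 32

// Toy block: same idea as Q8_0, but with a float scale (assumption made
// for readability; the real format stores the scale as fp16).
typedef struct {
    float  d;       // size of one quantization step
    int8_t qs[QK];  // each value expressed as a whole number of steps
} toy_block_q8;

// Quantize 32 floats: pick d so the largest magnitude maps to +/-127,
// then round every value to the nearest multiple of d.
static void toy_quantize_block(const float *x, toy_block_q8 *out) {
    float amax = 0.0f;
    for (int j = 0; j < QK; j++) {
        const float a = x[j] < 0.0f ? -x[j] : x[j];
        if (a > amax) amax = a;
    }
    const float d  = amax / 127.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f;  // avoid dividing by zero
    out->d = d;
    for (int j = 0; j < QK; j++) {
        const float v = x[j] * id;
        out->qs[j] = (int8_t)(v < 0.0f ? v - 0.5f : v + 0.5f);  // round to nearest
    }
}

// The reverse step: number of steps times step size.
static float toy_dequantize(const toy_block_q8 *b, int j) {
    return b->qs[j] * b->d;
}
```

Read this way, `x[i].qs[j]*d` recovers an approximation of the original value in block i at position j, and `i*qk + j` is simply that element's position in the flat output row.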
