What is the principle of quantize(eg. Q4_0)? #3546

Answered by KerfuffleV2
Tr-buaa asked this question in Q&A

The purpose, of course, is to use less storage space and memory (it can also speed up running the model). The principle is to fit a large range of values into a smaller range while preserving as much accuracy as possible.

Not sure if it'll help you, but I wrote a pretty simple Python implementation of q8_0 a while back:

```python
import numpy as np

#### Mini Q8_0 quantization in Python
QK8_0 = 32
BLOCK_Q8_0 = np.dtype([('d', '<f2'), ('qs', 'i1', (QK8_0,))])
def quantize_array_q8_0(arr):
    assert arr.size % QK8_0 == 0 and arr.size != 0, f'Bad array size {arr.size}'
    assert arr.dtype == np.float32, f'Bad array type {arr.dtype}'
    n_blocks = arr.size // QK8_0
    blocks = arr.reshape((n…
```
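
The snippet above is cut off in this copy of the thread, so here is a separate, minimal sketch of the same idea. It is not the code from the answer and not llama.cpp's implementation. It assumes a simple scheme in the spirit of Q8_0: split the values into blocks of 32, store one float16 scale per block, and store each value as an int8 multiple of that scale. The names `quantize_blocks` and `dequantize_blocks` are made up for illustration.

```python
import numpy as np

BLOCK_SIZE = 32  # values per block (same size as QK8_0 in the answer's snippet)


def quantize_blocks(arr):
    """Hypothetical helper: returns (scales, quants) for a float32 array.

    scales: float16, shape (n_blocks, 1)
    quants: int8,    shape (n_blocks, BLOCK_SIZE)
    """
    assert arr.dtype == np.float32 and arr.size % BLOCK_SIZE == 0
    blocks = arr.reshape(-1, BLOCK_SIZE)
    # Pick the scale so the largest magnitude in each block maps to 127.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / np.float32(127)
    # An all-zero block has scale 0; divide by 1 instead so the quants are 0.
    safe = np.where(scales == 0, np.float32(1), scales)
    quants = np.rint(blocks / safe).astype(np.int8)
    return scales.astype(np.float16), quants


def dequantize_blocks(scales, quants):
    """Hypothetical helper: approximate reconstruction, value = scale * quant."""
    return (scales.astype(np.float32) * quants.astype(np.float32)).reshape(-1)


# Small demo on random data (assumed input, just for illustration).
rng = np.random.default_rng(0)
x = rng.standard_normal(64).astype(np.float32)
scales, quants = quantize_blocks(x)
y = dequantize_blocks(scales, quants)
print("max abs round-trip error:", np.abs(x - y).max())
print("bytes before:", x.nbytes, "bytes after:", scales.nbytes + quants.nbytes)
```

With this layout each block of 32 float32 values (128 bytes) shrinks to 2 bytes of scale plus 32 bytes of int8 values, which matches the size of the `BLOCK_Q8_0` dtype in the answer. The accuracy cost is that every value gets rounded to the nearest multiple of its block's scale. A 4-bit format such as Q4_0 follows the same principle with far fewer levels per value, so the rounding error is larger.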

Answer selected by staviq

This discussion was converted from issue #3541 on October 08, 2023 13:38.