
Vectors are always converted to F32 when running convert.py #6497

Answered by ggerganov
kunnis asked this question in Q&A

1D tensors (i.e. vectors) are kept in, or cast to, F32 because they are very small compared to the 2D tensors that make up the rest of the model. The performance difference between F16 and F32 1D tensors is negligible, except probably for some very small models. It is therefore simpler to have a single F32 implementation of the respective operators (ggml_scale, etc.) and to keep that data at the highest precision. In the future this could be extended to support F16 vectors, but a big first step before that is adding support for F16 output: currently (almost) all ggml operators produce their result in F32 format.
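A minimal Python sketch of the policy described above, and of why it costs so little. This is not the actual convert.py code. `target_dtype`, `convert` and `vector_share_of_f32_overhead` are hypothetical helpers, and the size estimate assumes a LLaMA-style layout: two norm vectors per layer plus one final norm, four attention matrices and three FFN matrices per layer, and token-embedding and output matrices.

```python
import numpy as np


def target_dtype(shape, want_f16=True):
    # Hypothetical rule: 1D tensors (norm weights, biases) always stay F32;
    # 2D+ weight matrices become F16 only when F16 output is requested.
    if len(shape) <= 1 or not want_f16:
        return np.float32
    return np.float16


def convert(tensors, want_f16=True):
    # Cast every tensor to the dtype chosen by target_dtype.
    return {name: t.astype(target_dtype(t.shape, want_f16)) for name, t in tensors.items()}


def vector_share_of_f32_overhead(n_embd, n_ff, n_vocab, n_layer):
    # Extra bytes spent by storing the norm vectors in F32 instead of F16,
    # as a fraction of the total F16 model size (assumed LLaMA-style layout).
    vec_elems = (2 * n_layer + 1) * n_embd
    mat_elems = n_layer * (4 * n_embd * n_embd + 3 * n_embd * n_ff) + 2 * n_vocab * n_embd
    extra_bytes = vec_elems * 2  # 4 bytes per F32 element vs 2 bytes per F16 element
    total_f16_bytes = (vec_elems + mat_elems) * 2
    return extra_bytes / total_f16_bytes


tensors = {
    "attn_norm.weight": np.ones(8, dtype=np.float32),
    "attn_q.weight": np.ones((8, 8), dtype=np.float32),
}
converted = convert(tensors)
print({name: t.dtype for name, t in converted.items()})
print(vector_share_of_f32_overhead(n_embd=4096, n_ff=11008, n_vocab=32000, n_layer=32))
```

With LLaMA-7B-like shapes (n_embd=4096, n_ff=11008, n_vocab=32000, n_layer=32), keeping the vectors in F32 adds well under 0.01% to the F16 model size. This is why a single F32 code path for the vector operators is a reasonable trade-off.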

Answer selected by kunnis