convert-hf : support bfloat16 conversion #7158
Merged
Commits (7)
6f8d280 convert-hf : support bfloat16 conversion (compilade)
59f5a27 gguf-py : flake8 fixes (compilade)
3801db1 convert-hf : add missing space after comma (compilade)
95930da convert-hf : get bit-exact same output as ./quantize (compilade)
58b515c convert-hf : add --outtype auto-f16 (compilade)
d3d32a6 convert-hf : remove a semicolon because flake8 doesn't like it (compilade)
e0af2df convert-hf : support outtype templating in outfile name (compilade)
Given that most models come in bf16, wouldn't it make sense to set it as the default?
The conversion to `bf16` is slightly slower and uses a bit more RAM than `f16` conversion, due to the lack of native NumPy support, so I didn't change the default.

I'll see if I can auto-detect whether the model contains `bf16` tensors (but it will most likely be too complicated). Otherwise, it does make sense to set `bf16` as the default if it's widely used.

My concern with `bf16` as the default is that `f16 -> bf16` is more lossy than `bf16 -> f16`, since 3 bits of the mantissa are always lost in `f16 -> bf16`, while `bf16 -> f16` only turns some very-close-to-zero values into zero, and big values get turned to `inf` (but such big values are usually not in model weights, see #7075 (comment)).
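To make the asymmetry concrete, here is a minimal NumPy sketch (not part of the convert script; it emulates `bf16` by truncating an `f32` to its top 16 bits, whereas a real converter may round instead). The sample values are only illustrative.

```python
import numpy as np

def f32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    # Keep the sign, the 8-bit exponent and the top 7 mantissa bits of the
    # f32 representation (truncation instead of rounding, for simplicity).
    return (x.astype(np.float32).view(np.uint32) >> 16).astype(np.uint16)

def bf16_bits_to_f32(b: np.ndarray) -> np.ndarray:
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([1.0009765625, 1e-8, 70000.0], dtype=np.float32)

via_f16 = x.astype(np.float16).astype(np.float32)
via_bf16 = bf16_bits_to_f32(f32_to_bf16_bits(x))

print(via_f16)   # [1.0009766, 0.0, inf]     -> tiny value flushed to zero, big value overflows
print(via_bf16)  # [1.0, ~1e-08, 69632.0]    -> extra mantissa bits lost, but no flush or overflow
```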
We only have rudimentary CPU-only `bf16` support in `ggml`, so `f16` is better for now.
(left comment on wrong account)
@compilade could we not attempt to read it from `config.json`? It should have a `torch_dtype` in it.
@ggerganov, when you say CPU-only, I assume you're referring to inference, since all conversion and quantization is currently CPU-only?
@bartowski1182 yes, but not all models define that field, so I think a second guess based on the type of the first tensor in the model will sometimes be necessary. There are also models that declare one `torch_dtype` in `config.json` and use another type in the model files. For example, https://huggingface.co/pansophic/rocket-3B has `F16` tensors, but defines `torch_dtype` as `bfloat16` in `config.json`.

Would you still like some kind of `--outtype auto-f16` based on the model content even if `f16` is kept as the default `--outtype` otherwise? (due to (slightly) faster conversion, and more complete backend support)
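For reference, a hypothetical sketch of that two-step guess (this is not the code from this PR, and `guess_source_dtype` is an invented name): prefer `torch_dtype` from `config.json` when present, otherwise peek at the dtype of the first tensor listed in the header of the first `*.safetensors` shard.

```python
import json
import struct
from pathlib import Path

def guess_source_dtype(model_dir: Path) -> str:
    """Prefer config.json's torch_dtype; otherwise fall back to the dtype of
    the first tensor listed in the first *.safetensors shard."""
    config = json.loads((model_dir / "config.json").read_text())
    if config.get("torch_dtype"):
        return config["torch_dtype"]        # e.g. "bfloat16" or "float16"
    # A safetensors file starts with a little-endian u64 header length,
    # followed by a JSON header mapping tensor names to their metadata.
    shard = next(model_dir.glob("*.safetensors"))
    with shard.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    for name, meta in header.items():
        if name != "__metadata__":
            return meta["dtype"]            # e.g. "BF16", "F16", "F32"
    raise ValueError(f"no tensors found in {shard}")
```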
@htdung167 This PR is only about conversion, and the convert script has always been CPU-only. Inference (text generation) with `bf16` was worked on by @jart in #6412. A relevant comment from there regarding future GPU support would be #6412 (comment).

@bartowski1182 I agree. I've thought about this more, and I'll change this to `auto` instead of `auto-f16`. I don't think there will be `auto-anything-else` anyway.

Yes, this is already possible by checking the logs. It would also be possible to do automatically, but not everyone has the same naming conventions, so maybe the right way to do this would be with a `.format()` pattern? For example, `--outfile llama-3-8b-instruct-{ftype}.gguf`, or `--outfile llama-3-8b-instruct-{outtype}.gguf`, or `--outfile llama-3-8b-instruct-{}.gguf`. Not sure which to support (all?), but it should be clearer than `%s`. It would also be possible to allow using `{FTYPE}` or `{OUTTYPE}` for upper-cased type names.

I see you've used `fp16` and `fp32` in the past, but this will use `f16` and `f32`, respectively, for these type names.

(EDIT: this is now implemented in e0af2df)
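As a rough illustration of the templating idea (a sketch of the named placeholders only; it may not match what e0af2df actually does), the placeholders could be filled once the output type is resolved:

```python
def fill_outfile_template(filename: str, output_type: str) -> str:
    # output_type is the resolved type name, e.g. "f16" or "bf16"
    # ("auto" would be replaced by the detected type before this point).
    return filename.format(
        ftype=output_type.lower(),
        outtype=output_type.lower(),
        FTYPE=output_type.upper(),
        OUTTYPE=output_type.upper(),
    )

print(fill_outfile_template("llama-3-8b-instruct-{ftype}.gguf", "bf16"))
# llama-3-8b-instruct-bf16.gguf
```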
Yeah, I've made the transition to f16/f32; naming them 'fp' was an oversight on my part.

A format option would be amazing.
Should `auto` simply try to be as lossless as possible? Like, if the model is originally in `f32`, make the output `f32`? Or should it always select a 16-bit type? (currently `bf16` is selected for `f32` models)
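For clarity, the current behaviour being debated could be summarized as something like the following sketch (illustrative logic only, not the script's actual code):

```python
def choose_auto_outtype(source_dtype: str) -> str:
    # Current behaviour described above: f32 (and bf16) sources map to bf16,
    # f16 sources stay f16. Whether f32 should instead stay f32 ("as lossless
    # as possible") is the open question.
    if source_dtype in ("float32", "bfloat16"):
        return "bf16"
    return "f16"
```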
My vote would be for compressing even if it was originally f32; if the person converting wants f32, they'll specify it. Otherwise, presumably they're converting with the intention of quantizing, where it won't matter.
"Do no harm" should be the default.