llama: add initial support for Falcon-H1 model family #14534
Status: Merged (+585, −9)
Commits (112; the diff below reflects changes from 70 commits)
- 991de6c: v1 (younesbelkada)
- f897efd: push more fixes (younesbelkada)
- 71a6848: another fix (younesbelkada)
- 03568c9: fix (younesbelkada)
- 0c93ef6: more fixes (younesbelkada)
- fdd5cff: minor fix (younesbelkada)
- 14c37ec: more cleaning on python code (younesbelkada)
- 8bea922: python fixes (ibrahimkhadraoui)
- 071f4b7: changed precision for multipliers float 32->64 (ibrahimkhadraoui)
- 50eadc7: fixes (younesbelkada)
- a39a842: merge (younesbelkada)
- 1415cd8: another fix (younesbelkada)
- 243e4d1: fix (younesbelkada)
- cce3549: pre-norm -> norm (younesbelkada)
- 22de62c: fix (younesbelkada)
- 2fe057c: Revert "fix" (ibrahimkhadraoui)
- d22b4ea: Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp… (ibrahimkhadraoui)
- 6c7d9e2: fix (younesbelkada)
- 15138df: small fix ffn_norm (ibrahimkhadraoui)
- a6d0067: Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp… (ibrahimkhadraoui)
- 1fd0574: try (younesbelkada)
- 250b4f1: mix instead of max (younesbelkada)
- 3ee7983: fix vocab size (ibrahimkhadraoui)
- 2aa48dd: Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp… (ibrahimkhadraoui)
- 9760c8b: conflict solve (ibrahimkhadraoui)
- 7a25441: fixed multipliers (ibrahimkhadraoui)
- 280dd2d: falcon-h1 specefic vocab resolved (ibrahimkhadraoui)
- c56ec07: read arch from gguf.MODEL_ARCH (ibrahimkhadraoui)
- c4af0f3: mamba_d_ssm added to d_inner find_hparam (ibrahimkhadraoui)
- 53304c8: remove unused functions from gguf_writer.py (ibrahimkhadraoui)
- 441d8d6: override modify_tensors instead of get_tensors (ibrahimkhadraoui)
- 6c39e77: fix conversion and d_inner (younesbelkada)
- 8c50893: added some cb functions for debugging puposes (ibrahimkhadraoui)
- 49d7420: inp_out_ids moved outside of layers loop (ibrahimkhadraoui)
- 97011d7: mup_vec create as float64 (ibrahimkhadraoui)
- 286e1fa: fix rope_theta (ibrahimkhadraoui)
- b3bc1fb: Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp… (ibrahimkhadraoui)
- a9f3a63: injected mup (younesbelkada)
- e96cc73: clean ups (younesbelkada)
- 3afb2a8: Merge pull request #1 from tiiuae/injected-mup (ibrahimkhadraoui)
- 0ad3502: rm extra space (ibrahimkhadraoui)
- 53446f7: rm unused MAMBA_CHUNK_SIZE (ibrahimkhadraoui)
- ae937f4: rm unused key (ibrahimkhadraoui)
- b6df0a4: add bos False (ibrahimkhadraoui)
- 935d46f: changed ROPE_TYPE (ibrahimkhadraoui)
- 624699c: cleaning debugging stuff (ibrahimkhadraoui)
- 042e5ff: cleaning debug quant (ibrahimkhadraoui)
- f74e266: fix comment (younesbelkada)
- 632861e: some cleanups (younesbelkada)
- 084873c: some cleanups (younesbelkada)
- fd20330: Update src/llama-model-loader.cpp (younesbelkada)
- 68cb784: more cleanups (younesbelkada)
- d2f46f1: moe cleanuips (younesbelkada)
- 7d7da0b: d_ssm -> d_inner; (younesbelkada)
- 67b2664: cleaning unused hparams (ibrahimkhadraoui)
- da8a338: Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp… (ibrahimkhadraoui)
- e63ee46: cleanup (ibrahimkhadraoui)
- d473d42: more cleanups (younesbelkada)
- 8555ee8: more cleanups on python conversion; (younesbelkada)
- 7846c67: minor cleanups (ibrahimkhadraoui)
- 2dee7cf: Apply suggestions from code review (younesbelkada)
- a846d02: remove todo (younesbelkada)
- f028a43: Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp… (ibrahimkhadraoui)
- d41f111: Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp… (ibrahimkhadraoui)
- f266d14: added falcon-h1 (ibrahimkhadraoui)
- 4bc9e0c: tensor not required (younesbelkada)
- 2834a4a: clean (ibrahimkhadraoui)
- 823696b: remove unneeded attributes (younesbelkada)
- adff470: more cleanups and fixed conversion (younesbelkada)
- 097df0e: remove final_norm (younesbelkada)
- 9a048d8: flake8 fixes (ibrahimkhadraoui)
- 52d1ef3: Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp… (ibrahimkhadraoui)
- 58e3866: Update src/llama-model.cpp (younesbelkada)
- d28c31a: Merge branch 'master' into add-fh1-rebased (younesbelkada)
- 9b92648: flake8 fixes (ibrahimkhadraoui)
- 7fe1794: Update src/llama-hparams.cpp (ibrahimkhadraoui)
- 40058c0: Update src/llama-model.cpp (ibrahimkhadraoui)
- debf4e5: Update src/llama-model.cpp (ibrahimkhadraoui)
- 212edff: Update src/llama-arch.cpp (ibrahimkhadraoui)
- 90ddf24: Update convert_hf_to_gguf.py (ibrahimkhadraoui)
- 7edf380: Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp… (ibrahimkhadraoui)
- c3c5d51: added hashes (ibrahimkhadraoui)
- f8d7c97: Update src/llama-arch.cpp (younesbelkada)
- 4610ee2: Update src/llama-vocab.cpp (younesbelkada)
- 082ab4a: update the update file (younesbelkada)
- c5515e3: Revert "update the update file" (younesbelkada)
- 1ef53b3: fix: address suggestions (younesbelkada)
- d5efbd0: fix: update convert_hf_to_gguf.py (younesbelkada)
- a5afc8b: Update gguf-py/gguf/constants.py (younesbelkada)
- 99f9a3d: Update src/llama-model-loader.cpp (younesbelkada)
- c3c64c3: d_inner fixed (ibrahimkhadraoui)
- 63e3afc: Update src/llama-model.cpp (younesbelkada)
- d758578: reshaping ssm_norm for 34B (ibrahimkhadraoui)
- 8972c15: Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp… (ibrahimkhadraoui)
- 7897c21: removing generate_mup (ibrahimkhadraoui)
- 6403caa: remove duplicates metadata keys (ibrahimkhadraoui)
- 710630a: rm comment (ibrahimkhadraoui)
- 7b9aa7b: Merge branch 'master' into add-fh1-rebased (younesbelkada)
- ecc5253: final comment (younesbelkada)
- bbca33e: fix unused args (younesbelkada)
- 9f514e3: fix constants (younesbelkada)
- 34c5d83: fix bad merge (younesbelkada)
- 521e823: Update src/llama-model.cpp (younesbelkada)
- 6943f4e: falcon-h1: remove unused ssm_in_b and bad merge (younesbelkada)
- 4d2c94b: Update src/llama-model.cpp (younesbelkada)
- b7c9a99: falcon-h1: fix last comment (younesbelkada)
- 9fd308d: Update convert_hf_to_gguf.py (younesbelkada)
- 51f50bf: falcon-h1: revert add_add_bos(False) (younesbelkada)
- 367d8c5: falcon-h1: fix tied weights (younesbelkada)
- 1fa361b: falcon-h1: remove whitespace (younesbelkada)
- 6dde986: falcon-h1: fix wrong size param (younesbelkada)
- 94ab3a8: falcon-h1: fix whitespace issues (younesbelkada)
Diff excerpt from gguf-py/gguf/constants.py (new SSM metadata key):

```diff
@@ -172,6 +172,7 @@ class SSM:
     TIME_STEP_RANK = "{arch}.ssm.time_step_rank"
     GROUP_COUNT    = "{arch}.ssm.group_count"
     DT_B_C_RMS     = "{arch}.ssm.dt_b_c_rms"
+    HEAD_DIM       = "{arch}.ssm.head_dim"

 class WKV:
     HEAD_SIZE = "{arch}.wkv.head_size"
```

Review comment on this hunk: The head count in Mamba-2 is also the time step rank. I guess it could be clearer to use a more appropriate name like this, though. I'm not against it; this is only to let you know.
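To make the mechanics concrete, here is a minimal sketch of how a templated key like this resolves and gets written during HF-to-GGUF conversion. The key template comes from the hunk above; the output filename and the example value 64 are illustrative assumptions, not values from this PR:

```python
# Resolve the templated metadata key for a concrete architecture and
# write it with gguf-py. Requires the `gguf` Python package; a real
# converter would also emit tensors and call the write_*_to_file methods.
import gguf

arch = "falcon_h1"
key = "{arch}.ssm.head_dim".format(arch=arch)  # -> "falcon_h1.ssm.head_dim"

writer = gguf.GGUFWriter("falcon-h1-demo.gguf", arch)
writer.add_uint32(key, 64)  # head dim would normally come from the HF config
```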
```diff
@@ -288,6 +289,7 @@ class MODEL_ARCH(IntEnum):
     LLAMA4 = auto()
     DECI = auto()
     FALCON = auto()
+    FALCON_H1 = auto()
     BAICHUAN = auto()
     GROK = auto()
     GPT2 = auto()
```
```diff
@@ -660,6 +662,7 @@ class MODEL_TENSOR(IntEnum):
     MODEL_ARCH.DOTS1: "dots1",
     MODEL_ARCH.ARCEE: "arcee",
     MODEL_ARCH.ERNIE4_5: "ernie4_5",
+    MODEL_ARCH.FALCON_H1: "falcon_h1",
 }

 VISION_PROJECTOR_TYPE_NAMES: dict[VISION_PROJECTOR_TYPE, str] = {
```
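Registering the enum value and its name string is what lets the conversion script route a Hugging Face checkpoint to the right architecture. A pattern sketch of the converter side is shown below; the decorator and base-class names follow convert_hf_to_gguf.py's existing conventions for Mamba-2 models, but the exact Falcon-H1 class added by this PR may differ:

```python
# Pattern sketch only: how a convert_hf_to_gguf.py model class typically
# binds an HF `architectures` string to a gguf MODEL_ARCH value. The class
# body here is an assumption modeled on the script's Mamba-2 support.
import gguf
from convert_hf_to_gguf import ModelBase, Mamba2Model

@ModelBase.register("FalconH1ForCausalLM")
class FalconH1Model(Mamba2Model):
    model_arch = gguf.MODEL_ARCH.FALCON_H1

    def set_gguf_parameters(self):
        super().set_gguf_parameters()
        # ... write Falcon-H1 specific hparams (rope_theta, SSM sizes, ...)
```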
```diff
@@ -2211,6 +2214,40 @@ class MODEL_TENSOR(IntEnum):
         MODEL_TENSOR.FFN_DOWN,
         MODEL_TENSOR.FFN_UP,
     ],
+    MODEL_ARCH.FALCON_H1: [
+        # Token embedding
+        MODEL_TENSOR.TOKEN_EMBD,
+
+        # Input layernorm
+        MODEL_TENSOR.ATTN_NORM,
+
+        # Attention components
+        MODEL_TENSOR.ATTN_Q,    # Query projection
+        MODEL_TENSOR.ATTN_K,    # Key projection
+        MODEL_TENSOR.ATTN_V,    # Value projection
+        MODEL_TENSOR.ATTN_OUT,  # Output projection
+
+        # SSM components (Mamba2 specific)
+        MODEL_TENSOR.SSM_IN,      # Input projection for SSM
+        MODEL_TENSOR.SSM_CONV1D,  # Convolution layer
+        MODEL_TENSOR.SSM_DT,      # Delta time projection
+        MODEL_TENSOR.SSM_A,       # A parameter (log form)
+        MODEL_TENSOR.SSM_D,       # D parameter
+        MODEL_TENSOR.SSM_NORM,    # Normalization in SSM
+        MODEL_TENSOR.SSM_OUT,     # Output projection
+
+        # Pre-feedforward layernorm
+        MODEL_TENSOR.FFN_PRE_NORM,
+
+        # Feed-forward network components
+        MODEL_TENSOR.FFN_GATE,  # Gate projection (SwiGLU)
+        MODEL_TENSOR.FFN_DOWN,  # Down projection
+        MODEL_TENSOR.FFN_UP,    # Up projection
+
+        # Post-feedforward layernorm
+        MODEL_TENSOR.OUTPUT_NORM,  # Final layer norm
+        MODEL_TENSOR.OUTPUT,       # Output projection (lm_head)
+    ],
+    # TODO
 }
```
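The tensor list implies that each Falcon-H1 layer carries both attention and SSM weights. Below is a toy, runnable sketch of one plausible hybrid-block dataflow; it is NOT llama.cpp's actual compute graph (which lives in src/llama-model.cpp), and the parallel attention+SSM merge, the collapsed weight matrices, and all shapes are assumptions for illustration:

```python
# Toy dataflow for one hybrid block, mapping roughly onto the tensor
# names above. Each branch is collapsed into a single matmul stand-in.
import numpy as np

d = 64  # toy model width

def rms_norm(x, w, eps=1e-6):
    return w * x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def swiglu(x, w_gate, w_up, w_down):
    gate = x @ w_gate
    return ((gate / (1.0 + np.exp(-gate))) * (x @ w_up)) @ w_down  # SiLU(gate) * up

def toy_mixer(x, w):  # stand-in for both the attention and the SSM branch
    return x @ w

rng = np.random.default_rng(0)
x = rng.normal(size=(8, d)) * 0.1            # 8 tokens, width d
attn_norm_w = np.ones(d)                     # ATTN_NORM
ffn_norm_w = np.ones(d)                      # FFN_PRE_NORM
w_attn = rng.normal(size=(d, d)) * 0.02      # ATTN_Q/K/V/OUT collapsed
w_ssm = rng.normal(size=(d, d)) * 0.02       # SSM_IN..SSM_OUT collapsed
w_gate = rng.normal(size=(d, 4 * d)) * 0.02  # FFN_GATE
w_up = rng.normal(size=(d, 4 * d)) * 0.02    # FFN_UP
w_down = rng.normal(size=(4 * d, d)) * 0.02  # FFN_DOWN

h = rms_norm(x, attn_norm_w)
x = x + toy_mixer(h, w_attn) + toy_mixer(h, w_ssm)  # attention + SSM in parallel
x = x + swiglu(rms_norm(x, ffn_norm_w), w_gate, w_up, w_down)
print(x.shape)  # (8, 64)
```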
Review discussion (tokenizer hashes in convert_hf_to_gguf.py):

Reviewer: Why do we have multiple hashes here? This section should be generated by the convert_hf_to_gguf_update.py script, and it will be overwritten the next time we run it.

Author: The reason we have multiple hashes here is that we use a different tokenizer for each model size, which leads to a different hash per size.

Reviewer: Should we try to add all the models to the update script? The idea is to not edit this block manually, because it will eventually get overwritten when the update script is executed.

Author: We quickly tried to adapt that script and got a diff (screenshot omitted). We would probably need to register a different name per model size (4 in total). We are not sure what the preferred approach for llama.cpp is: if that is the approach you want, we will update it; otherwise we can add a comment explaining why there are 4 hashes here and add them by hand.

Reviewer: It is generally fine to make this change. Just to make sure that we are on the same page, getting different hashes here generally means that either (1) the models use genuinely different tokenizers, with different pre-tokenization behavior, or (2) the models use the same tokenizer, but different tokens are present in their vocabs, so the hashes differ. The second option is OK. However, the first option is not OK. I haven't looked at what the case is for Falcon-H1, but if you can confirm that the reason for the different hashes is the second case (i.e. due to different tokens being present for the different models, but they in fact use the same tokenizer), we should be good.

Author: Thank you for explaining this. I can confirm it is the second case: we use the same tokenization algorithm (BPE), but the vocab size and the tokens inside the vocabulary differ for each model size.

Reviewer: By the way, about the removed lines in the diff: do not commit those. They are most likely missing because you don't have access to the respective HF repos. Only commit the new lines for Falcon-H1.
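For background, the hash under discussion is computed by tokenizing a fixed probe string and hashing the resulting token IDs, which is why both pre-tokenizer behavior and vocab contents affect it. A simplified sketch of the mechanism follows; the real probe text in convert_hf_to_gguf.py is much longer, and the model ID used here is an assumption:

```python
# Simplified version of the pre-tokenizer fingerprint computed in
# convert_hf_to_gguf.py's get_vocab_base_pre(). Requires `transformers`
# and access to the HF repo used as an example.
from hashlib import sha256
from transformers import AutoTokenizer

chktxt = "Hello World!\n\n \t 3.14 \u00e9\u00e8 ..."  # stand-in probe text
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon-H1-1.5B-Instruct")
chktok = tokenizer.encode(chktxt)
chkhsh = sha256(str(chktok).encode()).hexdigest()
print(chkhsh)  # matched against the hard-coded per-model hashes
```

Because the hash is taken over token IDs, two models with identical splitting rules but different vocabularies still produce different hashes, which matches the second (acceptable) case described above.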