Add granite #1882

Open · BBC-Esq wants to merge 1 commit into master

Conversation

@BBC-Esq commented on Apr 12, 2025

Should add the really cool Granite models by IBM.

@BBC-Esq changed the title from "add granite" to "Add granite" on Apr 12, 2025
@gabe-l-hart left a comment

Thanks for putting together this PR for Granite! I'm one of the Granite team leads for open source integrations, so I've got a couple of notes inline. Also, in addition to pulling head_dim from config and applying the embedding_multiplier, there are three more scalars used in the GraniteForCausalLM architecture that differentiate it from llama (a rough sketch of where each one lands follows the list):

  • attention_multiplier: This should be applied as a scaling after the self attention block of each layer (here)
  • residual_multiplier: This should be applied when recombining with the residual after the self attention block (here) and the FFN (here) in each layer
  • logits_scaling: This should be applied to the output logits before returning them from the model forward pass (here)
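
A minimal, illustrative sketch of where these three scalars would land in a llama-like forward pass, based only on the descriptions above. The block and attribute names (self_attn, ffn, norm1, norm2, final_norm, lm_head) are placeholders, not CTranslate2 or transformers internals, and logits_scaling is shown as a division following the transformers Granite implementation, which is worth double-checking against the reference:

    def granite_decoder_layer(hidden_states, layer, config):
        # attention_multiplier: scales the output of the self-attention block.
        residual = hidden_states
        attn_out = layer.self_attn(layer.norm1(hidden_states)) * config.attention_multiplier
        # residual_multiplier: scales the block output when recombining with the residual.
        hidden_states = residual + attn_out * config.residual_multiplier

        # The FFN output is recombined with its residual the same way.
        residual = hidden_states
        ffn_out = layer.ffn(layer.norm2(hidden_states))
        hidden_states = residual + ffn_out * config.residual_multiplier
        return hidden_states

    def granite_logits(hidden_states, final_norm, lm_head, config):
        # logits_scaling: applied to the output logits before they are returned.
        logits = lm_head(final_norm(hidden_states))
        return logits / config.logits_scaling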

rotary_interleave=False,
rotary_base=getattr(model.config, "rope_theta", 5000000.0),
num_heads_kv=num_heads_kv,
head_dim=model.config.hidden_size // model.config.num_attention_heads,

In Granite, the head_dim can be pulled directly from config with the fallback being this calculation (here)
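
For example, a hedged sketch of the suggested change, assuming the Hugging Face Granite config exposes a head_dim field and keeping the current calculation as the fallback:

    # Prefer head_dim from the config when it is set, otherwise fall back to the
    # derived value already used in this diff.
    head_dim = getattr(model.config, "head_dim", None)
    if head_dim is None:
        head_dim = model.config.hidden_size // model.config.num_attention_heads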

self.set_linear(spec.decoder.projection, model.lm_head)

if hasattr(model.config, "embedding_multiplier") and model.config.embedding_multiplier:
spec.decoder.embeddings.multiply_by_sqrt_depth = model.config.embedding_multiplier

Without digging into the repo further, I can't tell exactly whether this is the right place for embedding_multiplier, but it looks right. It should simply be applied as a scaling after the embeddings are computed (here).
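
For reference, the runtime behaviour being described is roughly the following (an illustrative sketch with placeholder names, not the CTranslate2 internals; whether multiply_by_sqrt_depth maps to an arbitrary scale factor in the spec is the part that would need verifying):

    def embed_with_multiplier(input_ids, embed_tokens, config):
        # embedding_multiplier scales the token embeddings right after lookup,
        # before the hidden states enter the first decoder layer.
        hidden_states = embed_tokens(input_ids)
        multiplier = getattr(config, "embedding_multiplier", None)
        if multiplier:
            hidden_states = hidden_states * multiplier
        return hidden_states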
