GraniteMoeHybrid (based on v4.51.3)

Released by @LysandreJik on 08 May at 13:14 · commit 471958b

A new model is added to transformers: GraniteMoeHybrid
It is added on top of the v4.51.3 release, and can be installed from the following tag: v4.51.3-GraniteMoeHybrid-preview.

To install this version, run the following command:

pip install git+https://github.com/huggingface/transformers@v4.51.3-GraniteMoeHybrid-preview
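To confirm the preview install exposes the new model, a quick check such as the following can help. This is a minimal sketch; the class names follow the GraniteMoeHybrid naming used in transformers, and the expected model_type value is an assumption noted in the comments.

import transformers
from transformers import GraniteMoeHybridConfig, GraniteMoeHybridForCausalLM

print(transformers.__version__)             # should report the 4.51.3 preview build
print(GraniteMoeHybridConfig().model_type)  # expected: "granitemoehybrid"
print(GraniteMoeHybridForCausalLM.__name__) # import succeeds only on the preview tag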

Any fixes that are needed will be applied to this release, so this installation can be considered stable and improving over time.

As the name implies, this tag is a preview of the GraniteMoeHybrid model. It is a tagged snapshot of the main branch and does not follow semantic versioning. The model will be included in the next minor release: v4.52.0.

GraniteMoeHybrid


The GraniteMoeHybrid model builds on top of GraniteMoeSharedModel and Bamba. Its decoder layers consist of either state-space layers or MoE attention layers with shared experts. By default, the attention layers do not use positional encoding.
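One way to see how a given checkpoint interleaves state-space and attention decoder layers is to inspect its configuration. This is a minimal sketch that assumes Hub access and the preview install; the attribute name layers_block_type is an assumption borrowed from the Bamba-style config, so the snippet falls back to printing the full config if the field differs.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-granite/granite-4.0-tiny-preview")
print(config.model_type)
# per-layer block types ("mamba" vs. "attention"), if the field exists in this version
print(getattr(config, "layers_block_type", config))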

Usage example

GraniteMoeHybrid can be found on the Hugging Face Hub.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-4.0-tiny-preview"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."

# tokenize the text and move the inputs to the model's device
# (needed when device_map places the model on GPU)
input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
    print(i)
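For instruction-tuned Granite checkpoints, the prompt is usually formatted with the tokenizer's chat template rather than passed as raw text. The sketch below is a variant of the example above; it reuses model and tokenizer from that snippet and assumes the checkpoint ships a chat template.

# format the request as a chat turn using the checkpoint's chat template
chat = [{"role": "user", "content": "Write a code to find the maximum value in a list of numbers."}]
chat_ids = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# generate and decode a single response
chat_output = model.generate(chat_ids, max_new_tokens=100)
print(tokenizer.decode(chat_output[0], skip_special_tokens=True))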