Muon with Llama? #9

@yangsp5

How can I use Muon with a Llama model? I ran it with Llama on 64 A100 GPUs:

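from transformers import LlamaForCausalLM
# Muon comes from the local optimizer/Muon.py shown in the traceback below.

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b")
# Collect every trainable parameter into one flat list.
grouped_parameters = [
  p for p in model.parameters() if p.requires_grad
]

# The whole list is passed as the first positional argument.
optimizer = Muon(grouped_parameters)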

But it fails with the following error:

[rank3]:   File "/xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/optimizer/Muon.py", line 104, in <listcomp>
[rank3]:     params = [p for p in group['params'] if self.state[p]['use_muon']]
[rank3]: KeyError: 'use_muon'

When I print the params, it seems that the parameters in self.state do not match those in group['params'].
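To make that check reproducible, here is a rough sketch of a helper that lists which entries of group['params'] have no 'use_muon' flag in the optimizer state. It assumes only what the traceback shows: the optimizer exposes `param_groups` and a per-parameter `state` mapping, as `torch.optim.Optimizer` does. The helper name `find_params_missing_flag` and the stand-in optimizer are hypothetical, used for illustration only.

```python
from collections import defaultdict
from types import SimpleNamespace


def find_params_missing_flag(optimizer, flag="use_muon"):
    """Return (group_index, param_index) pairs whose state lacks `flag`."""
    missing = []
    for gi, group in enumerate(optimizer.param_groups):
        for pi, p in enumerate(group["params"]):
            # .get() avoids creating empty state entries as a side effect.
            if flag not in optimizer.state.get(p, {}):
                missing.append((gi, pi))
    return missing


# Hypothetical stand-in for an optimizer: plain objects play the role of
# parameters. `b` was tagged in state, but the group now holds `b_replaced`.
a, b, b_replaced = object(), object(), object()
fake = SimpleNamespace(
    param_groups=[{"params": [a, b_replaced]}],
    state=defaultdict(dict, {a: {"use_muon": True}, b: {"use_muon": False}}),
)
print(find_params_missing_flag(fake))  # [(0, 1)]
```

On the real run, this could be called as `find_params_missing_flag(optimizer)` on one rank just before the failing step. A non-empty result would only confirm that some objects in group['params'] are not the ones that were tagged. It does not say why. One possibility, which is a guess and not confirmed here, is that the parameters were rebuilt or wrapped after the optimizer was created.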
