Learning rate compared to AdamW

Hi! I see the example code uses a roughly 67x higher learning rate for the Muon params than for the AdamW params. Is this a reasonable heuristic for translating a previously used AdamW learning rate when switching to Muon for training?
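
To make the question concrete, here is a minimal sketch of the translation I have in mind. The 67x factor is only the ratio I read off the example code, and `translate_lr` is a hypothetical helper for illustration, not part of the library:

```python
# Hypothetical helper, not part of any library: applies the ~67x ratio
# from the example code to a previously used AdamW learning rate.
MUON_TO_ADAMW_LR_RATIO = 67.0  # assumption: ratio read off the example code


def translate_lr(adamw_lr: float, ratio: float = MUON_TO_ADAMW_LR_RATIO) -> dict:
    """Return per-group learning rates under the proposed heuristic."""
    return {
        "muon_lr": adamw_lr * ratio,  # params that would be handled by Muon
        "adamw_lr": adamw_lr,         # remaining params keep the old AdamW rate
    }


# Example: an AdamW run that previously used lr=1e-4.
lrs = translate_lr(1e-4)
print(lrs)
```

In other words, would I keep my old AdamW rate for the AdamW params and simply multiply it by about 67 for the Muon params, or does the right ratio depend on the model and batch size?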

Thank you,

/ David