ICML 2025 | Paper
This is the official implementation of the MERIT optimizer from the paper "MERIT: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training". If you find MERIT useful, please cite the paper and star this repo. Thanks!
MERIT is a novel optimizer that leverages the max-norm to calculate the trust ratio, effectively constraining the maximum attention logit. Furthermore, MERIT constructs element-wise trust ratios to enable more robust update scaling by focusing on local weight structures.
Compared to LAMB and AdamW, MERIT better controls the maximum attention logit.
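To make the idea concrete, here is a minimal NumPy sketch of an element-wise trust ratio normalized by the max-norm. The function name `elementwise_trust_ratio`, the `eps` handling, and the normalization scheme are our own illustrative assumptions, not the paper's exact algorithm; see the paper for MERIT's full update rule.

```python
import numpy as np

def elementwise_trust_ratio(weight, update, eps=1e-8):
    """Illustrative sketch: per-element ratio of weight magnitude to update
    magnitude, rescaled by the max-norm (infinity norm) of the ratio so the
    largest scaling factor is 1. Not the paper's exact formulation."""
    ratio = np.abs(weight) / (np.abs(update) + eps)
    max_norm = np.max(ratio)  # max-norm over the tensor
    return ratio / max(max_norm, eps)

# Example: larger weights receive proportionally larger (but capped) scaling.
w = np.array([1.0, -2.0, 0.5])
u = np.array([0.1, 0.1, 0.1])
r = elementwise_trust_ratio(w, u)
```

Normalizing by the max-norm caps every per-element scaling factor at 1, which is how this sketch mimics MERIT's control of the largest update magnitude.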
```python
from optim.merit import MERIT

optimizer = MERIT(
    model.parameters(),
    lr=2e-4,
    weight_decay=1e-2,
    betas=(0.9, 0.95),
)
```
```bibtex
@inproceedings{luo2025merit,
  title={{MERIT}: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training},
  author={Yang Luo and Zangwei Zheng and Ziheng Qin and Zirui Zhu and Yong Liu and Yang You},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=NSxKNNFni0}
}
```