GLM-4 Update #20736

Open · wants to merge 36 commits into main from glm-moe
Commits (36); the diff below shows changes from 1 commit.
e11b28a
init
zRzRzRzRzRzRzR Jul 10, 2025
818db59
ovis aimv2 model type need changed, mark
zRzRzRzRzRzRzR Jul 10, 2025
c6b8eb6
format
zRzRzRzRzRzRzR Jul 10, 2025
f406a09
format
zRzRzRzRzRzRzR Jul 10, 2025
efcff2b
use ds loading(not work)
zRzRzRzRzRzRzR Jul 10, 2025
e6ad576
update
zRzRzRzRzRzRzR Jul 10, 2025
d49c5bc
update 1
zRzRzRzRzRzRzR Jul 10, 2025
c8ad31e
test
zRzRzRzRzRzRzR Jul 10, 2025
2440756
Update glm4_moe.py
zRzRzRzRzRzRzR Jul 10, 2025
5e38b14
update
zRzRzRzRzRzRzR Jul 10, 2025
c227471
Update glm4_moe.py
zRzRzRzRzRzRzR Jul 10, 2025
9f3ab70
1
zRzRzRzRzRzRzR Jul 11, 2025
8feace6
use ds imp
zRzRzRzRzRzRzR Jul 11, 2025
a875d98
Merge branch 'vllm-project:main' into glm-moe
zRzRzRzRzRzRzR Jul 11, 2025
12666d3
update and merge
zRzRzRzRzRzRzR Jul 11, 2025
ffe9d62
fix partial_rotary_factor
Isotr0py Jul 12, 2025
19de827
Merge branch 'vllm-project:main' into glm-moe
zRzRzRzRzRzRzR Jul 13, 2025
dbf1719
Update for doc
zRzRzRzRzRzRzR Jul 13, 2025
45f70e6
Merge branch 'vllm-project:main' into glm-moe
zRzRzRzRzRzRzR Jul 15, 2025
0ea4b99
update for GLM MPT draft
zRzRzRzRzRzRzR Jul 15, 2025
1c15513
MTP and main model are diff, add error msgs
zRzRzRzRzRzRzR Jul 15, 2025
744e071
Update benchmark_moe.py
zRzRzRzRzRzRzR Jul 15, 2025
824170c
Merge branch 'vllm-project:main' into glm-moe
zRzRzRzRzRzRzR Jul 15, 2025
5ab9ee6
Update benchmark_moe_permute_unpermute.py
zRzRzRzRzRzRzR Jul 15, 2025
0a60eed
Merge branch 'vllm-project:main' into glm-moe
zRzRzRzRzRzRzR Jul 15, 2025
2805cc3
use transformers name
zRzRzRzRzRzRzR Jul 16, 2025
2057e34
Merge branch 'vllm-project:main' into glm-moe
zRzRzRzRzRzRzR Jul 16, 2025
25edf51
Update config.py
zRzRzRzRzRzRzR Jul 16, 2025
b94241e
1
zRzRzRzRzRzRzR Jul 16, 2025
efeb607
update
zRzRzRzRzRzRzR Jul 16, 2025
64bbe48
Update config.py
zRzRzRzRzRzRzR Jul 16, 2025
078e2ef
Update glm4_moe.py
zRzRzRzRzRzRzR Jul 17, 2025
7d4340f
all use Glm4MoeMTPModel
zRzRzRzRzRzRzR Jul 17, 2025
a80072e
remove model prefix
zRzRzRzRzRzRzR Jul 17, 2025
4277658
update
zRzRzRzRzRzRzR Jul 17, 2025
4bb6b27
FC support
zRzRzRzRzRzRzR Jul 17, 2025
24 changes: 8 additions & 16 deletions benchmarks/kernels/benchmark_moe.py
@@ -563,22 +563,14 @@ def main(args: argparse.Namespace):
     if args.model_prefix:
         config = getattr(config, args.model_prefix)

-    if config.architectures[0] == "DbrxForCausalLM":
-        E = config.ffn_config.moe_num_experts
-        topk = config.ffn_config.moe_top_k
-        intermediate_size = config.ffn_config.ffn_hidden_size
-        shard_intermediate_size = 2 * intermediate_size // args.tp_size
-    elif config.architectures[0] == "JambaForCausalLM":
-        E = config.num_experts
-        topk = config.num_experts_per_tok
-        intermediate_size = config.intermediate_size
-        shard_intermediate_size = 2 * intermediate_size // args.tp_size
-    elif config.architectures[0] in ("DeepseekV3ForCausalLM", "DeepseekV2ForCausalLM"):
-        E = config.n_routed_experts
-        topk = config.num_experts_per_tok
-        intermediate_size = config.moe_intermediate_size
-        shard_intermediate_size = 2 * intermediate_size // args.tp_size
-    elif config.architectures[0] in ("Qwen2MoeForCausalLM", "Qwen3MoeForCausalLM"):
+    if config.architectures[0] in (
+        "DbrxForCausalLM",
+        "JambaForCausalLM",
+        "DeepseekV3ForCausalLM",
+        "DeepseekV2ForCausalLM", "Qwen2MoeForCausalLM",
+        "Qwen3MoeForCausalLM",
+        "Glm4MoeForCausalLM",
+    ):
         E = config.num_experts
         topk = config.num_experts_per_tok
         intermediate_size = config.moe_intermediate_size
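The collapsed branch assumes every architecture in the tuple exposes `num_experts`, `num_experts_per_tok`, and `moe_intermediate_size` on its top-level config; the removed branches read the same values under Dbrx- and DeepSeek-specific attribute names. A minimal sketch of that lookup, tolerating both layouts (the helper name `get_moe_shape` is illustrative and not part of this PR):

```python
from transformers import AutoConfig


def get_moe_shape(model: str, tp_size: int, trust_remote_code: bool = False):
    """Illustrative helper (not in this PR): read expert count, top-k and
    intermediate size from a HF config, falling back to the attribute names
    the removed branches used for Dbrx- and DeepSeek-style configs."""
    config = AutoConfig.from_pretrained(model, trust_remote_code=trust_remote_code)
    if hasattr(config, "ffn_config"):  # DbrxConfig nests its MoE settings
        E = config.ffn_config.moe_num_experts
        topk = config.ffn_config.moe_top_k
        intermediate_size = config.ffn_config.ffn_hidden_size
    elif hasattr(config, "n_routed_experts"):  # DeepSeek-V2/V3 naming
        E = config.n_routed_experts
        topk = config.num_experts_per_tok
        intermediate_size = config.moe_intermediate_size
    else:  # Qwen-MoE / GLM-4 MoE style naming assumed by the new branch
        E = config.num_experts
        topk = config.num_experts_per_tok
        intermediate_size = config.moe_intermediate_size
    # The benchmark shards the fused gate/up projection across tensor parallelism.
    shard_intermediate_size = 2 * intermediate_size // tp_size
    return E, topk, intermediate_size, shard_intermediate_size
```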
1 change: 1 addition & 0 deletions tests/models/registry.py
@@ -348,6 +348,7 @@ def check_available_online(
                                         trust_remote_code=True,
                                         hf_overrides={"architectures": ["GLM4VForCausalLM"]}),  # noqa: E501
     "Glm4vForConditionalGeneration": _HfExamplesInfo("THUDM/GLM-4.1V-9B-Thinking", min_transformers_version="4.53"),  # noqa: E501
+    "Glm4MoeForCausalLM": _HfExamplesInfo("/model/GLM-4-MoE-100B-A10B", min_transformers_version="4.54"),  # noqa: E501
     "H2OVLChatModel": _HfExamplesInfo("h2oai/h2ovl-mississippi-800m",
                                       extras={"2b": "h2oai/h2ovl-mississippi-2b"},  # noqa: E501
                                       max_transformers_version="4.48",  # noqa: E501
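For completeness, a hedged sketch of exercising the new `Glm4MoeForCausalLM` architecture once this branch is installed. The checkpoint path simply mirrors the local placeholder used in the registry entry above, and `tensor_parallel_size=8` is an assumption for a ~100B MoE checkpoint, not a value taken from this PR:

```python
from vllm import LLM, SamplingParams

# Sketch only: the model path copies the placeholder from the diff above;
# swap in a published GLM-4 MoE repo id and size tensor_parallel_size to your GPUs.
llm = LLM(
    model="/model/GLM-4-MoE-100B-A10B",
    tensor_parallel_size=8,
    trust_remote_code=True,
)

outputs = llm.generate(
    ["Briefly explain what a mixture-of-experts layer does."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```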