
Commit 0fad10b

[Executor] CUDA Graph support padding batch (#2844)
* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support setting the graph optimization config from a YAML file
* update cuda graph ci
* fix ci bug
1 parent 61b3997 commit 0fad10b

30 files changed: +292 additions, -226 deletions
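Across the benchmark configs below, the standalone `enable_static_graph_inference: True` flag is replaced by a nested `graph_optimization_config` block. A minimal sketch of the updated YAML shape, based on the `eb45t_0dot3b-32k-bf16-a30-tp1-static.yaml` diff (the other keys are unchanged; `graph_opt_level: 1` is the only level used in this commit):

```yaml
max_model_len: 32768
max_num_seqs: 128
kv_cache_ratio: 0.75
tensor_parallel_size: 1
# replaces the old boolean flag: enable_static_graph_inference: True
graph_optimization_config:
  graph_opt_level: 1
```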

benchmarks/yaml/eb45t_0dot3b-32k-bf16-a30-tp1-static.yaml (2 additions, 1 deletion)

@@ -2,4 +2,5 @@ max_model_len: 32768
 max_num_seqs: 128
 kv_cache_ratio: 0.75
 tensor_parallel_size: 1
-enable_static_graph_inference: True
+graph_optimization_config:
+  graph_opt_level: 1

benchmarks/yaml/eb45t_0dot3b-32k-bf16-h800-tp1-static.yaml (2 additions, 1 deletion)

@@ -2,4 +2,5 @@ max_model_len: 32768
 max_num_seqs: 128
 kv_cache_ratio: 0.75
 tensor_parallel_size: 1
-enable_static_graph_inference: True
+graph_optimization_config:
+  graph_opt_level: 1

benchmarks/yaml/eb45t_0dot3b-32k-wint8-a30-tp1-static.yaml (2 additions, 1 deletion)

@@ -3,4 +3,5 @@ max_num_seqs: 128
 kv_cache_ratio: 0.75
 tensor_parallel_size: 1
 quantization: wint8
-enable_static_graph_inference: True
+graph_optimization_config:
+  graph_opt_level: 1

benchmarks/yaml/eb45t_0dot3b-32k-wint8-h800-tp1-static.yaml (2 additions, 1 deletion)

@@ -3,4 +3,5 @@ max_num_seqs: 128
 kv_cache_ratio: 0.75
 tensor_parallel_size: 1
 quantization: wint8
-enable_static_graph_inference: True
+graph_optimization_config:
+  graph_opt_level: 1

benchmarks/yaml/eb45t_21b-32k-bf16-h800-tp1-static.yaml (2 additions, 1 deletion)

@@ -2,4 +2,5 @@ max_model_len: 32768
 max_num_seqs: 128
 kv_cache_ratio: 0.75
 tensor_parallel_size: 1
-enable_static_graph_inference: True
+graph_optimization_config:
+  graph_opt_level: 1

benchmarks/yaml/eb45t_21b-32k-wint4-h800-tp1-static.yaml (2 additions, 1 deletion)

@@ -3,4 +3,5 @@ max_num_seqs: 128
 kv_cache_ratio: 0.75
 tensor_parallel_size: 1
 quantization: wint4
-enable_static_graph_inference: True
+graph_optimization_config:
+  graph_opt_level: 1

benchmarks/yaml/eb45t_300b-32k-wint4-h800-tp4-static.yaml (2 additions, 1 deletion)

@@ -3,4 +3,5 @@ max_num_seqs: 96
 gpu_memory_utilization: 0.9
 kv_cache_ratio: 0.71
 tensor_parallel_size: 4
-enable_static_graph_inference: True
+graph_optimization_config:
+  graph_opt_level: 1

benchmarks/yaml/qwen2_7b-32k-bf16-a30-tp1-static.yaml (2 additions, 1 deletion)

@@ -2,4 +2,5 @@ max_model_len: 32768
 max_num_seqs: 128
 kv_cache_ratio: 0.75
 tensor_parallel_size: 1
-enable_static_graph_inference: True
+graph_optimization_config:
+  graph_opt_level: 1

benchmarks/yaml/qwen2_7b-32k-bf16-h800-tp1-static.yaml (2 additions, 1 deletion)

@@ -2,4 +2,5 @@ max_model_len: 32768
 max_num_seqs: 128
 kv_cache_ratio: 0.75
 tensor_parallel_size: 1
-enable_static_graph_inference: True
+graph_optimization_config:
+  graph_opt_level: 1

benchmarks/yaml/qwen2_7b-32k-fp8-h800-tp1-static.yaml (2 additions, 1 deletion)

@@ -3,4 +3,5 @@ max_num_seqs: 128
 kv_cache_ratio: 0.75
 tensor_parallel_size: 1
 quantization: wfp8afp8
-enable_static_graph_inference: True
+graph_optimization_config:
+  graph_opt_level: 1
