
Commit 1288956

depeng1994 authored and wangxiaoxin (A) committed
provide an e2e guide for execute duration profiling (#1113)
### What this PR does / why we need it?
provide an e2e guide for execute duration profiling

Signed-off-by: depeng1994 <depengzhang@foxmail.com>
1 parent 5ffdb6b commit 1288956


2 files changed (+7 / -2 lines)


docs/source/developer_guide/evaluation/profile_execute_duration.md

Lines changed: 5 additions & 0 deletions
````diff
@@ -9,6 +9,11 @@ The execution duration of each stage (including pre/post-processing, model forwa
 * Use the non-blocking API `ProfileExecuteDuration().capture_async` to set observation points asynchronously when you need to observe the execution duration.
 * Use the blocking API `ProfileExecuteDuration().pop_captured_sync` at an appropriate time to get and print the execution durations of all observed stages.
 
+**We have instrumented the key inference stages (including pre-processing, model forward pass, etc.) for execute duration profiling. Execute the script as follows:**
+```
+VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE=1 python3 vllm-ascend/examples/offline_inference_npu.py
+```
+
 ## Example Output
 
 ```
````
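The two bullets in the guide describe a capture-then-collect pattern: cheap observation points while a step runs, then one blocking call that gathers the durations of every observed stage. The following is a minimal CPU-only sketch of that pattern under stated assumptions. `StageDurationProfiler`, `capture` and `pop_captured` are hypothetical names, not vllm-ascend's `ProfileExecuteDuration` API. Timings come from `time.perf_counter`, so unlike `capture_async` nothing here is actually asynchronous. The environment-variable switch is also an assumption that merely reuses the `VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE` name from the guide.

```python
import os
import time
from contextlib import contextmanager


class StageDurationProfiler:
    """Hypothetical CPU-only stand-in for a capture-then-collect profiler.

    Not vllm-ascend's ProfileExecuteDuration: durations are measured with
    time.perf_counter, so nothing here runs asynchronously.
    """

    def __init__(self, enabled=None):
        if enabled is None:
            # Assumption: reuse the env var name from the guide as an on/off switch.
            enabled = os.environ.get("VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE") == "1"
        self.enabled = enabled
        self._records = []  # (tag, start, end) tuples waiting to be collected

    @contextmanager
    def capture(self, tag):
        """Record start/end timestamps around one stage; no-op when disabled."""
        if not self.enabled:
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self._records.append((tag, start, time.perf_counter()))

    def pop_captured(self):
        """Return {tag: duration_ms} for every recorded stage, then reset.

        A repeated tag keeps only its last duration in this sketch.
        """
        durations = {tag: (end - start) * 1000.0 for tag, start, end in self._records}
        self._records.clear()
        return durations


# Usage: observe two stand-in "stages", then collect once at the end of the step.
profiler = StageDurationProfiler(enabled=True)
with profiler.capture("prepare_inputs"):
    sum(range(100_000))
with profiler.capture("forward"):
    sum(range(200_000))
durations = profiler.pop_captured()
# Output format here is illustrative only.
print(" ".join(f"[{tag}]:{ms:.2f}ms" for tag, ms in durations.items()))

# A disabled profiler records nothing.
disabled = StageDurationProfiler(enabled=False)
with disabled.capture("ignored"):
    pass
```

Keeping each observation point cheap and deferring all collection to a single call is the shape the guide describes. On an accelerator, that split presumably lets the observation points avoid a device synchronization at every stage, but this commit does not show how vllm-ascend implements it.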

vllm_ascend/worker/model_runner_v1.py

Lines changed: 2 additions & 2 deletions
````diff
@@ -1066,8 +1066,8 @@ def execute_model(
 for tag, duration in durations.items()
 ]
 captured_name = "Decode" if self.attn_state == AscendAttentionState.DecodeOnly else "Prefill"
-print(f"Profile execute duration [{captured_name}]:",
-" ".join(dr_str))
+logger.info("Profile execute duration [%s]:%s", captured_name,
+" ".join(dr_str))
 
 return model_runner_output
 
````
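The second hunk replaces `print` with `logger.info` and passes `captured_name` and the joined string as %-style arguments. The sketch below uses only the standard `logging` module to show two consequences of that call shape. First, the message goes through logger levels and handlers instead of straight to stdout. Second, the `%s` placeholders are interpolated only when a record is actually emitted; the `" ".join(...)` argument itself is still evaluated eagerly. The logger name, the `ListHandler` helper and the `dr_str` entries are hypothetical placeholders, not values from vllm-ascend, and the commit message does not state its motivation for the change.

```python
import logging


class ListHandler(logging.Handler):
    """Collects the final message text of each emitted record for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("profile_duration_sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)
handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the sketch's output out of the root logger

captured_name = "Decode"
# Placeholder strings, not real measurements; the entry format is assumed.
dr_str = ["[prepare_inputs]:0.50ms", "[forward]:12.00ms"]

# Same call shape as the new line in execute_model: logging interpolates the
# %s placeholders only if the record passes the level check and is emitted.
logger.info("Profile execute duration [%s]:%s", captured_name, " ".join(dr_str))

# With the level raised to WARNING, the same INFO call produces no record at all,
# which a bare print() cannot offer.
logger.setLevel(logging.WARNING)
logger.info("Profile execute duration [%s]:%s", captured_name, " ".join(dr_str))

print(handler.messages)
```

One visible difference: `print` separated its two arguments with a space, while the new format string `[%s]:%s` does not, so the logged line has no space after the colon.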