provide an e2e guide for execute duration profiling (#1113)

depeng1994 · wangxiaoxin (A) · commit 128895683eee · 2025-06-17T10:20:59.000+08:00
### What this PR does / why we need it?
Provide an e2e guide for execute duration profiling, and report the profiled durations in `execute_model` through `logger.info` instead of `print`.

Signed-off-by: depeng1994 <depengzhang@foxmail.com>
diff --git a/docs/source/developer_guide/evaluation/profile_execute_duration.md b/docs/source/developer_guide/evaluation/profile_execute_duration.md
@@ -9,6 +9,11 @@ The execution duration of each stage (including pre/post-processing, model forwa
 * Use the non-blocking API `ProfileExecuteDuration().capture_async` to set observation points asynchronously when you need to observe the execution duration.
 * Use the blocking API `ProfileExecuteDuration().pop_captured_sync` at an appropriate time to get and print the execution durations of all observed stages.
 
+**We have instrumented the key inference stages (including pre-processing, model forward pass, etc.) for execute duration profiling. Execute the script as follows:**
+```
+VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE=1 python3 vllm-ascend/examples/offline_inference_npu.py
+```
+
 ## Example Output
 
 ```
diff --git a/vllm_ascend/worker/model_runner_v1.py b/vllm_ascend/worker/model_runner_v1.py
@@ -1066,8 +1066,8 @@ def execute_model(
                 for tag, duration in durations.items()
             ]
             captured_name = "Decode" if self.attn_state == AscendAttentionState.DecodeOnly else "Prefill"
-            print(f"Profile execute duration [{captured_name}]:",
-                  " ".join(dr_str))
+            logger.info("Profile execute duration [%s]:%s", captured_name,
+                        " ".join(dr_str))
 
         return model_runner_output
 
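
The `model_runner_v1.py` hunk above replaces an f-string `print` with a `logger.info` call that passes its arguments separately, so the message is formatted only when the record is actually emitted. A minimal, self-contained sketch of that pattern follows. The logger name, the `report` helper, and the per-stage string format in `dr_str` are assumptions made for illustration; only the format string and the `captured_name` logic come from the diff.

```python
import io
import logging

# Hypothetical logger for this sketch; the real module uses its own `logger`.
logger = logging.getLogger("profile_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the sketch's records off the root logger

# Capture emitted records in memory so the result can be inspected.
buffer = io.StringIO()
logger.addHandler(logging.StreamHandler(buffer))


def report(durations: dict, is_decode: bool) -> str:
    """Log the observed stage durations and return everything logged so far."""
    # Assumed per-stage format; the diff does not show how dr_str is built.
    dr_str = [f"[{tag}]:{duration:.2f}ms" for tag, duration in durations.items()]
    captured_name = "Decode" if is_decode else "Prefill"
    # Same call shape as the diff: %-style placeholders, arguments passed separately.
    logger.info("Profile execute duration [%s]:%s", captured_name,
                " ".join(dr_str))
    return buffer.getvalue()


print(report({"forward": 1.5}, is_decode=True), end="")
```

Because the format string has no space after the colon, the first duration follows `[Decode]:` directly in the emitted line. Unlike `print`, the message is also subject to the logger's level and handlers, so it can be filtered or redirected along with the rest of the log output.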
