Commit ff1c674

test: Test and document histogram latency metrics (#7694)
1 parent 9ff7d80 commit ff1c674

7 files changed (+496 -2 lines changed)


docs/user_guide/metrics.md

Lines changed: 37 additions & 0 deletions
@@ -204,6 +204,43 @@ metrics are used for latencies:

To disable these metrics specifically, you can set `--metrics-config counter_latencies=false`

#### Histograms

> **Note**
>
> The Histogram feature described below is experimental and may change based
> on user feedback.

The following
[Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram)
metrics are available for latencies:

|Category |Metric |Metric Name |Description |Granularity |Frequency |Model Type |
|--------------|----------------|------------|---------------------------|-----------|-------------|-------------|
|Latency |Request to First Response Time |`nv_inference_first_response_histogram_ms` |Histogram of the end-to-end time from an inference request to its first response |Per model |Per request |Decoupled |

These metrics are disabled by default. To enable them, set `--metrics-config histogram_latencies=true`.
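Once the server runs with this flag (for example `tritonserver --model-repository=<your_repo> --metrics-config histogram_latencies=true`, where the repository path is a placeholder), the histogram can be read from Triton's Prometheus metrics endpoint. The sketch below assumes the default metrics port 8002 on localhost; `fetch_metrics` and `histogram_lines` are hypothetical helpers, not part of any Triton client library.

```python
import urllib.request

# Histogram family name taken from the metrics table above.
METRIC = "nv_inference_first_response_histogram_ms"
# Series belonging to one histogram family (Triton attaches labels to all of them).
SERIES_SUFFIXES = ("{", "_bucket{", "_count{", "_sum{")


def fetch_metrics(url="http://localhost:8002/metrics", timeout=5.0):
    """Return the Prometheus text exposition served at `url` (assumed default port)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def histogram_lines(text, name=METRIC):
    """Keep the HELP/TYPE comments and the series that belong to histogram `name`."""
    kept = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Comment lines look like "# HELP <name> ..." or "# TYPE <name> histogram".
            parts = line.split()
            if len(parts) >= 3 and parts[2] == name:
                kept.append(line)
        elif any(line.startswith(name + suffix) for suffix in SERIES_SUFFIXES):
            kept.append(line)
    return kept
```

With a server running, `print("\n".join(histogram_lines(fetch_metrics())))` prints only the first-response histogram and skips every other metric family.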

Each histogram is exposed as several series. Each bucket has a `_bucket` series
labeled with an `le` (less than or equal to) threshold that counts the
observations at or below that threshold, so bucket counts are cumulative.
Additionally, the `_count` and `_sum` series record the total number of
observations and the sum of all observed values. For example, the "Request to
First Response Time" histogram exposes the following:
```
# HELP nv_inference_first_response_histogram_ms Duration from request to first response in milliseconds
# TYPE nv_inference_first_response_histogram_ms histogram
nv_inference_first_response_histogram_ms_count{model="my_model",version="1"} 37
nv_inference_first_response_histogram_ms_sum{model="my_model",version="1"} 10771
nv_inference_first_response_histogram_ms_bucket{model="my_model",version="1", le="100"} 8
nv_inference_first_response_histogram_ms_bucket{model="my_model",version="1", le="500"} 30
nv_inference_first_response_histogram_ms_bucket{model="my_model",version="1", le="2000"} 36
nv_inference_first_response_histogram_ms_bucket{model="my_model",version="1", le="5000"} 37
nv_inference_first_response_histogram_ms_bucket{model="my_model",version="1", le="+Inf"} 37
```

Triton initializes each histogram with default buckets, as shown above. Customizing buckets per metric is currently unsupported.
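Because the buckets are cumulative, the example output above is enough to derive a mean latency (`_sum / _count`) and rough percentiles. The sketch below parses that exposition and estimates quantiles by linear interpolation within a bucket, in the spirit of PromQL's `histogram_quantile`; it assumes the first bucket starts at 0 ms, and the helper names are hypothetical. It keys bucket lines off the `le` label, so it accepts both the bare metric name and the `_bucket` suffix.

```python
import math
import re

# Example exposition with the numbers from the output above.
SAMPLE = """\
# HELP nv_inference_first_response_histogram_ms Duration from request to first response in milliseconds
# TYPE nv_inference_first_response_histogram_ms histogram
nv_inference_first_response_histogram_ms_count{model="my_model",version="1"} 37
nv_inference_first_response_histogram_ms_sum{model="my_model",version="1"} 10771
nv_inference_first_response_histogram_ms_bucket{model="my_model",version="1", le="100"} 8
nv_inference_first_response_histogram_ms_bucket{model="my_model",version="1", le="500"} 30
nv_inference_first_response_histogram_ms_bucket{model="my_model",version="1", le="2000"} 36
nv_inference_first_response_histogram_ms_bucket{model="my_model",version="1", le="5000"} 37
nv_inference_first_response_histogram_ms_bucket{model="my_model",version="1", le="+Inf"} 37
"""

SERIES_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_histogram(text, name):
    """Return (sorted [(le, cumulative_count)], count, sum) for histogram `name`."""
    buckets, count, total = [], None, None
    for line in text.splitlines():
        m = SERIES_RE.match(line.strip())
        if not m:
            continue  # skips HELP/TYPE comments and blank lines
        series, value = m.group(1), float(m.group(3))
        labels = dict(LABEL_RE.findall(m.group(2)))
        if series == name + "_count":
            count = value
        elif series == name + "_sum":
            total = value
        elif series in (name, name + "_bucket") and "le" in labels:
            buckets.append((float(labels["le"]), value))  # float("+Inf") is math.inf
    buckets.sort()
    return buckets, count, total


def estimate_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) by interpolating inside cumulative buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    observations = buckets[-1][1]  # the +Inf bucket holds every observation
    if observations == 0:
        return math.nan
    rank = q * observations
    prev_le, prev_count = 0.0, 0.0  # assume the first bucket starts at 0 ms
    for le, cum in buckets:
        if cum >= rank:
            if math.isinf(le):
                return prev_le  # cannot interpolate into +Inf; report last finite bound
            return prev_le + (le - prev_le) * (rank - prev_count) / (cum - prev_count)
        prev_le, prev_count = le, cum
    return prev_le


buckets, count, total = parse_histogram(SAMPLE, "nv_inference_first_response_histogram_ms")
print(f"mean: {total / count:.1f} ms")                    # 291.1 ms
print(f"p50:  {estimate_quantile(0.5, buckets):.1f} ms")  # 290.9 ms
print(f"p90:  {estimate_quantile(0.9, buckets):.1f} ms")  # 1325.0 ms
```

Interpolation assumes observations are spread evenly inside each bucket, so with only four finite bounds the percentiles are coarse; the mean from `_sum / _count` is exact.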

#### Summaries

> **Note**
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import time

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    async def execute(self, requests):
        # This test model only handles the first request in the list.
        request = requests[0]
        wait_secs = pb_utils.get_input_tensor_by_name(
            request, "WAIT_SECONDS"
        ).as_numpy()[0]
        response_num = pb_utils.get_input_tensor_by_name(
            request, "RESPONSE_NUM"
        ).as_numpy()[0]
        # Every response echoes the wait time and reports RESPONSE_NUM=1.
        output_tensors = [
            pb_utils.Tensor("WAIT_SECONDS", np.array([wait_secs], np.float32)),
            pb_utils.Tensor("RESPONSE_NUM", np.array([1], np.uint8)),
        ]

        # Delay before any response is sent. time.sleep() blocks the event
        # loop for the whole wait, even though execute() is async.
        time.sleep(wait_secs.item())
        response_sender = request.get_response_sender()
        # Send response_num responses; the last one carries the FINAL flag
        # that closes the decoupled response stream.
        for i in range(response_num):
            response = pb_utils.InferenceResponse(output_tensors)
            if i != response_num - 1:
                response_sender.send(response)
            else:
                response_sender.send(
                    response, flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL
                )

        return None
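Since this model sleeps for `WAIT_SECONDS` before sending any response, its first-response latency is roughly that delay plus server overhead, independent of `RESPONSE_NUM` (only the first response is measured). A minimal sketch of which cumulative buckets a set of requests should land in, assuming the bucket bounds from the docs example are Triton's defaults and treating `overhead_ms` as an assumed value; the helpers are hypothetical and not part of the test in this commit.

```python
import math

# Bucket upper bounds in ms, copied from the example output in the metrics docs.
# Assumption: these match Triton's defaults for this histogram.
DEFAULT_BUCKETS_MS = (100.0, 500.0, 2000.0, 5000.0, math.inf)


def buckets_hit(wait_secs, overhead_ms=0.0, buckets=DEFAULT_BUCKETS_MS):
    """Return the `le` bounds whose cumulative counters one request increments."""
    latency_ms = wait_secs * 1000.0 + overhead_ms
    return [le for le in buckets if latency_ms <= le]


def expected_histogram(wait_secs_list, overhead_ms=0.0, buckets=DEFAULT_BUCKETS_MS):
    """Predict cumulative bucket counts, `_count`, and `_sum` after these requests."""
    counts = {le: 0 for le in buckets}
    total_ms = 0.0
    for wait_secs in wait_secs_list:
        total_ms += wait_secs * 1000.0 + overhead_ms
        for le in buckets_hit(wait_secs, overhead_ms, buckets):
            counts[le] += 1
    return counts, len(wait_secs_list), total_ms
```

Choosing wait times well away from a bucket boundary (for example, not exactly 0.5 s) keeps such a prediction robust to the unknown overhead.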
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

backend: "python"
input [
  {
    name: "WAIT_SECONDS"
    data_type: TYPE_FP32
    dims: [ 1 ]
  },
  {
    name: "RESPONSE_NUM"
    data_type: TYPE_UINT8
    dims: [ 1 ]
  }
]
output [
  {
    name: "WAIT_SECONDS"
    data_type: TYPE_FP32
    dims: [ 1 ]
  },
  {
    name: "RESPONSE_NUM"
    data_type: TYPE_UINT8
    dims: [ 1 ]
  }
]

instance_group [{ kind: KIND_CPU }]
model_transaction_policy { decoupled: True }
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

name: "ensemble"
platform: "ensemble"
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  },
  {
    name: "INPUT1"
    data_type: TYPE_UINT8
    dims: [ 1 ]
  }
]
output [
  {
    name: "OUTPUT"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
ensemble_scheduling {
  step [
    {
      # decoupled model
      model_name: "async_execute_decouple"
      model_version: 1
      input_map {
        key: "WAIT_SECONDS"
        value: "INPUT0"
      }
      input_map {
        key: "RESPONSE_NUM"
        value: "INPUT1"
      }
      output_map {
        key: "WAIT_SECONDS"
        value: "temp_output0"
      }
      output_map {
        key: "RESPONSE_NUM"
        value: "temp_output1"
      }
    },
    {
      # non-decoupled model
      model_name: "async_execute"
      model_version: 1
      input_map {
        key: "WAIT_SECONDS"
        value: "temp_output0"
      }
      input_map {
        key: "RESPONSE_NUM"
        value: "temp_output1"
      }
      output_map {
        key: "WAIT_SECONDS"
        value: "OUTPUT"
      }
    }
  ]
}
