/metrics only starts returning content after the container's FIRST inference call? #2570
Unanswered
cringelord000222
asked this question in
Q&A
Replies: 0 comments
Hi there,
I am running TGI with docker compose and I've tried 3 different versions: 2.0.4, 2.2.0 and 2.3.0. I've used Chrome and Postman to call the metrics endpoint, and it returns a completely blank response, not even metrics with zero values.
The contents of /metrics only start showing after /generate has been called once. My pipeline handles each incoming request by querying the metrics first (in particular tgi_batch_current_size and tgi_queue_size, to check the queue) and only then sends the request, so the first incoming request gets an error because the metrics come back blank. Right now I have to include a hidden "first inference call" in my deployment script to trigger the metrics to return something (I don't mind if they return zeros).
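In case it helps anyone with the same setup, here is a rough sketch of a client-side alternative to the hidden warm-up call: parse the /metrics text and treat any metric that is missing as 0, so a blank response reads as an empty queue. The metric names are the two from this post; the function names, the regex and the base URL are my own placeholders, not part of TGI. It also assumes a blank body really means "no requests yet", which it cannot tell apart from a server that is still starting up.

```python
import re
import urllib.request

# Metric names taken from the post; anything missing is read as 0.
QUEUE_METRICS = ("tgi_batch_current_size", "tgi_queue_size")

# Matches a Prometheus text-format sample line: name, optional {labels}, value.
# Simplification: label values containing "}" are not handled.
_SAMPLE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")


def parse_metrics(text, names=QUEUE_METRICS):
    """Return {name: value} for the requested metrics, defaulting to 0.0."""
    values = {name: 0.0 for name in names}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match and match.group(1) in values:
            try:
                values[match.group(1)] = float(match.group(2))
            except ValueError:
                pass  # keep the default if the value is not a number
    return values


def is_idle(text):
    """True when every tracked metric is 0, including the blank-body case."""
    return all(value == 0 for value in parse_metrics(text).values())


def fetch_metrics(base_url="http://localhost:8080", timeout=5):
    """Fetch the raw /metrics body. base_url is a placeholder for your deployment."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With this, the pipeline can call `is_idle(fetch_metrics())` before the first request without failing on the blank body. It only works around the behaviour on the client side; the metrics themselves still appear only after the first /generate call.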
Am I doing things wrong?
Suggestion:
Could all the metrics be published once with a value of 0 as soon as the TGI server has initialized, instead of only after the first inference call has been made?