Commit 4d35849

Auto merge of #709 - Mark-Simulacrum:add-metric, r=Mark-Simulacrum
Emit metrics for record_progress endpoint

Previously we were only tracking the worker time, not the endpoint. We see a direct correlation between the throughput of a job and the worker time. This seems wrong to me, because as long as the worker is keeping up with the input rate, the throughput shouldn't be affected.

Note that we believe the worker should not affect the HTTP endpoint at all: we connect the two with a bounded queue, and pushing into the queue is done with `try_send`, which shouldn't block (https://docs.rs/crossbeam-channel/latest/crossbeam_channel/struct.Sender.html#method.try_send) and returns an error if the queue is full. We already emit a metric if the queue is full, and that's not happening here.

The hope is that the extra metric here gives us some clue as to what the problem is.

Metric graphs:

<img width="1197" alt="image" src="https://github.com/rust-lang/crater/assets/5047365/34fac874-254f-4a91-b75a-4d2d9e25aea0">
<img width="1015" alt="image" src="https://github.com/rust-lang/crater/assets/5047365/d21ea685-56ef-49bf-943b-6dbe2648f336">
2 parents d4e2717 + bac1249 commit 4d35849
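
The non-blocking hand-off described in the commit message can be illustrated with a minimal sketch. It is not crater's code: it uses the standard library's `std::sync::mpsc::sync_channel` as a stand-in for the `crossbeam_channel` bounded queue, and the names `Enqueue`, `enqueue`, and `count_bounced` are hypothetical. The point it shows is that `try_send` on a full bounded queue returns `Full` immediately instead of waiting for the consumer.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Outcome of a non-blocking enqueue attempt (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Enqueue {
    Accepted,
    SlowDown,
}

/// Try to push `item` without blocking. A full queue is reported to the
/// caller and counted in `bounced`; the call never waits for the consumer.
pub fn enqueue<T>(queue: &SyncSender<T>, item: T, bounced: &mut u64) -> Enqueue {
    match queue.try_send(item) {
        Ok(()) => Enqueue::Accepted,
        Err(TrySendError::Full(_)) => {
            *bounced += 1;
            Enqueue::SlowDown
        }
        // Assumption of this sketch: the receiver outlives every sender.
        Err(TrySendError::Disconnected(_)) => unreachable!(),
    }
}

/// Attempt `attempts` sends into a queue of `capacity` that nobody drains,
/// and return how many of them bounced.
pub fn count_bounced(capacity: usize, attempts: u32) -> u64 {
    let (tx, rx) = sync_channel::<u32>(capacity);
    let mut bounced = 0;
    for i in 0..attempts {
        enqueue(&tx, i, &mut bounced);
    }
    drop(rx); // the receiver stays alive until all sends were attempted
    bounced
}
```

With a capacity of 2 and no consumer, the first two sends are accepted and every later one bounces, so a slow worker should surface as a rising bounce counter rather than as a slower endpoint. That is the expectation the new endpoint metric is meant to check.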

1 file changed: +13 -3 lines changed

src/server/routes/agent.rs

Lines changed: 13 additions & 3 deletions
@@ -12,6 +12,7 @@ use http::Response;
 use hyper::Body;
 use std::collections::HashMap;
 use std::sync::{Arc, Condvar, Mutex};
+use std::time::Instant;
 use warp::{self, Filter, Rejection};
 
 #[derive(Deserialize)]
@@ -214,7 +215,7 @@ impl RecordProgressThread {
 
                 metrics
                     .crater_endpoint_time
-                    .with_label_values(&["record_progress"])
+                    .with_label_values(&["record_progress_worker"])
                     .observe(start.elapsed().as_secs_f64());
             }
         }));
@@ -298,14 +299,23 @@ fn endpoint_record_progress(
     data: Arc<Data>,
     _auth: AuthDetails,
 ) -> Fallible<Response<Body>> {
-    match data.record_progress_worker.queue.try_send(result) {
+    let start = Instant::now();
+
+    let ret = match data.record_progress_worker.queue.try_send(result) {
         Ok(()) => Ok(ApiResponse::Success { result: true }.into_response()?),
         Err(crossbeam_channel::TrySendError::Full(_)) => {
             data.metrics.crater_bounced_record_progress.inc_by(1);
             Ok(ApiResponse::<()>::SlowDown.into_response()?)
         }
         Err(crossbeam_channel::TrySendError::Disconnected(_)) => unreachable!(),
-    }
+    };
+
+    data.metrics
+        .crater_endpoint_time
+        .with_label_values(&["record_progress_endpoint"])
+        .observe(start.elapsed().as_secs_f64());
+
+    ret
 }
 
 fn endpoint_heartbeat(data: Arc<Data>, auth: AuthDetails) -> Fallible<Response<Body>> {
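
The timing pattern in the last hunk (take a start `Instant`, compute the response, observe the elapsed seconds under a label, then return the response) can be sketched in isolation as below. This is a standard-library-only illustration, not crater's implementation: `Recorder` is a hypothetical stand-in for the labelled histogram behind `crater_endpoint_time`, and `timed` is a hypothetical helper.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical stand-in for a labelled histogram: stores raw samples per label.
#[derive(Default)]
pub struct Recorder {
    samples: HashMap<String, Vec<f64>>,
}

impl Recorder {
    /// Record one duration (in seconds) under `label`.
    pub fn observe(&mut self, label: &str, seconds: f64) {
        self.samples.entry(label.to_string()).or_default().push(seconds);
    }

    /// Number of samples recorded under `label`.
    pub fn count(&self, label: &str) -> usize {
        self.samples.get(label).map_or(0, Vec::len)
    }
}

/// Run `handler`, record how long it took under `label`, and pass its
/// result through unchanged (whether that result is a success or an error).
pub fn timed<R>(recorder: &mut Recorder, label: &str, handler: impl FnOnce() -> R) -> R {
    let start = Instant::now();
    let ret = handler();
    recorder.observe(label, start.elapsed().as_secs_f64());
    ret
}
```

A call such as `timed(&mut recorder, "record_progress_endpoint", || handle_request())`, where `handle_request` is a placeholder, records exactly one sample per call. One design difference from the diff is worth noting: because the handler runs inside a closure, an early return through `?` still ends up in `ret` and is measured, whereas a `?` inside a match arm of the function itself returns before the observation is made.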
