Commit ba03080

Auto merge of #570 - Mark-Simulacrum:fix-stall, r=pietroalbini
Fix stall on failure to mark as failed

The root cause of this bug hasn't yet been tracked down, but the additional logging added in the first and second commits should help track it down in the future.

It seems pretty likely that there are more cases where bailing early leads to problems (deleting the node from the graph is essential but easily missed), but hopefully the added logging will reduce debugging time.

A more exhaustive fix seems difficult with the current abstractions in the code, so it would need more thought. My hope is that any remaining bugs can be squashed quickly given how many runs regularly go through crater, and that the rewrite can ultimately be avoided (as it has its own likelihood of adding bugs).
2 parents e8f8ace + 580db9b commit ba03080

2 files changed: +23 -8 lines changed

src/runner/graph.rs

Lines changed: 12 additions & 4 deletions

@@ -230,16 +230,24 @@ impl TasksGraph {
             self.mark_as_failed(child, ex, db, state, config, error, result, worker)?;
         }
 
-        match self.graph[node] {
+        // We need to mark_as_completed the node here (if it's a task),
+        // otherwise we'll later get stuck as the node is still considered
+        // running (but has actually failed).
+        let res = match self.graph[node] {
             Node::Task { ref task, .. } => {
                 log::debug!("marking task {:?} as failed", task);
-                task.mark_as_failed(ex, db, state, config, error, result)?
+                let res = task.mark_as_failed(ex, db, state, config, error, result);
+                if let Err(err) = &res {
+                    log::debug!("marking task {:?} as failed, failed: {:?}", task, err);
+                }
+                res
             }
             Node::CrateCompleted | Node::Root => return Ok(()),
-        }
+        };
 
         self.mark_as_completed(node);
-        Ok(())
+
+        res
     }
 
     pub(super) fn pending_crates_count(&self) -> usize {
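
The difference between the old and new control flow in this hunk can be shown in isolation. The following is a minimal sketch, not the project's code: `Graph`, `running`, `fail_early_return`, `fail_with_cleanup`, and `failing_record` are hypothetical stand-ins, and `eprintln!` replaces the `log` crate so the example needs only the standard library. It assumes the stall comes from `?` returning before the node is removed from the graph, which is what the added comment describes.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the task graph: it only tracks which node
/// ids are still considered running.
struct Graph {
    running: HashSet<u32>,
}

impl Graph {
    /// Stand-in for removing a finished node from the graph.
    fn mark_as_completed(&mut self, node: u32) {
        self.running.remove(&node);
    }

    /// Old shape: `?` returns before the cleanup, so a failing `record`
    /// leaves `node` in `running` indefinitely.
    fn fail_early_return(
        &mut self,
        node: u32,
        record: fn(u32) -> Result<(), String>,
    ) -> Result<(), String> {
        record(node)?;
        self.mark_as_completed(node);
        Ok(())
    }

    /// New shape: keep the result, log the error, always clean up, then
    /// hand the original result back to the caller.
    fn fail_with_cleanup(
        &mut self,
        node: u32,
        record: fn(u32) -> Result<(), String>,
    ) -> Result<(), String> {
        let res = record(node);
        if let Err(err) = &res {
            eprintln!("marking node {} as failed, failed: {}", node, err);
        }
        self.mark_as_completed(node);
        res
    }
}

/// Hypothetical recorder that always fails, e.g. a database write error.
fn failing_record(_node: u32) -> Result<(), String> {
    Err("db write failed".to_string())
}
```

With a failing recorder, both methods return the error, but only `fail_with_cleanup` removes the node from `running`. The caller still sees the failure, so error handling upstream is unchanged.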

src/runner/mod.rs

Lines changed: 11 additions & 4 deletions

@@ -101,10 +101,17 @@ pub fn run_ex<DB: WriteResults + Sync>(
         let mut threads = Vec::new();
 
         for worker in &workers {
-            let join = scope
-                .builder()
-                .name(worker.name().into())
-                .spawn(move || worker.run())?;
+            let join =
+                scope
+                    .builder()
+                    .name(worker.name().into())
+                    .spawn(move || match worker.run() {
+                        Ok(()) => Ok(()),
+                        Err(r) => {
+                            log::warn!("worker {} failed: {:?}", worker.name(), r);
+                            Err(r)
+                        }
+                    })?;
             threads.push(join);
         }
         let disk_watcher_thread =
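
The logging wrapper in this hunk can be tried outside the project with standard-library scoped threads. This is a rough sketch under stated assumptions: `Worker`, its `fail` field, and `run_all` are hypothetical, `std::thread::scope` with `Builder::spawn_scoped` stands in for the scope builder used in the diff, and `eprintln!` stands in for `log::warn!`.

```rust
use std::io;
use std::thread;

/// Hypothetical worker; `fail` only exists to force an error in the sketch.
struct Worker {
    name: String,
    fail: bool,
}

impl Worker {
    fn name(&self) -> &str {
        &self.name
    }

    fn run(&self) -> Result<(), String> {
        if self.fail {
            Err(format!("{} hit an error", self.name))
        } else {
            Ok(())
        }
    }
}

/// Spawns one named thread per worker, logs any worker error at the point
/// of failure, and returns every worker's result in spawn order.
fn run_all(workers: &[Worker]) -> io::Result<Vec<Result<(), String>>> {
    thread::scope(|scope| -> io::Result<Vec<Result<(), String>>> {
        let mut threads = Vec::new();
        for worker in workers {
            let join = thread::Builder::new()
                .name(worker.name().into())
                .spawn_scoped(scope, move || match worker.run() {
                    Ok(()) => Ok(()),
                    Err(r) => {
                        // Log inside the thread, then pass the error on unchanged.
                        eprintln!("worker {} failed: {:?}", worker.name(), r);
                        Err(r)
                    }
                })?;
            threads.push(join);
        }
        Ok(threads
            .into_iter()
            .map(|t| t.join().expect("worker thread panicked"))
            .collect())
    })
}
```

Logging inside the spawned closure means the failure is recorded as soon as the worker returns, even if the join handle is never inspected or is only joined much later.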
