I made a new support bundle on rack3 (the first one ever in this environment), and its state did not progress beyond collecting even after many hours:
oxide --profile colo bundle list
[
{
"id": "c6b507df-cb67-47c4-8887-ba9fc0fc0034",
"reason_for_creation": "Created by external API",
"state": "collecting",
"time_created": "2025-07-09T01:01:12.246530Z"
}
]
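To flag this symptom automatically, for example from a periodic check, the `bundle list` output can be filtered for bundles that have stayed in `collecting` past some cutoff. Below is a minimal Python sketch that assumes only the JSON shape shown above. The helper `stuck_bundles`, the one-hour cutoff, and the fixed `now` are hypothetical choices made for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

# Copied from the `oxide --profile colo bundle list` output above.
BUNDLE_LIST_JSON = """
[
  {
    "id": "c6b507df-cb67-47c4-8887-ba9fc0fc0034",
    "reason_for_creation": "Created by external API",
    "state": "collecting",
    "time_created": "2025-07-09T01:01:12.246530Z"
  }
]
"""


def parse_rfc3339(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def stuck_bundles(bundles, now: datetime, cutoff: timedelta) -> list:
    """Return the ids of bundles still 'collecting' for longer than `cutoff`."""
    return [
        b["id"]
        for b in bundles
        if b["state"] == "collecting"
        and now - parse_rfc3339(b["time_created"]) > cutoff
    ]


bundles = json.loads(BUNDLE_LIST_JSON)
# Hypothetical "now": a few hours after the bundle was created.
now = datetime(2025, 7, 9, 5, 0, tzinfo=timezone.utc)
print(stuck_bundles(bundles, now, timedelta(hours=1)))
# prints ['c6b507df-cb67-47c4-8887-ba9fc0fc0034']
```

The cutoff only needs to be comfortably longer than a normal collection.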
I looked up the bundle's dataset info in the database:
root@[fd00:1122:3344:116::3]:32221/omicron> select * from support_bundle;
id | time_created | reason_for_creation | reason_for_failure | state | zpool_id | dataset_id | assigned_nexus
---------------------------------------+------------------------------+-------------------------+--------------------+------------+--------------------------------------+--------------------------------------+---------------------------------------
c6b507df-cb67-47c4-8887-ba9fc0fc0034 | 2025-07-09 01:01:12.24653+00 | Created by external API | NULL | collecting | de682b18-afaf-4d53-b62e-934f6bd4a1f8 | 003d27e0-57e4-4d55-963e-af47e4e526f1 | 95ebe94d-0e68-421d-9260-c30bd7fe4bd6
(1 row)
The dataset that was supposed to receive the bundle remained empty:
BRM42220015 # ls -l /pool/ext/de682b18-afaf-4d53-b62e-934f6bd4a1f8/crypt/debug/c6b507df-cb67-47c4-8887-ba9fc0fc0034/
total 0
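The directory checked above is built from two values: the `zpool_id` column of the `support_bundle` row and the bundle `id`. The row's `dataset_id` does not appear in the path. The Python sketch below shows that lookup and the emptiness check. It assumes that the `/pool/ext/<zpool_id>/crypt/debug/<bundle_id>` layout seen on BRM42220015 also holds on other sleds. `bundle_dir` and `is_empty_dir` are hypothetical helpers.

```python
import tempfile
from pathlib import Path

ZPOOL_ID = "de682b18-afaf-4d53-b62e-934f6bd4a1f8"
BUNDLE_ID = "c6b507df-cb67-47c4-8887-ba9fc0fc0034"


def bundle_dir(zpool_id: str, bundle_id: str, root: str = "/pool/ext") -> Path:
    # Layout observed on BRM42220015 (assumed, not confirmed elsewhere):
    # <root>/<zpool_id>/crypt/debug/<bundle_id>
    return Path(root) / zpool_id / "crypt" / "debug" / bundle_id


def is_empty_dir(path: Path) -> bool:
    # True when the directory exists but has no entries
    # (`ls -l` prints only "total 0" in that case).
    return path.is_dir() and not any(path.iterdir())


print(bundle_dir(ZPOOL_ID, BUNDLE_ID))

# Demonstrate the emptiness check on a scratch directory instead of a real pool.
with tempfile.TemporaryDirectory() as tmp:
    d = bundle_dir(ZPOOL_ID, BUNDLE_ID, root=tmp)
    d.mkdir(parents=True)
    print(is_empty_dir(d))  # True
```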
The log of the assigned Nexus showed that the support_bundle_collector background task had picked up the bundle and was collecting it:
angela@castle /staff/angela $ grep c6b507df oxide-nexus.log.1752023701 | head -30 | looker
01:01:18.330Z INFO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): SupportBundleCollector: Found bundle to collect
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
bundles_in_queue = 1
file = nexus/src/app/background/tasks/support_bundle_collector.rs:364
01:01:18.330Z INFO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): Collecting bundle as local file
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
file = nexus/src/app/background/tasks/support_bundle_collector.rs:562
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/local/all-sp-ids
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/local/all-sp-ids", status: 200, headers: {"content-type": "application/json", "x-request-id": "46a2fcee-08ee-49a9-8f72-4329dd215192", "content-length": "929", "date": "Wed, 09 Jul 2025 01:01:17 GMT"} })
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/7/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/24/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/switch/0/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/28/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/20/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/switch/1/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/29/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/22/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/0/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/5/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/4/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/6/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/9/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/power/0/task-dump
01:01:18.368Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/31/task-dump
01:01:18.368Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/13/task-dump
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/7/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "bf0a7286-96f8-41aa-9ee7-3dc58ee4f194", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/power/0/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "b8704e84-5aeb-43e8-8772-63ecfec50588", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/3/task-dump
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
background_task = support_bundle_collector
body = None
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
method = GET
uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/2/task-dump
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/28/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "4c05f0a6-8541-40c1-b34d-0366f22fa767", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/5/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "838690ad-3c40-4bd7-90a8-5f416ecc19ad", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/6/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "3d38a033-ce0f-4587-b977-a16ca9e86162", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/29/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "1ab05012-1d88-4098-92b4-6132508078e9", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/0/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "78360e15-080d-4f1a-abeb-a847c66cd10c", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/switch/0/task-dump", status: 200, headers: {"content-type": "application/json", "x-request-id": "c92d82c0-63c2-4500-be40-98fcd18a0016", "content-length": "1", "date": "Wed, 09 Jul 2025 01:01:18 GMT"} })
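In the excerpt, several task-dump requests have no matching `client response` line, for example sled 24, switch 1, and sled 22. Pairing requests with responses by URL makes this easy to check. Below is a rough Python sketch that assumes looker's one-field-per-line rendering shown above. `unanswered_requests` is a hypothetical helper.

```python
import re

# `uri = ...` field of a "client request" entry, as rendered by looker.
REQUEST_RE = re.compile(r"^\s*uri = (\S+)\s*$")
# URL inside a "client response" entry: `result = Ok(Response { url: "...", ...`.
RESPONSE_RE = re.compile(r'result = Ok\(Response \{ url: "([^"]+)"')


def unanswered_requests(log_lines):
    """Return request URIs with no matching Ok response, in request order.

    Assumes each URI is requested at most once, as in the excerpt above.
    """
    requested, answered = [], set()
    for line in log_lines:
        if m := REQUEST_RE.match(line):
            requested.append(m.group(1))
        elif m := RESPONSE_RE.search(line):
            answered.add(m.group(1))
    return [uri for uri in requested if uri not in answered]


# Trimmed excerpt of the log above (unrelated fields dropped).
EXCERPT = """\
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
    method = GET
    uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/7/task-dump
01:01:18.367Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client request
    method = GET
    uri = http://[fd00:1122:3344:11f::2]:12225/sp/sled/24/task-dump
01:01:18.373Z DEBG 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): client response
    result = Ok(Response { url: "http://[fd00:1122:3344:11f::2]:12225/sp/sled/7/task-dump", status: 200 })
"""

print(unanswered_requests(EXCERPT.splitlines()))
# prints ['http://[fd00:1122:3344:11f::2]:12225/sp/sled/24/task-dump']
```

The grep output was truncated with `head -30`, so a URI reported here may simply have its response further down the log. Run the sketch over the full file before drawing conclusions from the result.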
Along the way, the collector failed to capture task dumps from several SPs:
01:01:33.380Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP power 1: failed to get task dump count from SP: Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "424eb413-81c0-4c33-82c0-0a029020114d", "content-length": "198", "date": "Wed, 09 Jul 2025 01:01:20 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Power, slot: 1 }: no SP discovered", request_id: "424eb413-81c0-4c33-82c0-0a029020114d" }
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.380Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 24: failed to get task dump count from SP: Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "be60285f-baf5-4b8c-b419-6ac16e4efc58", "content-length": "198", "date": "Wed, 09 Jul 2025 01:01:20 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Sled, slot: 24 }: no SP discovered", request_id: "be60285f-baf5-4b8c-b419-6ac16e4efc58" }
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.381Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 9: failed to get task dump count from SP: Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "d6f417ea-c5ec-419d-bedf-e006360a9ebf", "content-length": "197", "date": "Wed, 09 Jul 2025 01:01:20 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Sled, slot: 9 }: no SP discovered", request_id: "d6f417ea-c5ec-419d-bedf-e006360a9ebf" }
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.381Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 22: failed to get task dump count from SP: Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "0b44e793-3d95-4774-bbf4-f2628d3cae40", "content-length": "224", "date": "Wed, 09 Jul 2025 01:01:31 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Sled, slot: 22 }: RPC call failed (gave up after 5 attempts)", request_id: "0b44e793-3d95-4774-bbf4-f2628d3cae40" }
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.381Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 4: failed to get task dump count from SP: Communication Error: error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/4/task-dump): error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/4/task-dump): operation timed out
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.381Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 18: failed to get task dump count from SP: Communication Error: error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/18/task-dump): error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/18/task-dump): operation timed out
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
01:01:33.381Z ERRO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): failed to capture task dumps
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
error = SP sled 30: failed to get task dump count from SP: Communication Error: error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/30/task-dump): error sending request for url (http://[fd00:1122:3344:11f::2]:12225/sp/sled/30/task-dump): operation timed out
file = nexus/src/app/background/tasks/support_bundle_collector.rs:1031
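The seven failures above fall into three kinds:

- SPs reported as not discovered: power 1, sled 24, and sled 9.
- An SP that did not answer after five RPC attempts: sled 22.
- Requests from Nexus that timed out before any response came back: sleds 4, 18, and 30.

The Python sketch below buckets the errors this way. It keys only on substrings of the messages seen in this log. `group_task_dump_errors` and the bucket names are hypothetical.

```python
import re
from collections import defaultdict

# `error = SP <type> <slot>: <detail>` as logged with "failed to capture task dumps".
ERROR_RE = re.compile(r"^\s*error = SP (\w+) (\d+): (.*)$")


def classify(detail: str) -> str:
    # Buckets are based only on the three messages seen in this log.
    if "no SP discovered" in detail:
        return "no SP discovered"
    if "RPC call failed" in detail:
        return "RPC call failed"
    if "operation timed out" in detail:
        return "request timed out"
    return "other"


def group_task_dump_errors(log_lines) -> dict:
    """Map each error bucket to the SPs ("sled 24", "power 1", ...) that hit it."""
    groups = defaultdict(list)
    for line in log_lines:
        if m := ERROR_RE.match(line):
            typ, slot, detail = m.groups()
            groups[classify(detail)].append(f"{typ} {slot}")
    return dict(groups)


# The seven error lines above, shortened to the parts the sketch looks at.
ERRORS = [
    "error = SP power 1: failed to get task dump count from SP: 503; no SP discovered",
    "error = SP sled 24: failed to get task dump count from SP: 503; no SP discovered",
    "error = SP sled 9: failed to get task dump count from SP: 503; no SP discovered",
    "error = SP sled 22: failed to get task dump count from SP: 503; RPC call failed (gave up after 5 attempts)",
    "error = SP sled 4: failed to get task dump count from SP: Communication Error: operation timed out",
    "error = SP sled 18: failed to get task dump count from SP: Communication Error: operation timed out",
    "error = SP sled 30: failed to get task dump count from SP: Communication Error: operation timed out",
]

print(group_task_dump_errors(ERRORS))
# prints {'no SP discovered': ['power 1', 'sled 24', 'sled 9'], 'RPC call failed': ['sled 22'], 'request timed out': ['sled 4', 'sled 18', 'sled 30']}
```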
About six minutes later, the collector logged that collection had completed, yet the bundle stayed in the collecting state:
01:07:36.752Z INFO 95ebe94d-0e68-421d-9260-c30bd7fe4bd6 (ServerContext): Bundle Collection completed
background_task = support_bundle_collector
bundle = c6b507df-cb67-47c4-8887-ba9fc0fc0034
file = nexus/src/app/background/tasks/support_bundle_collector.rs:485
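To pin down the duration, two intervals are useful. The log timestamps give the collector's own runtime, and the database row gives the time since the bundle was created. The trivial Python sketch below assumes every timestamp is UTC on 2025-07-09, as the HTTP Date headers in the log indicate.

```python
from datetime import datetime, timedelta


def parse_log_time(hms: str, date: str = "2025-07-09") -> datetime:
    # Log lines carry only the time of day; the date is assumed from time_created.
    # All values are UTC, so naive datetimes are safe to subtract here.
    return datetime.strptime(f"{date}T{hms.rstrip('Z')}", "%Y-%m-%dT%H:%M:%S.%f")


created = datetime(2025, 7, 9, 1, 1, 12, 246530)  # time_created from the database row
found = parse_log_time("01:01:18.330Z")           # "Found bundle to collect"
completed = parse_log_time("01:07:36.752Z")       # "Bundle Collection completed"

print(completed - found)    # 0:06:18.422000
print(completed - created)  # 0:06:24.505470
```

The collector therefore considered the bundle finished about 6 minutes 24 seconds after it was created. Hours later, the API and the database still reported it as collecting.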
It's unclear whether these errors prevented the bundle from being persisted, or whether the collector hit some other error that left the bundle stuck in the collecting state.