-
Notifications
You must be signed in to change notification settings - Fork 247
Open
Description
As an On-call Engineer or Developer, I want to run a comprehensive diagnostic CLI command that helps troubleshoot common issues and generate information for support tickets.
First and foremost, the CLI should gather relevant diagnostic information, like resource statuses, Fleet's k8s events and logs.
Additional, detailed diagnostics could check agent registration, resource dependencies, and communication problems, so that root causes can be identified.
Acceptance Criteria:
-
Ability to generate a diagnostic bundle (logs, resource YAMLs) for offline analysis and sharing with support.
- https://fleet.rancher.io/troubleshooting#where-to-look-for-root-causes-of-issues
- the bundle should snapshot fleet resources over a period of 30s (?)
- k8s events
- also dump the metrics
-
A
fleet troubleshoot
orfleet doctor
command acts as the main entry point to validate the current cluster.- Fetch metrics and analyze queue values (are queues healthy)?
- provide a summary of checks performed (pass/fail), highlights potential issues, and suggests corrective actions or further commands for deeper investigation (e.g., "try
kubectl logs -n cattle-fleet-system fleet-agent-<pod-id>
on cluster X").
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
🏗 In progress