-
Notifications
You must be signed in to change notification settings - Fork 12
Improve Logging and Error Handling #115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When there are errors with patronictl and pgbackrest commands, write stderr to the support export log for troubleshooting.
The linter is failing for a part of the code not touched by this PR. I haven't seen this failure before, so I'm not sure why this check is failing now when it didn't fail before.
|
We pull in the latest version of the golang linter when we do our checks and the and version 1.62 added recvcheck. The error is because |
} | ||
} | ||
info, err := os.Stat(outputDir + "/" + outputFile) | ||
fmt.Print(exportSizeReport(float64(info.Size()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what causes this to be printed as the final message. It's been in the product for a while. A bit of a visual reminder for the user to know if the Support Export is too-big-to-email. There's actually an if
statement with the exportSizeReport
function to specifically tell the user what to do if the size is larger than 25MB.
Collecting PGO CLI logs...
┌────────────────────────────────────────────────────────────────
| Archive file size: 0.11 MiB
| Email the support export archive to support@crunchydata.com
| or attach as a email reply to your existing Support Ticket
└────────────────────────────────────────────────────────────────
internal/cmd/export.go
Outdated
@@ -1493,7 +1523,7 @@ func gatherPatroniInfo(ctx context.Context, | |||
writeInfo(cmd, err.Error()) | |||
return nil | |||
} | |||
return err | |||
writeInfo(cmd, strings.TrimSpace(fmt.Sprintf("Error with patronictl list: %s", stderr))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we want to print stderr and drop err?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so - the underlying issue is that when/if an error happens, the Support Export stops collecting any more data.
So in this specific case, if there is an issue running patronictl list
, no further information is gathered. That is not great when the subsequent data points are necessary. I think we should print the patroni error message (patroni could report helpful info on why the command failed) and continue on. In mind, it's not appropriate to return err
because the function can still recover at this error. But I'm open to discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, I see the need to not quit at every error. I just wonder if stderr
and err
might both have different and useful info -- should we print both? I don't remember (off the top of my head) what sort of errs and stderrs we get from these execs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my investigation, stderr
tends to have more app-specific info. My case was failing pgbackrest
commands (which is what prompted this original issue). err
tended to only report the error number, so it required detective work to determine what subsystem was even associated with the error.
I've got some ideas to combine the output from err
and stderr
in a meaningful way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's how things look when I combine the patroni and pgbackrest sections with something like writeInfo(cmd, strings.TrimSpace(fmt.Sprintf("Error with patronictl list: %s: %s", err, strings.TrimSpace(stderr))))
stderr
needs to have its whitespace trimmed and putting err
first makes things look better when reading the output.
Collecting Patroni info...
Error with patronictl list: command terminated with exit code 2: Usage: patronictl [OPTIONS] COMMAND [ARGS]...
Try 'patronictl --help' for help.
Error: No such command 'foo'.
Error with patronictl history: command terminated with exit code 2: Usage: patronictl [OPTIONS] COMMAND [ARGS]...
Try 'patronictl --help' for help.
Error: No such command 'foo'.
Collecting pgBackRest info...
Error with pgbackrest info: command terminated with exit code 48: ERROR: [048]: invalid command 'foo'
Error with pgbackrest check: command terminated with exit code 48: ERROR: [048]: invalid command 'foo'
Collecting processes...
I can't do much to change how the subsystems report their errors, but the exact output is presented to the user for further investigation.
A) Agree with TJ here re: why this is being picked up now (e.g., floating linter version means we run most recent in GH actions; these methods mix pointer and value receivers) It's funny that this just came up now because recently Chris went through pgo for a similar issue, so I've been reviewing this section a bunch. The principles in that doc are (in order of importance)
In |
…ot Postgres nodes.
`stderr` contains the error message from the subsystem (pgbackrest or patroni) which is critical for the user to have for troubleshooting. Giving both provides consistency with error reporting in other parts of the support export.
This PR refactors logic to continue gathering data if encountering errors with support export.
This PR also improves how error messages are reported from Patroni and pgBackRest. The error messages are written to the console and the support export log for troubleshooting.