Add debug latency logs for all-purpose compute #862
base: main
Conversation
Please ensure that the NEXT_CHANGELOG.md file is updated with any relevant changes.
📊 Code Coverage Report
@@ -312,6 +401,20 @@ DatabricksResultSet executeAsync(
      throw new DatabricksSQLException(response.status.errorMessage, response.status.sqlState);
    }
  } catch (DatabricksSQLException | TException e) {
    long executeAsyncEndTime = System.currentTimeMillis();
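The diff above shows only a fragment of the change. As context for the discussion below, here is a minimal, self-contained sketch of the timing pattern in question; the logger name and the runExecuteStatement parameter are placeholders, not the driver's actual code.

import java.util.logging.Logger;

// Sketch only: start/end timestamp pattern around an async execute call,
// logging the elapsed time on both the success and the failure path.
public class ExecuteAsyncTimingSketch {
  private static final Logger LOGGER = Logger.getLogger("thrift-accessor");

  static void executeAsyncTimed(Runnable runExecuteStatement) {
    long executeAsyncStartTime = System.currentTimeMillis();
    try {
      runExecuteStatement.run();
      LOGGER.fine(
          String.format(
              "executeAsync succeeded in %d ms",
              System.currentTimeMillis() - executeAsyncStartTime));
    } catch (RuntimeException e) {
      long executeAsyncEndTime = System.currentTimeMillis();
      LOGGER.fine(
          String.format(
              "executeAsync failed after %d ms", executeAsyncEndTime - executeAsyncStartTime));
      throw e;
    }
  }
}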
Each of these is already wrapped with DatabricksMetricsTimedProcessor; can we maybe use that to log the timings?
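For context, annotation-driven timing of this kind is typically built on a dynamic proxy. The sketch below is a generic, hypothetical illustration, not the actual DatabricksMetricsTimedProcessor annotation or API; as discussed below, the real processor also relies on the connection context rather than simply printing timings.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Sketch only: a generic annotation-driven timing proxy. All names here are
// made up for illustration; the real processor may look quite different.
public class TimedProxySketch {

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Timed {}

  // Hypothetical interface standing in for the client being wrapped.
  interface ThriftClient {
    @Timed
    String executeAsync(String statement);
  }

  @SuppressWarnings("unchecked")
  static <T> T withTiming(T target, Class<T> iface) {
    InvocationHandler handler =
        (proxy, method, args) -> {
          if (method.getAnnotation(Timed.class) == null) {
            return method.invoke(target, args); // not annotated: pass through untimed
          }
          long start = System.currentTimeMillis();
          try {
            return method.invoke(target, args);
          } finally {
            System.out.printf(
                "%s took %d ms%n", method.getName(), System.currentTimeMillis() - start);
          }
        };
    return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
  }
}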
Yes, I would ideally have used that, but that annotation accesses the connection context, statement id, etc. through the context holder, which is broken. Many of these logs need that info.
Regarding "context holder which is broken": the aim is just to print the time taken by these functions, not to call exportLatencyLog(executionTime). Maybe we just log the time and the function name there?
I think the statement id, session id, etc. are important for further analysis, for example relating the timings to the cluster logs. The method name alone will only give us client-side timings, and we will have no way to relate them to the cluster for detailed analysis.
The complete logs that the customer shares will anyway have the initial connection log, context, etc. Is that not the case?
No, they won't have statement ids and session ids for concurrent connections. And even if they do, if we don't attach the ids to the latency logs, there is no guaranteed way under concurrent execution to map a latency log for a method name to the corresponding session id and statement id.
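To illustrate the point, a minimal sketch of latency log lines that carry the session and statement ids so that timings from concurrent statements can be told apart; the ids, values, and logger name are made up.

import java.util.logging.Logger;

// Sketch only: id-tagged latency logs under concurrency. Note that FINE-level
// (debug) output must be enabled on the logger to actually see these lines.
public class ConcurrentLatencyLogSketch {
  private static final Logger LOGGER = Logger.getLogger("latency");

  static void logLatency(String sessionId, String statementId, long elapsedMillis) {
    LOGGER.fine(
        String.format(
            "executeAsync took %d ms [session-id=%s, statement-id=%s]",
            elapsedMillis, sessionId, statementId));
  }

  public static void main(String[] args) throws InterruptedException {
    // Two statements on the same session finish at different times; without the
    // ids there would be no way to tell which timing belongs to which statement.
    Thread first = new Thread(() -> logLatency("session-1", "statement-A", 120));
    Thread second = new Thread(() -> logLatency("session-1", "statement-B", 950));
    first.start();
    second.start();
    first.join();
    second.join();
  }
}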
The connectionContext bug is a one-off case; we can trade that off against polluting our implementation, I guess. If we are not OK with that, can you create a TODO to remove all of this once the task has been taken up? https://databricks.atlassian.net/browse/PECOBLR-389
Added a TODO to remove these once the timed processor annotation has access to the session and operation handles (or the latency issue is resolved).
NO_CHANGELOG=true
Description
Until telemetry is rolled out, add debug latency logs to analyse the Thrift server HTTP calls:
Lower-level HTTP transport logs (a rough sketch follows below)
Thrift Accessor logs
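As an illustration of the transport-level timing mentioned above, here is a minimal sketch that times a single HTTP round trip using the JDK HTTP client; the endpoint URL and logger name are placeholders, and the driver's actual HTTP transport and logging utilities will differ.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.logging.Logger;

// Sketch only: times one HTTP round trip and logs it at debug (FINE) level.
public class HttpLatencyLogSketch {
  private static final Logger LOGGER = Logger.getLogger("http-transport");

  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request =
        HttpRequest.newBuilder(URI.create("https://example.com/sql/endpoint")).build();

    long start = System.currentTimeMillis();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    long elapsed = System.currentTimeMillis() - start;

    LOGGER.fine(
        String.format("HTTP round trip took %d ms (status=%d)", elapsed, response.statusCode()));
  }
}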
Testing
Manual testing
Additional Notes to the Reviewer