Skip to content

Commit c6f4a27

Browse files
authored
Connection errors to unauthenticated telemetry endpoint (#619)
* send telemetry to unauth endpoint in case of connection/auth errors Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * formatting Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added unit test for send_connection_error_telemetry Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * retry errors Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * Add functionality for export of latency logs via telemetry (#608) * added functionality for export of failure logs Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * changed logger.error to logger.debug in exc.py Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * Fix telemetry loss during Python shutdown Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * unit tests for export_failure_log Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * try-catch blocks to make telemetry failures non-blocking for connector operations Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed redundant try/catch blocks, added try/catch block to initialize and get telemetry client Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * skip null fields in telemetry request Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed dup import, renamed func, changed a filter_null_values to lamda Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed unnecassary class variable and a redundant try/except block Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * public functions defined at interface level Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * changed export_event and flush to private functions Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * formatting Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * changed connection_uuid to thread local in thrift backend Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * made errors more specific Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * revert change to connection_uuid Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * reverting change in close in telemetry client Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * JsonSerializableMixin Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * isdataclass check in JsonSerializableMixin Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * convert TelemetryClientFactory to module-level functions, replace NoopTelemetryClient class with NOOP_TELEMETRY_CLIENT singleton, updated tests accordingly Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * renamed connection_uuid as session_id_hex Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added NotImplementedError to abstract class, added unit tests Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * formatting Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added PEP-249 link, changed NoopTelemetryClient implementation Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed unused import Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * made telemetry client close a module-level function Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * unit tests verbose Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * debug logs in unit tests Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * debug logs in unit tests Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed ABC from mixin, added try/catch block around executor shutdown Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * checking stuff Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * finding out * finding out more * more more finding out more nice * locks are useless anyways * haha * normal * := looks like walrus horizontally * one more * walrus again * old stuff without walrus seems to fail * manually do the walrussing * change 3.13t, v2 Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * formatting, added walrus Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * formatting Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed walrus, removed test before stalling test Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * changed order of stalling test Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed debugging, added TelemetryClientFactory Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * remove more debugging Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * latency logs funcitionality Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * fixed type of return value in get_session_id_hex() in thrift backend Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * debug on TelemetryClientFactory lock Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * formatting Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * type notation for _waiters Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * called connection.close() in test_arraysize_buffer_size_passthrough Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * run all unit tests Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * more debugging Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed the connection.close() from that test, put debug statement before and after TelemetryClientFactory lock Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * more debug Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * more more more Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * why Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * whywhy Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * thread name Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added teardown to all tests except finalizer test (gc collect) Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added the get_attribute functions to the classes Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed tearDown, added connection.close() to first test Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * finally Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * remove debugging Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added test for export_latency_log, made mock of thrift backend with retry policy Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added multi threaded tests Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * formatting Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added TelemetryExtractor, removed multithreaded tests Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * formatting Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * fixes in test Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * fix in telemetry extractor Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added doc strings to latency_logger, abstracted export_telemetry_log Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * statement type, unit test fix Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * unit test fix Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * statement type changes Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * test_fetches fix Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added mocks to resolve the errors caused by log_latency decorator in tests Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed function in test_fetches cuz it is only used once Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * added _safe_call which returns None in case of errors in the get functions Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed the changes in test_client and test_fetches Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed the changes in test_fetches Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * test_telemetry Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed test Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> --------- Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * MaxRetryDurationError Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * main changes Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * formatting Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * import json Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * without the max retry errors Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * unauth telemetry client Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * remove duplicate code setting telemetry_enabled Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * removed unused errors Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * merge with main changes Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * test Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * without try/catch block Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * - Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * error log for auth provider, ThriftDatabricksClient Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * error log for session.open Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * retry tests fix Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * test connection failure log Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * check types fix Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * test Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> * rephrase import Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com> --------- Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>
1 parent e0ca049 commit c6f4a27

File tree

6 files changed

+103
-20
lines changed

6 files changed

+103
-20
lines changed

src/databricks/sql/client.py

Lines changed: 25 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@
2222
NotSupportedError,
2323
ProgrammingError,
2424
)
25+
2526
from databricks.sql.thrift_api.TCLIService import ttypes
2627
from databricks.sql.backend.thrift_backend import ThriftDatabricksClient
2728
from databricks.sql.backend.databricks_client import DatabricksClient
@@ -251,17 +252,30 @@ def read(self) -> Optional[OAuthToken]:
251252
self.client_telemetry_enabled and self.server_telemetry_enabled
252253
)
253254

254-
self.session = Session(
255-
server_hostname,
256-
http_path,
257-
http_headers,
258-
session_configuration,
259-
catalog,
260-
schema,
261-
_use_arrow_native_complex_types,
262-
**kwargs,
263-
)
264-
self.session.open()
255+
try:
256+
self.session = Session(
257+
server_hostname,
258+
http_path,
259+
http_headers,
260+
session_configuration,
261+
catalog,
262+
schema,
263+
_use_arrow_native_complex_types,
264+
**kwargs,
265+
)
266+
self.session.open()
267+
except Exception as e:
268+
TelemetryClientFactory.connection_failure_log(
269+
error_name="Exception",
270+
error_message=str(e),
271+
host_url=server_hostname,
272+
http_path=http_path,
273+
port=kwargs.get("_port", 443),
274+
user_agent=self.session.useragent_header
275+
if hasattr(self, "session")
276+
else None,
277+
)
278+
raise e
265279

266280
self.use_inline_params = self._set_use_inline_params_with_warning(
267281
kwargs.get("use_inline_params", False)

src/databricks/sql/session.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,7 @@ def __init__(
3939
self.session_configuration = session_configuration
4040
self.catalog = catalog
4141
self.schema = schema
42+
self.http_path = http_path
4243

4344
self.auth_provider = get_python_sql_connector_auth_provider(
4445
server_hostname, **kwargs
@@ -93,6 +94,7 @@ def open(self):
9394
catalog=self.catalog,
9495
schema=self.schema,
9596
)
97+
9698
self.protocol_version = self.get_protocol_version(self._session_id)
9799
self.is_open = True
98100
logger.info("Successfully opened session %s", str(self.guid_hex))

src/databricks/sql/telemetry/models/event.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -149,9 +149,9 @@ class TelemetryEvent(JsonSerializableMixin):
149149
operation_latency_ms (Optional[int]): Operation latency in milliseconds
150150
"""
151151

152-
session_id: str
153152
system_configuration: DriverSystemConfiguration
154153
driver_connection_params: DriverConnectionParameters
154+
session_id: Optional[str] = None
155155
sql_statement_id: Optional[str] = None
156156
auth_type: Optional[str] = None
157157
vol_operation: Optional[DriverVolumeOperation] = None

src/databricks/sql/telemetry/telemetry_client.py

Lines changed: 39 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,14 +8,20 @@
88
TelemetryEvent,
99
DriverSystemConfiguration,
1010
DriverErrorInfo,
11+
DriverConnectionParameters,
12+
HostDetails,
1113
)
1214
from databricks.sql.telemetry.models.frontend_logs import (
1315
TelemetryFrontendLog,
1416
TelemetryClientContext,
1517
FrontendLogContext,
1618
FrontendLogEntry,
1719
)
18-
from databricks.sql.telemetry.models.enums import AuthMech, AuthFlow
20+
from databricks.sql.telemetry.models.enums import (
21+
AuthMech,
22+
AuthFlow,
23+
DatabricksClientType,
24+
)
1925
from databricks.sql.telemetry.models.endpoint_models import (
2026
TelemetryRequest,
2127
TelemetryResponse,
@@ -431,3 +437,35 @@ def close(session_id_hex):
431437
logger.debug("Failed to shutdown thread pool executor: %s", e)
432438
TelemetryClientFactory._executor = None
433439
TelemetryClientFactory._initialized = False
440+
441+
@staticmethod
442+
def connection_failure_log(
443+
error_name: str,
444+
error_message: str,
445+
host_url: str,
446+
http_path: str,
447+
port: int,
448+
user_agent: Optional[str] = None,
449+
):
450+
"""Send error telemetry when connection creation fails, without requiring a session"""
451+
452+
UNAUTH_DUMMY_SESSION_ID = "unauth_session_id"
453+
454+
TelemetryClientFactory.initialize_telemetry_client(
455+
telemetry_enabled=True,
456+
session_id_hex=UNAUTH_DUMMY_SESSION_ID,
457+
auth_provider=None,
458+
host_url=host_url,
459+
)
460+
461+
telemetry_client = TelemetryClientFactory.get_telemetry_client(
462+
UNAUTH_DUMMY_SESSION_ID
463+
)
464+
telemetry_client._driver_connection_params = DriverConnectionParameters(
465+
http_path=http_path,
466+
mode=DatabricksClientType.THRIFT, # TODO: Add SEA mode
467+
host_info=HostDetails(host_url=host_url, port=port),
468+
)
469+
telemetry_client._user_agent = user_agent
470+
471+
telemetry_client.export_failure_log(error_name, error_message)

tests/e2e/common/retry_test_mixins.py

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -127,7 +127,8 @@ class PySQLRetryTestsMixin:
127127
"_retry_delay_default": 0.5,
128128
}
129129

130-
def test_retry_urllib3_settings_are_honored(self):
130+
@patch("databricks.sql.telemetry.telemetry_client.TelemetryClient._send_telemetry")
131+
def test_retry_urllib3_settings_are_honored(self, mock_send_telemetry):
131132
"""Databricks overrides some of urllib3's configuration. This tests confirms that what configuration
132133
we DON'T override is preserved in urllib3's internals
133134
"""
@@ -147,7 +148,8 @@ def test_retry_urllib3_settings_are_honored(self):
147148
assert rp.read == 11
148149
assert rp.redirect == 12
149150

150-
def test_oserror_retries(self):
151+
@patch("databricks.sql.telemetry.telemetry_client.TelemetryClient._send_telemetry")
152+
def test_oserror_retries(self, mock_send_telemetry):
151153
"""If a network error occurs during make_request, the request is retried according to policy"""
152154
with patch(
153155
"urllib3.connectionpool.HTTPSConnectionPool._validate_conn",
@@ -159,7 +161,8 @@ def test_oserror_retries(self):
159161

160162
assert mock_validate_conn.call_count == 6
161163

162-
def test_retry_max_count_not_exceeded(self):
164+
@patch("databricks.sql.telemetry.telemetry_client.TelemetryClient._send_telemetry")
165+
def test_retry_max_count_not_exceeded(self, mock_send_telemetry):
163166
"""GIVEN the max_attempts_count is 5
164167
WHEN the server sends nothing but 429 responses
165168
THEN the connector issues six request (original plus five retries)
@@ -171,7 +174,8 @@ def test_retry_max_count_not_exceeded(self):
171174
pass
172175
assert mock_obj.return_value.getresponse.call_count == 6
173176

174-
def test_retry_exponential_backoff(self):
177+
@patch("databricks.sql.telemetry.telemetry_client.TelemetryClient._send_telemetry")
178+
def test_retry_exponential_backoff(self, mock_send_telemetry):
175179
"""GIVEN the retry policy is configured for reasonable exponential backoff
176180
WHEN the server sends nothing but 429 responses with retry-afters
177181
THEN the connector will use those retry-afters values as floor
@@ -338,7 +342,8 @@ def test_retry_abort_close_operation_on_404(self, caplog):
338342
"Operation was canceled by a prior request" in caplog.text
339343
)
340344

341-
def test_retry_max_redirects_raises_too_many_redirects_exception(self):
345+
@patch("databricks.sql.telemetry.telemetry_client.TelemetryClient._send_telemetry")
346+
def test_retry_max_redirects_raises_too_many_redirects_exception(self, mock_send_telemetry):
342347
"""GIVEN the connector is configured with a custom max_redirects
343348
WHEN the DatabricksRetryPolicy is created
344349
THEN the connector raises a MaxRedirectsError if that number is exceeded
@@ -362,7 +367,8 @@ def test_retry_max_redirects_raises_too_many_redirects_exception(self):
362367
# Total call count should be 2 (original + 1 retry)
363368
assert mock_obj.return_value.getresponse.call_count == expected_call_count
364369

365-
def test_retry_max_redirects_unset_doesnt_redirect_forever(self):
370+
@patch("databricks.sql.telemetry.telemetry_client.TelemetryClient._send_telemetry")
371+
def test_retry_max_redirects_unset_doesnt_redirect_forever(self, mock_send_telemetry):
366372
"""GIVEN the connector is configured without a custom max_redirects
367373
WHEN the DatabricksRetryPolicy is used
368374
THEN the connector raises a MaxRedirectsError if that number is exceeded

tests/unit/test_telemetry.py

Lines changed: 24 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,6 @@
88
NoopTelemetryClient,
99
TelemetryClientFactory,
1010
TelemetryHelper,
11-
BaseTelemetryClient,
1211
)
1312
from databricks.sql.telemetry.models.enums import AuthMech, AuthFlow
1413
from databricks.sql.auth.authenticators import (
@@ -290,3 +289,27 @@ def test_factory_shutdown_flow(self):
290289
TelemetryClientFactory.close(session2)
291290
assert TelemetryClientFactory._initialized is False
292291
assert TelemetryClientFactory._executor is None
292+
293+
@patch("databricks.sql.telemetry.telemetry_client.TelemetryClient.export_failure_log")
294+
@patch("databricks.sql.client.Session")
295+
def test_connection_failure_sends_correct_telemetry_payload(
296+
self, mock_session, mock_export_failure_log
297+
):
298+
"""
299+
Verify that a connection failure constructs and sends the correct
300+
telemetry payload via _send_telemetry.
301+
"""
302+
303+
error_message = "Could not connect to host"
304+
mock_session.side_effect = Exception(error_message)
305+
306+
try:
307+
from databricks import sql
308+
sql.connect(server_hostname="test-host", http_path="/test-path")
309+
except Exception as e:
310+
assert str(e) == error_message
311+
312+
mock_export_failure_log.assert_called_once()
313+
call_arguments = mock_export_failure_log.call_args
314+
assert call_arguments[0][0] == "Exception"
315+
assert call_arguments[0][1] == error_message

0 commit comments

Comments
 (0)