Added classes required for telemetry #572

Merged
merged 27 commits on May 30, 2025
Changes from 15 commits

Commits
27 commits
4c122b1
PECOBLR-86 Improve logging for debug level
saishreeeee May 15, 2025
23374be
PECOBLR-86 Improve logging for debug level
saishreeeee May 15, 2025
3971d3a
fixed format
saishreeeee May 15, 2025
91e3f40
used lazy logging
saishreeeee May 15, 2025
6331fc1
changed debug to error logs
saishreeeee May 15, 2025
63593a6
added classes required for telemetry
saishreeeee May 22, 2025
bb69dc9
removed TelemetryHelper
saishreeeee May 22, 2025
4cb0d70
[PECOBLR-361] convert column table to arrow if arrow present (#551)
shivam2680 May 16, 2025
d1efa03
Update CODEOWNERS (#562)
jprakash-db May 21, 2025
2fc3cb6
Enhance Cursor close handling and context manager exception managemen…
madhav-db May 21, 2025
73471e9
PECOBLR-86 improve logging on python driver (#556)
saishreeeee May 22, 2025
74c6463
Update github actions run conditions (#569)
jprakash-db May 26, 2025
cbc9ebf
Added classes required for telemetry
saishreeeee May 26, 2025
9d10e16
fixed example
saishreeeee May 26, 2025
1c467f3
Merge branch 'origin/telemetry' into PECOBLR-441
saishreeeee May 26, 2025
6302327
changed to doc string
saishreeeee May 26, 2025
e16fce5
removed self.telemetry close line
saishreeeee May 26, 2025
7461d96
grouped classes
saishreeeee May 27, 2025
95e43e4
formatting
saishreeeee May 27, 2025
d72fb27
fixed doc string
saishreeeee May 27, 2025
28efaba
fixed doc string
saishreeeee May 27, 2025
c8c08dd
added more descriptive comments, put dataclasses in a sub-folder
saishreeeee May 28, 2025
74ea9b6
fixed default attributes ordering
saishreeeee May 28, 2025
ac7881f
changed file names
saishreeeee May 28, 2025
bff17b5
Merge remote-tracking branch 'origin/telemetry' into PECOBLR-441
saishreeeee May 29, 2025
6219d38
added enums to models folder
saishreeeee May 29, 2025
6305323
removed telemetry batch size
saishreeeee May 29, 2025
11 changes: 10 additions & 1 deletion src/databricks/sql/client.py
@@ -1,6 +1,5 @@
import time
from typing import Dict, Tuple, List, Optional, Any, Union, Sequence

import pandas

try:
@@ -234,6 +233,13 @@ def read(self) -> Optional[OAuthToken]:
server_hostname, **kwargs
)

self.server_telemetry_enabled = True
self.client_telemetry_enabled = kwargs.get("enable_telemetry", False)
self.telemetry_enabled = (
    self.client_telemetry_enabled and self.server_telemetry_enabled
)
telemetry_batch_size = kwargs.get("telemetry_batch_size", 200)
Contributor

Is there some basis for this 200 value? cc @vikrantpuppala

Collaborator Author

default telemetry batch size is 200 in the JDBC driver

Contributor

Can we fetch the hardcoded variable instead of hardcoding it ourselves?

Consider this scenario: a developer changes the default value but forgets to change line 241 - this will incorrectly populate the telemetry logs.
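
One minimal way to realize that suggestion (a sketch only; the constant name and module location are hypothetical, not part of this PR) is to define the default once and read it wherever it is needed:

# Hypothetical module-level constant, e.g. in databricks/sql/telemetry/__init__.py
TELEMETRY_BATCH_SIZE_DEFAULT = 200


def resolve_batch_size(**kwargs) -> int:
    # Same lookup as in the diff above, but the default lives in one place
    return kwargs.get("telemetry_batch_size", TELEMETRY_BATCH_SIZE_DEFAULT)

That way the logged batch size and the effective batch size cannot drift apart when the default changes.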

Contributor

> default telemetry batch size is 200 in the JDBC driver

Also, why should we indicate the JDBC driver values here?

Contributor

Moreover, populating the telemetry fields should be abstracted out of client.py.
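
A rough sketch of what such an abstraction could look like (the TelemetryConfig dataclass and helper function below are hypothetical, not taken from this PR):

from dataclasses import dataclass


@dataclass
class TelemetryConfig:
    enabled: bool
    batch_size: int


def telemetry_config_from_kwargs(server_telemetry_enabled: bool, **kwargs) -> TelemetryConfig:
    # Resolve the telemetry flags and batch size outside of Connection.__init__
    client_enabled = kwargs.get("enable_telemetry", False)
    return TelemetryConfig(
        enabled=client_enabled and server_telemetry_enabled,
        batch_size=kwargs.get("telemetry_batch_size", 200),
    )

Connection.__init__ would then only store the returned config, keeping the kwarg parsing inside the telemetry module.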


user_agent_entry = kwargs.get("user_agent_entry")
if user_agent_entry is None:
    user_agent_entry = kwargs.get("_user_agent_entry")
@@ -425,6 +431,9 @@ def _close(self, close_cursors=True) -> None:

self.open = False

if hasattr(self, "telemetry_client"):
    self.telemetry_client.close()

def commit(self):
    """No-op because Databricks does not support transactions"""
    pass
41 changes: 41 additions & 0 deletions src/databricks/sql/telemetry/DriverConnectionParameters.py
@@ -0,0 +1,41 @@
import json
from dataclasses import dataclass, asdict
from databricks.sql.telemetry.HostDetails import HostDetails
from databricks.sql.telemetry.enums.AuthMech import AuthMech
from databricks.sql.telemetry.enums.AuthFlow import AuthFlow
from databricks.sql.telemetry.enums.DatabricksClientType import DatabricksClientType


@dataclass
class DriverConnectionParameters:
    http_path: str
    mode: DatabricksClientType
    host_info: HostDetails
    auth_mech: AuthMech
    auth_flow: AuthFlow
    auth_scope: str
    discovery_url: str
    allowed_volume_ingestion_paths: str
    azure_tenant_id: str
    socket_timeout: int

    def to_json(self):
        return json.dumps(asdict(self))


# Part of TelemetryEvent
# connection_params = DriverConnectionParameters(
#     http_path="/sql/1.0/endpoints/1234567890abcdef",
#     mode=DatabricksClientType.THRIFT,
#     host_info=HostDetails(
#         host_url="https://my-workspace.cloud.databricks.com",
#         port=443,
#     ),
#     auth_mech=AuthMech.OAUTH,
#     auth_flow=AuthFlow.AZURE_MANAGED_IDENTITIES,
#     auth_scope="sql",
#     discovery_url="https://example-url",
#     allowed_volume_ingestion_paths="[]",
#     azure_tenant_id="1234567890abcdef",
#     socket_timeout=10000,
# )
23 changes: 23 additions & 0 deletions src/databricks/sql/telemetry/DriverErrorInfo.py
@@ -0,0 +1,23 @@
import json
from dataclasses import dataclass, asdict


@dataclass
class DriverErrorInfo:
    error_name: str
    stack_trace: str

    def to_json(self):
        return json.dumps(asdict(self))


# Required for ErrorLogs
# error_info = DriverErrorInfo(
#     error_name="CONNECTION_ERROR",
#     stack_trace=(
#         "Connection failure while using the Databricks SQL Python connector. "
#         "Failed to connect to server: https://my-workspace.cloud.databricks.com\n"
#         "databricks.sql.exc.OperationalError: Connection refused: connect\n"
#         "at databricks.sql.thrift_backend.ThriftBackend.make_request(ThriftBackend.py:329)\n"
#         "at databricks.sql.thrift_backend.ThriftBackend.attempt_request(ThriftBackend.py:366)\n"
#         "at databricks.sql.thrift_backend.ThriftBackend.open_session(ThriftBackend.py:575)\n"
#         "at databricks.sql.client.Connection.__init__(client.py:69)\n"
#         "at databricks.sql.client.connect(connection.py:123)"
#     ),
# )
37 changes: 37 additions & 0 deletions src/databricks/sql/telemetry/DriverSystemConfiguration.py
@@ -0,0 +1,37 @@
import json
from dataclasses import dataclass, asdict
from databricks.sql import __version__


@dataclass
class DriverSystemConfiguration:
    driver_version: str
    os_name: str
    os_version: str
    os_arch: str
    runtime_name: str
    runtime_version: str
    runtime_vendor: str
    client_app_name: str
    locale_name: str
    driver_name: str
    char_set_encoding: str

    def to_json(self):
        return json.dumps(asdict(self))


# Part of TelemetryEvent
# system_config = DriverSystemConfiguration(
#     driver_version="2.9.3",
#     os_name="Darwin",
#     os_version="24.4.0",
#     os_arch="arm64",
#     runtime_name="CPython",
#     runtime_version="3.13.3",
#     runtime_vendor="cpython",
#     client_app_name="databricks-sql-python",
#     locale_name="en_US",
#     driver_name="databricks-sql-python",
#     char_set_encoding="UTF-8",
# )
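
For illustration, a sketch of how these fields could be populated from the Python standard library (the default_system_configuration helper and the fallback values are hypothetical, not taken from this PR):

import locale
import platform
import sys


def default_system_configuration() -> DriverSystemConfiguration:
    # Gather platform and runtime details from the standard library
    return DriverSystemConfiguration(
        driver_version=__version__,
        os_name=platform.system(),
        os_version=platform.release(),
        os_arch=platform.machine(),
        runtime_name=platform.python_implementation(),
        runtime_version=platform.python_version(),
        runtime_vendor=sys.implementation.name,
        client_app_name="databricks-sql-python",
        locale_name=locale.getlocale()[0] or "en_US",
        driver_name="databricks-sql-python",
        char_set_encoding=sys.getdefaultencoding(),
    )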
21 changes: 21 additions & 0 deletions src/databricks/sql/telemetry/DriverVolumeOperation.py
@@ -0,0 +1,21 @@
import json
from dataclasses import dataclass, asdict
from databricks.sql.telemetry.enums.DriverVolumeOperationType import (
    DriverVolumeOperationType,
)


@dataclass
class DriverVolumeOperation:
    volume_operation_type: DriverVolumeOperationType
    volume_path: str

    def to_json(self):
        return json.dumps(asdict(self))


# Part of TelemetryEvent
# volume_operation = DriverVolumeOperation(
#     volume_operation_type=DriverVolumeOperationType.LIST,
#     volume_path="/path/to/volume",
# )
20 changes: 20 additions & 0 deletions src/databricks/sql/telemetry/FrontendLogContext.py
@@ -0,0 +1,20 @@
import json
from dataclasses import dataclass, asdict
from databricks.sql.telemetry.TelemetryClientContext import TelemetryClientContext


@dataclass
class FrontendLogContext:
    client_context: TelemetryClientContext

    def to_json(self):
        return json.dumps(asdict(self))


# used in TelemetryFrontendLog
# frontend_log_context = FrontendLogContext(
#     client_context=TelemetryClientContext(
#         timestamp_millis=1716489600000,
#         user_agent="databricks-sql-python-test",
#     )
# )
11 changes: 11 additions & 0 deletions src/databricks/sql/telemetry/FrontendLogEntry.py
@@ -0,0 +1,11 @@
import json
from dataclasses import dataclass, asdict
from databricks.sql.telemetry.TelemetryEvent import TelemetryEvent


@dataclass
class FrontendLogEntry:
    sql_driver_log: TelemetryEvent

    def to_json(self):
        return json.dumps(asdict(self))
18 changes: 18 additions & 0 deletions src/databricks/sql/telemetry/HostDetails.py
@@ -0,0 +1,18 @@
import json
from dataclasses import dataclass, asdict


@dataclass
class HostDetails:
    host_url: str
    port: int

    def to_json(self):
        return json.dumps(asdict(self))


# Part of DriverConnectionParameters
# host_details = HostDetails(
#     host_url="https://my-workspace.cloud.databricks.com",
#     port=443,
# )
24 changes: 24 additions & 0 deletions src/databricks/sql/telemetry/SqlExecutionEvent.py
@@ -0,0 +1,24 @@
import json
from dataclasses import dataclass, asdict
from databricks.sql.telemetry.enums.StatementType import StatementType
from databricks.sql.telemetry.enums.ExecutionResultFormat import ExecutionResultFormat


@dataclass
class SqlExecutionEvent:
    statement_type: StatementType
    is_compressed: bool
    execution_result: ExecutionResultFormat
    retry_count: int

    def to_json(self):
        return json.dumps(asdict(self))


# Part of TelemetryEvent
# sql_execution_event = SqlExecutionEvent(
#     statement_type=StatementType.QUERY,
#     is_compressed=True,
#     execution_result=ExecutionResultFormat.INLINE_ARROW,
#     retry_count=0,
# )
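
A hypothetical serialization sketch (not part of the diff): dataclasses.asdict() leaves Enum members as Enum objects, so if the fields are populated with Enum values rather than plain strings, json.dumps needs a default handler to emit their underlying values.

import json
from dataclasses import asdict
from enum import Enum

event = SqlExecutionEvent(
    statement_type=StatementType.QUERY,
    is_compressed=True,
    execution_result=ExecutionResultFormat.INLINE_ARROW,
    retry_count=0,
)

# Emit each enum's .value so the payload is plain JSON
payload = json.dumps(asdict(event), default=lambda o: o.value if isinstance(o, Enum) else str(o))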
18 changes: 18 additions & 0 deletions src/databricks/sql/telemetry/TelemetryClientContext.py
@@ -0,0 +1,18 @@
from dataclasses import dataclass, asdict
import json


@dataclass
class TelemetryClientContext:
    timestamp_millis: int
    user_agent: str

    def to_json(self):
        return json.dumps(asdict(self))


# used in FrontendLogContext
# client_context = TelemetryClientContext(
#     timestamp_millis=1716489600000,
#     user_agent="databricks-sql-python-test",
# )
25 changes: 25 additions & 0 deletions src/databricks/sql/telemetry/TelemetryEvent.py
@@ -0,0 +1,25 @@
import json
from dataclasses import dataclass, asdict
from databricks.sql.telemetry.DriverSystemConfiguration import DriverSystemConfiguration
from databricks.sql.telemetry.DriverConnectionParameters import (
    DriverConnectionParameters,
)
from databricks.sql.telemetry.DriverVolumeOperation import DriverVolumeOperation
from databricks.sql.telemetry.SqlExecutionEvent import SqlExecutionEvent
from databricks.sql.telemetry.DriverErrorInfo import DriverErrorInfo


@dataclass
class TelemetryEvent:
    session_id: str
    sql_statement_id: str
    system_configuration: DriverSystemConfiguration
    driver_connection_params: DriverConnectionParameters
    auth_type: str
    vol_operation: DriverVolumeOperation
    sql_operation: SqlExecutionEvent
    error_info: DriverErrorInfo
    operation_latency_ms: int

    def to_json(self):
        return json.dumps(asdict(self))
15 changes: 15 additions & 0 deletions src/databricks/sql/telemetry/TelemetryFrontendLog.py
@@ -0,0 +1,15 @@
import json
from dataclasses import dataclass, asdict
from databricks.sql.telemetry.FrontendLogContext import FrontendLogContext
from databricks.sql.telemetry.FrontendLogEntry import FrontendLogEntry


@dataclass
class TelemetryFrontendLog:
    workspace_id: int
    frontend_log_event_id: str
    context: FrontendLogContext
    entry: FrontendLogEntry

    def to_json(self):
        return json.dumps(asdict(self))
13 changes: 13 additions & 0 deletions src/databricks/sql/telemetry/TelemetryRequest.py
@@ -0,0 +1,13 @@
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class TelemetryRequest:
    uploadTime: int
    items: List[str]
    protoLogs: Optional[List[str]]

    def to_json(self):
        return json.dumps(asdict(self))
13 changes: 13 additions & 0 deletions src/databricks/sql/telemetry/TelemetryResponse.py
@@ -0,0 +1,13 @@
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class TelemetryResponse:
    errors: List[str]
    numSuccess: int
    numProtoSuccess: int

    def to_json(self):
        return json.dumps(asdict(self))
8 changes: 8 additions & 0 deletions src/databricks/sql/telemetry/enums/AuthFlow.py
@@ -0,0 +1,8 @@
from enum import Enum


class AuthFlow(Enum):
    TOKEN_PASSTHROUGH = "token_passthrough"
    CLIENT_CREDENTIALS = "client_credentials"
    BROWSER_BASED_AUTHENTICATION = "browser_based_authentication"
    AZURE_MANAGED_IDENTITIES = "azure_managed_identities"
7 changes: 7 additions & 0 deletions src/databricks/sql/telemetry/enums/AuthMech.py
@@ -0,0 +1,7 @@
from enum import Enum


class AuthMech(Enum):
    OTHER = "other"
    PAT = "pat"
    OAUTH = "oauth"
6 changes: 6 additions & 0 deletions src/databricks/sql/telemetry/enums/DatabricksClientType.py
@@ -0,0 +1,6 @@
from enum import Enum


class DatabricksClientType(Enum):
    SEA = "SEA"
    THRIFT = "THRIFT"
10 changes: 10 additions & 0 deletions src/databricks/sql/telemetry/enums/DriverVolumeOperationType.py
@@ -0,0 +1,10 @@
from enum import Enum


class DriverVolumeOperationType(Enum):
    TYPE_UNSPECIFIED = "type_unspecified"
    PUT = "put"
    GET = "get"
    DELETE = "delete"
    LIST = "list"
    QUERY = "query"
8 changes: 8 additions & 0 deletions src/databricks/sql/telemetry/enums/ExecutionResultFormat.py
@@ -0,0 +1,8 @@
from enum import Enum


class ExecutionResultFormat(Enum):
    FORMAT_UNSPECIFIED = "format_unspecified"
    INLINE_ARROW = "inline_arrow"
    EXTERNAL_LINKS = "external_links"
    COLUMNAR_INLINE = "columnar_inline"
9 changes: 9 additions & 0 deletions src/databricks/sql/telemetry/enums/StatementType.py
@@ -0,0 +1,9 @@
from enum import Enum


class StatementType(Enum):
    NONE = "none"
    QUERY = "query"
    SQL = "sql"
    UPDATE = "update"
    METADATA = "metadata"