
Commit e380654

SeaDatabricksClient: Add Metadata Commands (#593)
* [squash from exec-sea] bring over execution phase changes
* remove excess test
* add docstring
* remove exec func in sea backend
* remove excess files
* remove excess models
* remove excess sea backend tests
* cleanup
* re-introduce get_schema_desc
* remove SeaResultSet
* clean imports and attributes
* pass CommandId to ExecResp
* remove changes in types
* add back essential types (ExecResponse, from_sea_state)
* fix fetch types
* excess imports
* reduce diff by maintaining logs
* fix int test types
* [squashed from exec-sea] init execution func
* remove irrelevant changes
* remove ResultSetFilter functionality
* remove more irrelevant changes
* remove more irrelevant changes
* even more irrelevant changes
* remove sea response as init option
* exec test example scripts
* formatting (black)
* [squashed from sea-exec] merge sea stuffs
* remove excess changes
* remove excess removed docstring
* remove excess changes in backend
* remove excess imports
* remove accidentally removed _get_schema_desc
* remove unnecessary init with sea_response tests
* remove unnecessary changes
* formatting (black)
* improved models and filters from cloudfetch-sea branch
* filters stuff (align with JDBC)
* backend from cloudfetch-sea
* remove filtering, metadata ops
* raise NotImplementedError for metadata ops
* add metadata commands
* formatting (black)
* add metadata command unit tests
* change to valid table name
* remove unnecessary changes covered by #588
* simplify test module
* logging -> debug level
* change table name in log
* remove unnecessary changes
* remove unnecessary backend changes
* remove un-needed GetChunksResponse
* remove un-needed GetChunksResponse, only relevant in Fetch phase
* reduce code duplication in response parsing
* reduce code duplication
* clearer docstrings
* introduce strongly typed ChunkInfo
* remove is_volume_operation from response
* add is_volume_op and more ResultData fields
* add test scripts
* Revert "Merge branch 'sea-migration' into exec-models-sea" (reverts commit 8bd12d8, reversing changes made to 030edf8)
* Revert "Merge branch 'exec-models-sea' into exec-phase-sea" (reverts commit be1997e, reversing changes made to 37813ba)
* change logging level
* remove unnecessary changes
* remove excess changes
* remove excess changes
* remove _get_schema_bytes (for now)
* redundant comments
* remove fetch phase methods
* reduce code repetition + introduce gaps after multi-line pydocs
* remove unused imports
* move description extraction to helper func
* formatting (black)
* add more unit tests
* streamline unit tests
* test getting the list of allowed configurations
* reduce diff
* reduce diff
* house constants in enums for readability and immutability
* add note on hybrid disposition
* remove redundant note on arrow_schema_bytes
* remove irrelevant changes
* remove unnecessary test changes
* remove unnecessary changes in thrift backend tests
* remove unimplemented methods test
* remove invalid import
* better align queries with JDBC impl
* line breaks after multi-line PRs
* remove unused imports
* fix: introduce ExecuteResponse import
* remove unimplemented metadata methods test, unnecessary imports
* introduce unit tests for metadata methods
* remove verbosity in ResultSetFilter docstring (Co-authored-by: jayant <167047871+jayantsing-db@users.noreply.github.com>)
* remove unnecessary info in ResultSetFilter docstring
* remove explicit type checking, string literals around forward annotations
* house SQL commands in constants
* remove catalog requirement in get_tables
* move filters.py to SEA utils
* ensure SeaResultSet
* prevent circular imports
* remove unused imports
* remove cast, throw error if not SeaResultSet
* make SEA backend methods return SeaResultSet
* use spec-aligned Exceptions in SEA backend
* remove defensive row type check

---------

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Co-authored-by: jayant <167047871+jayantsing-db@users.noreply.github.com>
1 parent 59b4825 commit e380654

File tree

5 files changed: +708, -85 lines

src/databricks/sql/backend/sea/backend.py

Lines changed: 130 additions & 33 deletions
@@ -1,3 +1,5 @@
+from __future__ import annotations
+
 import logging
 import time
 import re
@@ -10,11 +12,12 @@
     ResultDisposition,
     ResultCompression,
     WaitTimeout,
+    MetadataCommands,
 )

 if TYPE_CHECKING:
     from databricks.sql.client import Cursor
-    from databricks.sql.result_set import ResultSet
+    from databricks.sql.result_set import SeaResultSet

 from databricks.sql.backend.databricks_client import DatabricksClient
 from databricks.sql.backend.types import (
@@ -24,7 +27,7 @@
     BackendType,
     ExecuteResponse,
 )
-from databricks.sql.exc import DatabaseError, ServerOperationError
+from databricks.sql.exc import DatabaseError, ProgrammingError, ServerOperationError
 from databricks.sql.backend.sea.utils.http_client import SeaHttpClient
 from databricks.sql.types import SSLOptions

@@ -169,7 +172,7 @@ def _extract_warehouse_id(self, http_path: str) -> str:
             f"Note: SEA only works for warehouses."
         )
         logger.error(error_message)
-        raise ValueError(error_message)
+        raise ProgrammingError(error_message)

     @property
     def max_download_threads(self) -> int:
@@ -241,14 +244,14 @@ def close_session(self, session_id: SessionId) -> None:
             session_id: The session identifier returned by open_session()

         Raises:
-            ValueError: If the session ID is invalid
+            ProgrammingError: If the session ID is invalid
             OperationalError: If there's an error closing the session
         """

         logger.debug("SeaDatabricksClient.close_session(session_id=%s)", session_id)

         if session_id.backend_type != BackendType.SEA:
-            raise ValueError("Not a valid SEA session ID")
+            raise ProgrammingError("Not a valid SEA session ID")
         sea_session_id = session_id.to_sea_session_id()

         request_data = DeleteSessionRequest(
@@ -400,12 +403,12 @@ def execute_command(
         max_rows: int,
         max_bytes: int,
         lz4_compression: bool,
-        cursor: "Cursor",
+        cursor: Cursor,
         use_cloud_fetch: bool,
         parameters: List[Dict[str, Any]],
         async_op: bool,
         enforce_embedded_schema_correctness: bool,
-    ) -> Union["ResultSet", None]:
+    ) -> Union[SeaResultSet, None]:
         """
         Execute a SQL command using the SEA backend.

@@ -426,7 +429,7 @@ def execute_command(
         """

         if session_id.backend_type != BackendType.SEA:
-            raise ValueError("Not a valid SEA session ID")
+            raise ProgrammingError("Not a valid SEA session ID")

         sea_session_id = session_id.to_sea_session_id()

@@ -501,11 +504,11 @@ def cancel_command(self, command_id: CommandId) -> None:
             command_id: Command identifier to cancel

         Raises:
-            ValueError: If the command ID is invalid
+            ProgrammingError: If the command ID is invalid
         """

         if command_id.backend_type != BackendType.SEA:
-            raise ValueError("Not a valid SEA command ID")
+            raise ProgrammingError("Not a valid SEA command ID")

         sea_statement_id = command_id.to_sea_statement_id()

@@ -524,11 +527,11 @@ def close_command(self, command_id: CommandId) -> None:
             command_id: Command identifier to close

         Raises:
-            ValueError: If the command ID is invalid
+            ProgrammingError: If the command ID is invalid
         """

         if command_id.backend_type != BackendType.SEA:
-            raise ValueError("Not a valid SEA command ID")
+            raise ProgrammingError("Not a valid SEA command ID")

         sea_statement_id = command_id.to_sea_statement_id()

@@ -550,7 +553,7 @@ def get_query_state(self, command_id: CommandId) -> CommandState:
             CommandState: The current state of the command

         Raises:
-            ValueError: If the command ID is invalid
+            ProgrammingError: If the command ID is invalid
         """

         if command_id.backend_type != BackendType.SEA:
@@ -572,8 +575,8 @@ def get_query_state(self, command_id: CommandId) -> CommandState:
     def get_execution_result(
         self,
         command_id: CommandId,
-        cursor: "Cursor",
-    ) -> "ResultSet":
+        cursor: Cursor,
+    ) -> SeaResultSet:
         """
         Get the result of a command execution.

@@ -582,14 +585,14 @@ def get_execution_result(
             cursor: Cursor executing the command

         Returns:
-            ResultSet: A SeaResultSet instance with the execution results
+            SeaResultSet: A SeaResultSet instance with the execution results

         Raises:
             ValueError: If the command ID is invalid
         """

         if command_id.backend_type != BackendType.SEA:
-            raise ValueError("Not a valid SEA command ID")
+            raise ProgrammingError("Not a valid SEA command ID")

         sea_statement_id = command_id.to_sea_statement_id()

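The recurring guard in these hunks, rejecting IDs that belong to another backend, is small enough to sketch in isolation. The `BackendType` and `ProgrammingError` classes below are simplified stand-ins for the real ones in `databricks.sql.backend.types` and `databricks.sql.exc`:

```python
# Simplified stand-ins (illustration only): the real BackendType and
# ProgrammingError live in databricks.sql.backend.types and databricks.sql.exc.
from enum import Enum


class BackendType(Enum):
    THRIFT = "thrift"
    SEA = "sea"


class ProgrammingError(Exception):
    pass


def validate_sea_session(backend_type: BackendType) -> None:
    # Mirrors the guard each SEA method now runs before using an ID.
    if backend_type != BackendType.SEA:
        raise ProgrammingError("Not a valid SEA session ID")


validate_sea_session(BackendType.SEA)  # ok: no exception raised
```

The change from `ValueError` to `ProgrammingError` aligns these failures with the DB-API 2.0 exception hierarchy, which is what "use spec-aligned Exceptions" in the commit log refers to.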
@@ -626,47 +629,141 @@ def get_catalogs(
         session_id: SessionId,
         max_rows: int,
         max_bytes: int,
-        cursor: "Cursor",
-    ):
-        """Not implemented yet."""
-        raise NotImplementedError("get_catalogs is not yet implemented for SEA backend")
+        cursor: Cursor,
+    ) -> SeaResultSet:
+        """Get available catalogs by executing 'SHOW CATALOGS'."""
+        result = self.execute_command(
+            operation=MetadataCommands.SHOW_CATALOGS.value,
+            session_id=session_id,
+            max_rows=max_rows,
+            max_bytes=max_bytes,
+            lz4_compression=False,
+            cursor=cursor,
+            use_cloud_fetch=False,
+            parameters=[],
+            async_op=False,
+            enforce_embedded_schema_correctness=False,
+        )
+        assert result is not None, "execute_command returned None in synchronous mode"
+        return result

     def get_schemas(
         self,
         session_id: SessionId,
         max_rows: int,
         max_bytes: int,
-        cursor: "Cursor",
+        cursor: Cursor,
         catalog_name: Optional[str] = None,
         schema_name: Optional[str] = None,
-    ):
-        """Not implemented yet."""
-        raise NotImplementedError("get_schemas is not yet implemented for SEA backend")
+    ) -> SeaResultSet:
+        """Get schemas by executing 'SHOW SCHEMAS IN catalog [LIKE pattern]'."""
+        if not catalog_name:
+            raise DatabaseError("Catalog name is required for get_schemas")
+
+        operation = MetadataCommands.SHOW_SCHEMAS.value.format(catalog_name)
+
+        if schema_name:
+            operation += MetadataCommands.LIKE_PATTERN.value.format(schema_name)
+
+        result = self.execute_command(
+            operation=operation,
+            session_id=session_id,
+            max_rows=max_rows,
+            max_bytes=max_bytes,
+            lz4_compression=False,
+            cursor=cursor,
+            use_cloud_fetch=False,
+            parameters=[],
+            async_op=False,
+            enforce_embedded_schema_correctness=False,
+        )
+        assert result is not None, "execute_command returned None in synchronous mode"
+        return result

     def get_tables(
         self,
         session_id: SessionId,
         max_rows: int,
         max_bytes: int,
-        cursor: "Cursor",
+        cursor: Cursor,
         catalog_name: Optional[str] = None,
         schema_name: Optional[str] = None,
         table_name: Optional[str] = None,
         table_types: Optional[List[str]] = None,
-    ):
-        """Not implemented yet."""
-        raise NotImplementedError("get_tables is not yet implemented for SEA backend")
+    ) -> SeaResultSet:
+        """Get tables by executing 'SHOW TABLES IN catalog [SCHEMA LIKE pattern] [LIKE pattern]'."""
+        operation = (
+            MetadataCommands.SHOW_TABLES_ALL_CATALOGS.value
+            if catalog_name in [None, "*", "%"]
+            else MetadataCommands.SHOW_TABLES.value.format(
+                MetadataCommands.CATALOG_SPECIFIC.value.format(catalog_name)
+            )
+        )
+
+        if schema_name:
+            operation += MetadataCommands.SCHEMA_LIKE_PATTERN.value.format(schema_name)
+
+        if table_name:
+            operation += MetadataCommands.LIKE_PATTERN.value.format(table_name)
+
+        result = self.execute_command(
+            operation=operation,
+            session_id=session_id,
+            max_rows=max_rows,
+            max_bytes=max_bytes,
+            lz4_compression=False,
+            cursor=cursor,
+            use_cloud_fetch=False,
+            parameters=[],
+            async_op=False,
+            enforce_embedded_schema_correctness=False,
+        )
+        assert result is not None, "execute_command returned None in synchronous mode"
+
+        # Apply client-side filtering by table_types
+        from databricks.sql.backend.sea.utils.filters import ResultSetFilter
+
+        result = ResultSetFilter.filter_tables_by_type(result, table_types)
+
+        return result

     def get_columns(
         self,
         session_id: SessionId,
         max_rows: int,
         max_bytes: int,
-        cursor: "Cursor",
+        cursor: Cursor,
         catalog_name: Optional[str] = None,
         schema_name: Optional[str] = None,
         table_name: Optional[str] = None,
         column_name: Optional[str] = None,
-    ):
-        """Not implemented yet."""
-        raise NotImplementedError("get_columns is not yet implemented for SEA backend")
+    ) -> SeaResultSet:
+        """Get columns by executing 'SHOW COLUMNS IN CATALOG catalog [SCHEMA LIKE pattern] [TABLE LIKE pattern] [LIKE pattern]'."""
+        if not catalog_name:
+            raise DatabaseError("Catalog name is required for get_columns")
+
+        operation = MetadataCommands.SHOW_COLUMNS.value.format(catalog_name)
+
+        if schema_name:
+            operation += MetadataCommands.SCHEMA_LIKE_PATTERN.value.format(schema_name)
+
+        if table_name:
+            operation += MetadataCommands.TABLE_LIKE_PATTERN.value.format(table_name)
+
+        if column_name:
+            operation += MetadataCommands.LIKE_PATTERN.value.format(column_name)
+
+        result = self.execute_command(
+            operation=operation,
+            session_id=session_id,
+            max_rows=max_rows,
+            max_bytes=max_bytes,
+            lz4_compression=False,
+            cursor=cursor,
+            use_cloud_fetch=False,
+            parameters=[],
+            async_op=False,
+            enforce_embedded_schema_correctness=False,
+        )
+        assert result is not None, "execute_command returned None in synchronous mode"
+        return result
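The branch logic in `get_tables` above can be illustrated standalone. This sketch is not the library's API; it inlines the `MetadataCommands` format strings from the diff as plain module constants to show how the operation string is assembled:

```python
# Illustration only: the SHOW TABLES operation-string assembly from
# get_tables, with the MetadataCommands values inlined as plain strings.
SHOW_TABLES = "SHOW TABLES IN {}"
SHOW_TABLES_ALL_CATALOGS = "SHOW TABLES IN ALL CATALOGS"
CATALOG_SPECIFIC = "CATALOG {}"
LIKE_PATTERN = " LIKE '{}'"
SCHEMA_LIKE_PATTERN = " SCHEMA" + LIKE_PATTERN


def build_show_tables(catalog_name=None, schema_name=None, table_name=None):
    # None, "*" and "%" all mean "every catalog", mirroring the diff.
    operation = (
        SHOW_TABLES_ALL_CATALOGS
        if catalog_name in (None, "*", "%")
        else SHOW_TABLES.format(CATALOG_SPECIFIC.format(catalog_name))
    )
    if schema_name:
        operation += SCHEMA_LIKE_PATTERN.format(schema_name)
    if table_name:
        operation += LIKE_PATTERN.format(table_name)
    return operation


print(build_show_tables("main", "default", "t%"))
# SHOW TABLES IN CATALOG main SCHEMA LIKE 'default' LIKE 't%'
```

Note that `table_types` filtering has no SQL equivalent here, which is why the real method applies `ResultSetFilter.filter_tables_by_type` client-side after the query returns.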

src/databricks/sql/backend/sea/utils/constants.py

Lines changed: 20 additions & 0 deletions
@@ -45,3 +45,23 @@ class WaitTimeout(Enum):

     ASYNC = "0s"
     SYNC = "10s"
+
+
+class MetadataCommands(Enum):
+    """SQL commands used in the SEA backend.
+
+    These constants are used for metadata operations and other SQL queries
+    to ensure consistency and avoid string literal duplication.
+    """
+
+    SHOW_CATALOGS = "SHOW CATALOGS"
+    SHOW_SCHEMAS = "SHOW SCHEMAS IN {}"
+    SHOW_TABLES = "SHOW TABLES IN {}"
+    SHOW_TABLES_ALL_CATALOGS = "SHOW TABLES IN ALL CATALOGS"
+    SHOW_COLUMNS = "SHOW COLUMNS IN CATALOG {}"
+
+    LIKE_PATTERN = " LIKE '{}'"
+    SCHEMA_LIKE_PATTERN = " SCHEMA" + LIKE_PATTERN
+    TABLE_LIKE_PATTERN = " TABLE" + LIKE_PATTERN
+
+    CATALOG_SPECIFIC = "CATALOG {}"
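To see how these enum values compose, here is an abridged reproduction (illustration only, not the installed package) using the `get_columns`-style concatenation from `backend.py`:

```python
from enum import Enum


class MetadataCommands(Enum):
    # Abridged copy of the enum added above, for illustration only.
    SHOW_COLUMNS = "SHOW COLUMNS IN CATALOG {}"
    LIKE_PATTERN = " LIKE '{}'"
    SCHEMA_LIKE_PATTERN = " SCHEMA LIKE '{}'"
    TABLE_LIKE_PATTERN = " TABLE LIKE '{}'"


# Compose a get_columns-style query, narrowing step by step.
operation = MetadataCommands.SHOW_COLUMNS.value.format("main")
operation += MetadataCommands.SCHEMA_LIKE_PATTERN.value.format("default")
operation += MetadataCommands.TABLE_LIKE_PATTERN.value.format("orders")
operation += MetadataCommands.LIKE_PATTERN.value.format("id%")
print(operation)
# SHOW COLUMNS IN CATALOG main SCHEMA LIKE 'default' TABLE LIKE 'orders' LIKE 'id%'
```

Housing the commands in an `Enum` gives immutable, importable constants, so the four metadata methods and their tests share one source of truth for query text.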
