
Generalise Backend Layer #604


Open: wants to merge 25 commits into main


varun-edachali-dbx (Collaborator) commented Jun 18, 2025

What type of PR is this?

  • Refactor

Description

  • Separate session functionality from the Connection class to further hide backend implementation details from it: Separate Session related functionality from Connection class #571
  • Introduce a general backend interface (DatabricksClient): Introduce Backend Interface (DatabricksClient) #573
    • Introduce the DatabricksClient interface and make the existing ThriftBackend implement it as ThriftDatabricksClient.
    • Introduce SessionId and CommandId as backend-neutral adapters for representing sessions and commands, instead of relying on Thrift-specific types.
  • Introduce a general ResultSet interface: Implement ResultSet Abstraction (backend interfaces for fetch phase) #574
    • Create a ResultSet interface and make the existing ResultSet class implement it as ThriftResultSet.
    • Make the concrete Thrift backend return a ThriftResultSet instead of an ExecuteResponse from execution-related commands.
    • Generalise the return types of the backend interface methods so they are not tied to Thrift (introduce CommandState in place of Thrift's status types).
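To make the layering in the description concrete, here is a minimal sketch of a backend interface in Python. It is heavily simplified and hypothetical: the method set, the `SessionId` field, `InMemoryClient` and `run_query` are illustrative assumptions, and the real `DatabricksClient` in this PR has more methods and returns a `ResultSet` rather than a list of rows. The point is only that caller code is written once against the abstract class and works with any backend that implements it.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SessionId:
    """Backend-neutral session handle (hypothetical, simplified)."""
    guid: bytes


class DatabricksClient(ABC):
    """Hypothetical, trimmed-down backend interface (not the PR's real signatures)."""

    @abstractmethod
    def open_session(self) -> SessionId: ...

    @abstractmethod
    def close_session(self, session_id: SessionId) -> None: ...

    @abstractmethod
    def execute_command(self, operation: str, session_id: SessionId) -> List[tuple]: ...


class InMemoryClient(DatabricksClient):
    """Toy backend used only to show that callers depend on the interface."""

    def __init__(self) -> None:
        self.open_sessions: Dict[bytes, SessionId] = {}

    def open_session(self) -> SessionId:
        sid = SessionId(uuid.uuid4().bytes)
        self.open_sessions[sid.guid] = sid
        return sid

    def close_session(self, session_id: SessionId) -> None:
        self.open_sessions.pop(session_id.guid, None)

    def execute_command(self, operation: str, session_id: SessionId) -> List[tuple]:
        if session_id.guid not in self.open_sessions:
            raise RuntimeError("session is closed")
        # Echo the statement back as a single row instead of running it.
        return [(operation,)]


def run_query(client: DatabricksClient, sql: str) -> List[tuple]:
    """Caller code written once against DatabricksClient; works with any backend."""
    sid = client.open_session()
    try:
        return client.execute_command(sql, sid)
    finally:
        client.close_session(sid)
```

With this shape, a second backend would be added by subclassing `DatabricksClient`, and `run_query` would not need to change.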

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

Design Doc

varun-edachali-dbx and others added 4 commits June 18, 2025 03:47
* decouple session class from existing Connection

ensure maintenance of current APIs of Connection while delegating
responsibility

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add open property to Connection to ensure maintenance of existing API

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* update unit tests to address ThriftBackend through session instead of through Connection

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* chore: move session specific tests from test_client to test_session

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

as in CONTRIBUTING.md

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* use connection open property instead of long chain through session

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* trigger integration workflow

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix: ensure open attribute of Connection never fails

If openSession takes a long time, initialisation of the session does not
complete immediately, so the session attribute may not exist yet. If the
Connection is deleted during this window, the open check throws because
the session attribute does not exist. We therefore default to treating
the Connection as closed in this case. This was not an issue before
because open was a direct attribute of the Connection class. Caught in
the integration tests.

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix: de-complicate earlier connection open logic

earlier, one of the integration tests was failing because 'session was
not an attribute of Connection'. This is likely tied to a local
configuration issue related to unittest that was causing an error in the
test suite itself. The tests are now passing without checking for the
session attribute.
c676f9b

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Revert "fix: de-complicate earlier connection open logic"

This reverts commit d6b1b19.

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* [empty commit] attempt to trigger ci e2e workflow

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Update CODEOWNERS (#562)

new codeowners

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Enhance Cursor close handling and context manager exception management to prevent server side resource leaks (#554)

* Enhance Cursor close handling and context manager exception management

* tests

* fmt

* Fix Cursor.close() to properly handle CursorAlreadyClosedError

* Remove specific test message from Cursor.close() error handling

* Improve error handling in connection and cursor context managers to ensure proper closure during exceptions, including KeyboardInterrupt. Add tests for nested cursor management and verify operation closure on server-side errors.

* add

* add

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* PECOBLR-86 improve logging on python driver (#556)

* PECOBLR-86 Improve logging for debug level

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* PECOBLR-86 Improve logging for debug level

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* fixed format

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* used lazy logging

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* changed debug to error logs

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

* used lazy logging

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>

---------

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Revert "Merge remote-tracking branch 'upstream/sea-migration' into decouple-session"

This reverts commit dbb2ec5, reversing
changes made to 7192f11.

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Reapply "Merge remote-tracking branch 'upstream/sea-migration' into decouple-session"

This reverts commit bdb8381.

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix: separate session opening logic from instantiation

ensures correctness of self.session.open call in Connection

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix: use is_open attribute to denote session availability

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix: access thrift backend through session

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* chore: use get_handle() instead of private session attribute in client

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix: remove accidentally removed assertions

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

---------

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>
Co-authored-by: Jothi Prakash <jothi.prakash@databricks.com>
Co-authored-by: Madhav Sainanee <madhav.sainanee@databricks.com>
Co-authored-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>
NOTE: the `test_complex_types` e2e test was not working at the time of this merge. It must be re-run once it is back up and working as intended.
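The "ensure open attribute of Connection never fails", "separate session opening logic from instantiation" and "use is_open attribute to denote session availability" commits above describe a pattern that can be sketched as follows. This is a hypothetical simplification, not the driver's actual code: `FakeBackend` and the exact attributes of `Session` and `Connection` are assumptions for illustration.

```python
class FakeBackend:
    """Hypothetical backend stub; returns a fixed session handle."""

    def open_session(self):
        return "session-1"

    def close_session(self, session_id) -> None:
        pass


class Session:
    """Hypothetical stand-in for the Session class split out of Connection."""

    def __init__(self, backend) -> None:
        # Instantiation only stores state; no network call happens here.
        self.backend = backend
        self.session_id = None
        self.is_open = False

    def open(self) -> None:
        # Opening is a separate step from construction.
        self.session_id = self.backend.open_session()
        self.is_open = True

    def close(self) -> None:
        if self.is_open:
            self.backend.close_session(self.session_id)
            self.is_open = False


class Connection:
    def __init__(self, backend) -> None:
        self.session = Session(backend)
        self.session.open()

    @property
    def open(self) -> bool:
        # If self.session was never assigned (e.g. __init__ did not get that far),
        # report the connection as closed instead of raising AttributeError.
        session = getattr(self, "session", None)
        return session is not None and session.is_open

    def close(self) -> None:
        if self.open:
            self.session.close()

    def __del__(self) -> None:
        # __del__ can run on a partially constructed object; self.open is safe here.
        if self.open:
            self.close()
```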

* remove excess logs, assertions, instantiations

large merge artifacts

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black) + remove excess log (merge artifact)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix typing

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove unnecessary check

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove unnecessary replace call

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* introduce __str__ methods for CommandId and SessionId

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* docstrings for DatabricksClient interface

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* stronger typing of Cursor and ExecuteResponse

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove utility functions from backend interface, fix circular import

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* rename info to properties

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* newline for cleanliness

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix circular import

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* to_hex_id -> get_hex_id

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* better comment on protocol version getter

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move guid to hex id to new utils module

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move staging allowed local path to connection props

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add strong return type for execute_command

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* skip auth, error handling in databricksclient interface

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* chore: docstring + line width

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* get_id -> get_guid

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* chore: docstring

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix: to_hex_id -> to_hex_guid

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

---------
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
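The adapter commits in this block (`__str__` for `CommandId` and `SessionId`, `get_id -> get_guid`, `to_hex_id -> to_hex_guid`, moving the GUID-to-hex helper into a utils module) suggest an id type along these lines. The method names follow the commit messages, but the fields (`backend_type`, `guid`, `secret`) and the helper `guid_to_hex_id` are assumptions in this sketch; `SessionId` would follow the same pattern.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Optional


def guid_to_hex_id(guid: bytes) -> str:
    """Render a 16-byte GUID as a canonical UUID string; fall back to str() otherwise."""
    try:
        return str(uuid.UUID(bytes=guid))
    except (ValueError, TypeError):
        return str(guid)


@dataclass(frozen=True)
class CommandId:
    """Hypothetical backend-neutral command handle.

    backend_type records which backend produced the id; secret stands in for
    any backend-specific payload a backend may need to keep alongside the guid.
    """

    backend_type: str
    guid: Any
    secret: Optional[Any] = None

    def get_guid(self) -> Any:
        return self.guid

    def to_hex_guid(self) -> str:
        # Byte GUIDs are rendered as UUID strings; string ids pass through unchanged.
        if isinstance(self.guid, bytes):
            return guid_to_hex_id(self.guid)
        return str(self.guid)

    def __str__(self) -> str:
        # Readable form for logs, never the raw bytes.
        return f"{self.backend_type}:{self.to_hex_guid()}"
```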
…574)

* ensure backend client returns a ResultSet type in backend tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* newline for cleanliness

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix circular import

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* to_hex_id -> get_hex_id

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* better comment on protocol version getter

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* stricter typing for cursor

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* correct typing

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* correct tests and merge artifacts

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove accidentally modified workflow files

remnants of old merge

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* chore: remove accidentally modified workflow files

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add back accidentally removed docstrings

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* clean up docstrings

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* log hex

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove unnecessary _replace call

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add __str__ for CommandId

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* take TOpenSessionResp in get_protocol_version to maintain existing interface

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* active_op_handle -> active_command_id

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* ensure None returned for close_command

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* account for ResultSet return in new pydocs

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* pydoc for types

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move common state to ResultSet parent

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* stronger typing in resultSet behaviour

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundant patch in test

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add has_been_closed_server_side assertion

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundancies in tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* more robust close check

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* use normalised state in e2e test

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* simplify corrected test

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add line gaps after multi-line pydocs for consistency

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* use normalised CommandState type in ExecuteResponse

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

---------

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
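Several commits in this block move callers onto a normalised `CommandState` ("use normalised state in e2e test", "use normalised CommandState type in ExecuteResponse"). A minimal sketch of that normalisation follows, assuming the Thrift side exposes Hive-style `TOperationState` names such as `RUNNING_STATE` and `FINISHED_STATE`; the enum members and the exact mapping are illustrative assumptions, not the PR's definitions.

```python
from enum import Enum


class CommandState(Enum):
    """Hypothetical backend-neutral command states."""

    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CLOSED = "CLOSED"
    CANCELLED = "CANCELLED"

    @classmethod
    def from_thrift_state(cls, name: str) -> "CommandState":
        """Map a Thrift operation-state name onto CommandState (assumed mapping)."""
        mapping = {
            "INITIALIZED_STATE": cls.PENDING,
            "PENDING_STATE": cls.PENDING,
            "RUNNING_STATE": cls.RUNNING,
            "FINISHED_STATE": cls.SUCCEEDED,
            "ERROR_STATE": cls.FAILED,
            "TIMEDOUT_STATE": cls.FAILED,
            "CLOSED_STATE": cls.CLOSED,
            "CANCELED_STATE": cls.CANCELLED,
        }
        try:
            return mapping[name]
        except KeyError:
            raise ValueError(f"unrecognised Thrift operation state: {name}") from None

    @property
    def is_terminal(self) -> bool:
        # Only PENDING and RUNNING can still change.
        return self not in (CommandState.PENDING, CommandState.RUNNING)
```

Callers such as tests then compare against `CommandState.SUCCEEDED` rather than a Thrift enum value, so they keep working if another backend reports states differently.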
…nd forward refs, remove some unused imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
@@ -938,58 +991,106 @@ def execute_command(

if async_op:
    self._handle_execute_response_async(resp, cursor)
    return None
Contributor

Can we return an EmptyResultSet instead of None? cc @jayantsing-db

Collaborator (author)

In the existing implementation (i.e. the main branch), None is returned for async queries. Returning an EmptyResultSet would change user-facing behaviour, right?
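The compatibility concern in the reply above can be shown with a small, hypothetical caller. `execute_command`, `ResultSet` and `submit` below are simplified stand-ins, not the driver's real signatures.

```python
from typing import List, Optional


class ResultSet:
    """Hypothetical minimal stand-in for the PR's ResultSet."""

    def __init__(self, rows: List[tuple]) -> None:
        self.rows = rows


def execute_command(sql: str, async_op: bool) -> Optional[ResultSet]:
    """Hypothetical backend call mirroring the diff above."""
    if async_op:
        # Async submission: the command has started but no rows exist yet.
        # main returns None here, so existing callers may rely on that.
        return None
    return ResultSet([(sql,)])


def submit(sql: str, async_op: bool) -> str:
    """A caller written against main's behaviour."""
    rs = execute_command(sql, async_op)
    if rs is None:
        return "submitted; poll for completion"
    return f"{len(rs.rows)} row(s)"
```

If the async branch returned an empty result set instead of `None`, the `rs is None` check in `submit` would be false and it would report "0 row(s)" for a query that is still running, which is the user-facing change the reply is concerned about.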

@@ -26,6 +26,7 @@
TSparkRowSetType,
Contributor

Can you move utils.py into the utils folder and perhaps give it a new name?

Collaborator (author)

I agree that it should be renamed. Does queue.py sound fine?

I think it should stay in the existing directory, though. The Connection, Cursor and ResultSet definitions live in that directory, so I think the Queue definitions should stay there too.
Also, if by "utils folder" you meant backend/utils, I don't think it should move there, because this code is not backend-specific.

cc @jayantsing-db

        """Fetch all remaining rows as an Arrow table."""
        pass

    def close(self) -> None:
Contributor

If you are creating an ABC, leave all of its methods abstract. Don't implement anything in it.

Collaborator (author), Jun 24, 2025

Is there any reason for that?

Abstract classes are allowed to have concrete methods, and in this case the close implementation depends only on state defined in ResultSet, so I think it belongs here. This helps reduce repetition across subclasses.

cc @jayantsing-db
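The point about abstract classes can be illustrated with a short sketch: Python's `abc` module only blocks instantiation of classes that still have unimplemented abstract methods, so a shared concrete `close()` on the abstract parent is allowed. `ListResultSet`, `RecordingBackend` and the `close_command` call are hypothetical; only the `has_been_closed_server_side` name is taken from a commit message in this PR, and the real `close()` may check more state.

```python
from abc import ABC, abstractmethod
from typing import List


class ResultSet(ABC):
    """Hypothetical sketch: abstract fetch methods, shared concrete close()."""

    def __init__(self, backend, command_id) -> None:
        # State common to every backend lives on the parent class.
        self.backend = backend
        self.command_id = command_id
        self.has_been_closed_server_side = False

    @abstractmethod
    def fetchall(self) -> List[tuple]:
        """Each backend supplies its own fetch logic."""

    def close(self) -> None:
        # Uses only parent-class state, so subclasses need not repeat it.
        if not self.has_been_closed_server_side:
            self.backend.close_command(self.command_id)
            self.has_been_closed_server_side = True


class ListResultSet(ResultSet):
    """Hypothetical concrete subclass that serves rows from a list."""

    def __init__(self, backend, command_id, rows: List[tuple]) -> None:
        super().__init__(backend, command_id)
        self._rows = rows

    def fetchall(self) -> List[tuple]:
        return list(self._rows)


class RecordingBackend:
    """Hypothetical backend that records which commands were closed."""

    def __init__(self) -> None:
        self.closed: List[str] = []

    def close_command(self, command_id) -> None:
        self.closed.append(command_id)
```

The trade-off is that every subclass inherits the same close semantics; a backend needing different behaviour would override `close()` instead.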

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
…nd Cursor

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>