Skip to content

Enhancement: Making behaviour of TAPService.search() configurable (Or default to async where available) #685

@stvoutsin

Description

@stvoutsin

Summary of problem

Currently TAPService.search() is just an alias to run_sync(), which means that all queries use sync execution regardless of service capabilities.
I think for many/most service providers this may not be the optimal choice, and at least in our case we'd prefer if search uses run_async.

# Current implementation
search = run_sync

Proposed Solutions

I see two potential approaches for implementing this enhancement:

Approach 1: Automatic async-first (capability-based default)

Make search() automatically check capabilities and use async when available:

def search(self, query, *, language="ADQL", maxrec=None, uploads=None, 
           force_sync=False, **keywords):
    if force_sync or not self._supports_async():
        return self.run_sync(query, language=language, maxrec=maxrec, uploads=uploads, **keywords)
    else:
        return self.run_async(query, language=language, maxrec=maxrec, uploads=uploads, **keywords)

This approach does change the default-behaviour (though I would say it is backward compatible) so that queries will be run async but that shouldn't be visible to users.

With this option we get the benefits and stability of asynchronous job execution and the benefit is that it does not require any changes to the API.

Approach 2: Opt-in via service parameter

Add prefer_async parameter to TAPService.__init__():

class TAPService:
    def __init__(self, baseurl, *, capability_description=None, session=None, 
                 prefer_async=False):
        # ...
        self._prefer_async = prefer_async

    def search(self, query, *, language="ADQL", maxrec=None, uploads=None, 
               force_sync=False, **keywords):
        if self._prefer_async and not force_sync and self._supports_async():
            return self.run_async(query, language=language, maxrec=maxrec, uploads=uploads, **keywords)
        else:
            return self.run_sync(query, language=language, maxrec=maxrec, uploads=uploads, **keywords)

The benefits here is that users will get the exact same behavour (zero-risk) and it is an opt-in feature, while leaving the door open for a future change to the default behaviour.
Downside is probably that users need to know about the option to enable it so it may have slower adoption.

Both approaches would check the /capabilities to see whether the service supports async:

Usage Examples

Approach 1: Automatic async-first

service = TAPService("http://example.com/tap")
results = service.search("SELECT * FROM table")  # Uses async if available

# Can still force sync when needed
results = service.search("SELECT * FROM table", force_sync=True)

Approach 2: Opt-in parameter

# Default behavior (unchanged)
service = TAPService("http://example.com/tap")
results = service.search("SELECT * FROM table")  # Always sync

# Opt-in to async-prefer behavior
service = TAPService("http://example.com/tap", prefer_async=True)
results = service.search("SELECT * FROM table")  # Uses async if available

# Can still force sync
results = service.search("SELECT * FROM table", force_sync=True)

Any thoughts on this? If it makes sense I'm happy to work on a PR to enable it.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions