Skip to content

Conversation

zoghbi-a
Copy link
Contributor

@zoghbi-a zoghbi-a commented Sep 8, 2025

This PR adds several enhancements that have been added to include feedback from users (including reported issues; e.g #3387, #3321) since the last refactor to use the VO backend. The changes include:

  • Add query_by_column to allow querying of different catalog columns. The user passes a dict that is parsed into a TAP WHERE statements.
  • Add support for uploading tables when using TAP directly through query_tap.
  • Improve how maxrec works. If it is bigger than the default server limit, add a TOP statement.
  • Add automatic guessing for the data host in download_data

Because query_by_column shares code with the existing query_region, the shared part has now been moved to an internal function _query_execute, which is called by both query_by_column and query_region.

@zoghbi-a zoghbi-a marked this pull request as draft September 8, 2025 15:30
Copy link

codecov bot commented Sep 8, 2025

Codecov Report

❌ Patch coverage is 95.58824% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.68%. Comparing base (b918da4) to head (eb052f6).
⚠️ Report is 119 commits behind head on main.

Files with missing lines Patch % Lines
astroquery/heasarc/core.py 95.58% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3403      +/-   ##
==========================================
+ Coverage   70.48%   70.68%   +0.20%     
==========================================
  Files         232      232              
  Lines       19977    20070      +93     
==========================================
+ Hits        14080    14186     +106     
+ Misses       5897     5884      -13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zoghbi-a zoghbi-a marked this pull request as ready for review September 8, 2025 16:18
@zoghbi-a zoghbi-a changed the title ENH: Add ability to query columns and upload tables Add ability to query columns and upload tables Sep 8, 2025
@bsipocz bsipocz added this to the 0.4.12 milestone Sep 18, 2025
Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zoghbi-a - Thanks for the PR, I have some review comments but this should be close to be mergeable.

The most substantial one where I would also ask Adam's insight is the naming of the new method as we have and can have in the future a similar functionality in other modules, too, so some consistency would be nice.

Parameters
----------
links : `astropy.table.Table` or `astropy.table.Row`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably you called this parameter links because these suppose to be datalinks? If yes, please add one more sentence that explains it.

return 'aws'
return 'heasarc'

def download_data(self, links, host=None, location='.'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not a change, but while we're at it, or in a follow-up, please make optional arguments keyword only for future proofing this.

Suggested change
def download_data(self, links, host=None, location='.'):
def download_data(self, links, *, host=None, location='.'):

return self.query_region(pos, catalog=mission, spatial='cone',
get_query_payload=get_query_payload)

def query_by_column(self, catalog, params, *,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment here is about the naming, we don't have methods named this, but we do have query_criteria. How would you feel about using that, too?

Also, I would suggest to use something else then 'params', maybe constraint, or filter, or condition, or something even more descriptive?

cc @keflavich to have multiple eyes/opinions on the API naming choices

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to see another archive supporting this type of query, it's incredibly useful in Vizier:
https://astroquery.readthedocs.io/en/latest/vizier/vizier.html#specifying-keywords-output-columns-and-constraints-on-columns

I favor query_constraints because it's a pre-existing nomenclature that astroquery users may be accustomed to.

Instead of params, how about column_filters, again by analogy to Vizier?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting query_criteria as it's used in more modules, but OTOH I admit it's kind of confusing naming. btw, here is the quick grep of what we have right now, including this PR:

   1 query_aliases
   1 query_apex_quicklooks
   7 query_async
   1 query_bibcode
   1 query_bibobj
   1 query_by_column
   1 query_catalog
   1 query_constraints_async
   3 query_criteria
   4 query_criteria_async
   1 query_criteria_count
   1 query_cross_id
   1 query_cross_id_async
   1 query_crossid_async
   1 query_hierarchy
   1 query_hips
   1 query_hsa_tap
   1 query_hsc_matchid_async
   1 query_ida_tap
   1 query_ids_catalogs
   1 query_ids_maps
   1 query_ids_spectra
   1 query_instrument
   4 query_lines_async
   1 query_loc
   1 query_main
   1 query_metadata
   1 query_mission_cols
   1 query_mission_list
   1 query_molecule
   1 query_name_async
  10 query_object
  13 query_object_async
   1 query_object_catalogs
   1 query_object_count
   1 query_object_maps
   1 query_object_spectra
   1 query_objectids
   1 query_objects
   1 query_objects_async
   1 query_observations
   1 query_photoobj_async
   1 query_planet
   1 query_raw
   1 query_refcode
   1 query_refcode_async
  12 query_region
  16 query_region_async
   1 query_region_catalogs
   1 query_region_count
   1 query_region_iau
   1 query_region_iau_async
   1 query_region_maps
   1 query_region_sia
   1 query_region_spectra
   2 query_sia
   1 query_simple
   1 query_specobj_async
   2 query_sql_async
   1 query_ssa
   1 query_sso
   1 query_surveys
   6 query_tap
   2 query_target
   1 query_with_wcs
   1 query_with_wcs_async
   1 query_xsa_tap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will switch to query_constraints and column_filters given the precedence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing how it is done in the Vizier module, I am tempted to make the column constraint in query_region and remove the extra function. Apologies for not seeing the precedence before. What do you think?

Comment on lines +291 to +292
return self.tap.search(
query, language='ADQL', maxrec=maxrec, uploads=uploads)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick as it doesn't matter really, but just a FYI: we allow long (120) lines here, no need to do these kind of line breaks

Comment on lines +354 to +356
# if maxrec is more than the server limit, we set a higher limit
if maxrec is not None and maxrec > 100000:
adql = adql.replace('SELECT ', f'SELECT TOP {maxrec*4} ')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain this logic a little bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is trying to address #3387, but looking at it now, it is only a narrow fix. I will need to check with the backend team for this. It looks to me it could be a backend issue.

return cols

def query_tap(self, query, *, maxrec=None):
def query_tap(self, query, *, maxrec=None, uploads=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think propagating the table uploads could be super useful for the other methods calling query_tap internally, so please consider adding it to query_region and the new query_by_column, too.

Copy link
Contributor Author

@zoghbi-a zoghbi-a Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uploads make sense only if the user writes a TAP query because they need to reference the uploaded columns in the query. I am not sure it can be done in a generic way in query_region or query_by_column

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants