remove s3 bucket polling when waiting for transformation results #587

Open
wants to merge 27 commits into
base: master
from

Conversation

MattShirley
Collaborator

Client-side work for ssl-hep/ServiceX#1049

codecov bot commented May 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.26%. Comparing base (9c1849d) to head (4e01dee).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #587      +/-   ##
==========================================
+ Coverage   96.20%   96.26%   +0.06%     
==========================================
  Files          29       29              
  Lines        1870     1902      +32     
==========================================
+ Hits         1799     1831      +32     
  Misses         71       71              
| Flag | Coverage Δ |
| --- | --- |
| unittests | 96.26% <100.00%> (+0.06%) ⬆️ |


@Copilot Copilot AI left a comment

Pull Request Overview

Client-side removal of Minio bucket polling in favor of ServiceX’s direct results API, with corresponding adapter support and test updates.

  • Added get_transformation_results method to the ServiceX adapter and wired it into download logic.
  • Updated core query logic (download_files) to pass a begin_at timestamp and call the new API instead of polling Minio.
  • Refactored tests to mock servicex.get_transformation_results and added unit tests covering its success and error responses.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/test_servicex_dataset.py | Replaced mock_minio.list_bucket with servicex.get_transformation_results mocks and reset calls |
| tests/test_servicex_adapter.py | Imported datetime and added async tests for get_transformation_results handling 200/403/404/500 statuses |
| tests/test_dataset.py | Passed new begin_at argument to download_files and mocked the ServiceX results API |
| servicex/servicex_adapter.py | Implemented get_transformation_results method with status-code checks |
| servicex/query_core.py | Updated transform_complete and download_files to accept and forward begin_at and call the new API |
Comments suppressed due to low confidence (1)

tests/test_servicex_dataset.py:209

  • The string literal for request_id has mismatched quotes and includes an extra trailing double quote; this will likely cause the test to fail. Use consistent quoting, for example "123-456-789".
servicex.submit_transform.return_value = {"request_id": '123-456-789"'}
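
For reference, the flagged literal is valid Python: it is a single-quoted string whose last character happens to be a double quote. A minimal sketch of the difference between the flagged value and the suggested one follows. Whether the extra character actually breaks the test depends on how request_id is used downstream, which is not shown here.

```python
# The literal from the test: single-quoted, with a double quote inside it.
flagged = '123-456-789"'

# The literal suggested in the review comment.
suggested = "123-456-789"

# Both parse fine; they simply are not the same string.
print(repr(flagged))
print(repr(suggested))
print(flagged == suggested)
```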

for file in files:
    if file.filename not in files_seen:
        file_path = file.get("file-path", "").replace("/", ":")
Collaborator

This has to stay in sync with the transformer sidecar, and I think it may fail in some cases (for example, when the parquet output format is selected). Should we add a new field to the transform_result table that stores the object_name as uploaded to S3 (as determined by https://github.com/ssl-hep/ServiceX/blob/1991e6b2ea00dcbd8cdb9b9ed32fd44049f0dea3/transformer_sidecar/src/transformer_sidecar/transformer.py#L349 etc.)?
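
A minimal sketch of what the client side of that suggestion could look like, assuming each result record is a dict and that the server would expose the stored name under an "object_name" key. That key and the helper name resolve_object_name are hypothetical; only the "file-path" fallback mirrors the code in this PR.

```python
from typing import Any, Mapping


def resolve_object_name(result: Mapping[str, Any]) -> str:
    """Return the S3 object name for one transform result record."""
    # Hypothetical field: the name the sidecar actually used for the upload.
    stored = result.get("object_name")
    if stored:
        return stored
    # Fallback: the derivation currently used in this PR.
    return result.get("file-path", "").replace("/", ":")


# Record without the hypothetical field: falls back to the derived name.
print(resolve_object_name({"file-path": "root/data/file.root"}))

# Record with the hypothetical field: the stored name wins.
print(resolve_object_name(
    {"file-path": "root/data/file.root", "object_name": "root:data:file.root.parquet"}
))
```

With a fallback like this the client would keep working against servers that do not return the new field yet.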

@@ -557,15 +566,22 @@ async def get_signed_url(
if self.minio:
    # if self.minio exists, self.current_status will too
    if self.current_status.files_completed > len(files_seen):
        files = await self.minio.list_bucket()
        new_begin_at = datetime.datetime.now(tz=datetime.timezone.utc)
Contributor

This relies on the client and server clocks being synchronized. Wouldn't it be better to set new_begin_at to the latest result timestamp we see in the transform_results?
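
One way this could be sketched, assuming each result record carries a server-side ISO 8601 timestamp with an explicit UTC offset. The "created_at" key and the helper name next_begin_at are hypothetical, not taken from the ServiceX API.

```python
import datetime
from typing import Any, Iterable, Mapping, Optional


def next_begin_at(
    results: Iterable[Mapping[str, Any]],
    previous: Optional[datetime.datetime],
) -> Optional[datetime.datetime]:
    """Pick the next begin_at using only timestamps issued by the server."""
    # "created_at" is a hypothetical field name for the result timestamp.
    stamps = [datetime.datetime.fromisoformat(r["created_at"]) for r in results]
    if not stamps:
        # Nothing new arrived, so keep the previous cursor unchanged.
        return previous
    latest = max(stamps)
    # Never move the cursor backwards.
    return latest if previous is None else max(latest, previous)


batch = [
    {"created_at": "2025-05-14T10:00:00+00:00"},
    {"created_at": "2025-05-14T10:00:05+00:00"},
]
print(next_begin_at(batch, None))
```

If the server filters with "greater than or equal to" begin_at, the most recent record would come back again on the next call, so the existing files_seen check would still be needed to skip duplicates. With a strict "greater than" filter, records that share the latest timestamp could be missed, so the inclusive form looks like the safer pairing.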

3 participants