remove s3 bucket polling when waiting for transformation results #587
base: master
for more information, see https://pre-commit.ci
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
Coverage Diff | master | #587 | +/- |
---|---|---|---|
Coverage | 96.20% | 96.26% | +0.06% |
Files | 29 | 29 | |
Lines | 1870 | 1902 | +32 |
Hits | 1799 | 1831 | +32 |
Misses | 71 | 71 | |
…/ServiceX_frontend into remove-s3-bucket-polling
Pull Request Overview
Client-side removal of Minio bucket polling in favor of ServiceX’s direct results API, with corresponding adapter support and test updates.
- Added `get_transformation_results` method to the ServiceX adapter and wired it into download logic.
- Updated core query logic (`download_files`) to pass a `begin_at` timestamp and call the new API instead of polling Minio.
- Refactored tests to mock `servicex.get_transformation_results` and added unit tests covering its success and error responses.
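The change described above can be pictured with a small sketch: instead of listing the bucket, the client asks a results endpoint for records newer than `begin_at` and skips anything it has already handled. This is not the code from the PR. `collect_new_files`, the `fetch_results` callable, and the record shape are assumptions made for illustration; only the `file-path` key and the `files_seen` set appear in the diff under review.

```python
import asyncio
import datetime
from typing import Awaitable, Callable, Dict, List, Set

# Hypothetical shape of one result record; the real payload may differ.
Result = Dict[str, str]


async def collect_new_files(
    fetch_results: Callable[[datetime.datetime], Awaitable[List[Result]]],
    begin_at: datetime.datetime,
    files_seen: Set[str],
) -> List[Result]:
    """Ask the results API for records newer than begin_at and drop duplicates."""
    new_files: List[Result] = []
    for record in await fetch_results(begin_at):
        path = record["file-path"]
        if path not in files_seen:
            files_seen.add(path)
            new_files.append(record)
    return new_files


async def _demo() -> List[str]:
    # Stand-in for the adapter call; it skips network and auth entirely.
    async def fake_fetch(begin_at: datetime.datetime) -> List[Result]:
        return [{"file-path": "a.root"}, {"file-path": "b.root"}]

    seen = {"a.root"}  # pretend a.root was downloaded on an earlier pass
    start = datetime.datetime.now(tz=datetime.timezone.utc)
    fresh = await collect_new_files(fake_fetch, start, seen)
    return [record["file-path"] for record in fresh]


demo_result = asyncio.run(_demo())
```

Keeping the de-duplication on the client means an overlapping time window returns harmless repeats rather than duplicate downloads.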
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
tests/test_servicex_dataset.py | Replaced mock_minio.list_bucket with servicex.get_transformation_results mocks and reset calls |
tests/test_servicex_adapter.py | Imported datetime and added async tests for get_transformation_results handling 200/403/404/500 statuses |
tests/test_dataset.py | Passed new begin_at argument to download_files and mocked the ServiceX results API |
servicex/servicex_adapter.py | Implemented get_transformation_results method with status‐code checks |
servicex/query_core.py | Updated transform_complete and download_files to accept and forward begin_at and call the new API |
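To make the status-code checks listed for `get_transformation_results` concrete, here is a minimal sketch of how a response could be mapped onto exceptions. The function name `parse_results_response`, the built-in exception types chosen, and the `results` payload key are all assumptions; the adapter in the PR may raise different errors and read a different payload shape.

```python
from typing import Any, Dict, List


def parse_results_response(
    status: int, payload: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Turn an HTTP status and decoded JSON body into result records."""
    if status == 403:
        # Assumed mapping: the token is missing, expired, or not accepted.
        raise PermissionError("Not authorized to read transformation results")
    if status == 404:
        # Assumed mapping: the request id is unknown to the server.
        raise LookupError("Transformation request not found")
    if status != 200:
        # Anything else, including 500, is treated as a server-side failure.
        raise RuntimeError(f"Unexpected status {status} from results API")
    # The "results" key is a hypothetical payload field.
    return payload.get("results", [])
```

The four branches line up with the 200/403/404/500 cases that the new adapter tests exercise.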
Comments suppressed due to low confidence (1)
tests/test_servicex_dataset.py:209
- The string literal for `request_id` has mismatched quotes and includes an extra trailing double quote; this will likely cause the test to fail. Use consistent quoting, for example `"123-456-789"`.
servicex.submit_transform.return_value = {"request_id": '123-456-789"'}
servicex/query_core.py
Outdated
for file in files:
    if file.filename not in files_seen:
        file_path = file.get("file-path", "").replace("/", ":")
This relies on being kept in line with the transformer sidecar, and I think it may fail in certain cases (for example, when the parquet output format is selected). Should we add a new field in the `transform_result` table that stores the `object_name` as uploaded to S3 (as determined by https://github.com/ssl-hep/ServiceX/blob/1991e6b2ea00dcbd8cdb9b9ed32fd44049f0dea3/transformer_sidecar/src/transformer_sidecar/transformer.py#L349 etc.)?
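A rough sketch of the concern: the client rebuilds the object name by replacing `/` with `:`, which only works while the sidecar names objects the same way. If the server stored the uploaded name, the client could prefer it and fall back to the derived one. The `object_name` field is the proposal from this comment, not an existing field, and the `.parquet` suffix below is purely hypothetical; how the sidecar actually names objects is not shown here.

```python
from typing import Mapping


def derive_object_name(file_path: str) -> str:
    """Client-side guess, mirroring the replace("/", ":") in the diff."""
    return file_path.replace("/", ":")


def resolve_object_name(record: Mapping[str, str]) -> str:
    """Prefer a server-provided name (proposed field), else derive one."""
    stored = record.get("object_name")
    if stored:
        return stored
    return derive_object_name(record.get("file-path", ""))


# Record without the proposed field: the client has to guess.
guessed = resolve_object_name({"file-path": "data/run1/f.root"})

# Record with the proposed field: whatever name was uploaded wins, even if
# it differs from the guess (the suffix here is a made-up example).
stored = resolve_object_name(
    {"file-path": "data/run1/f.root", "object_name": "data:run1:f.root.parquet"}
)
```

With a stored name the client no longer needs to track naming rules in the sidecar.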
servicex/query_core.py
Outdated
@@ -557,15 +566,22 @@ async def get_signed_url(
if self.minio:
    # if self.minio exists, self.current_status will too
    if self.current_status.files_completed > len(files_seen):
        files = await self.minio.list_bucket()
        new_begin_at = datetime.datetime.now(tz=datetime.timezone.utc)
This is relying on synchronization of clocks between the client and the server. Wouldn't it be better to set `new_begin_at` to be the latest result timestamp we see in the transform_results?
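One way to read this suggestion, as a sketch only: advance the cursor to the newest timestamp the server itself reported, so the client clock never enters the comparison. The `created_at` key and its ISO 8601 format are assumptions about the result records, and `next_begin_at` is a hypothetical helper, not a function in the PR.

```python
import datetime
from typing import Iterable, Mapping


def next_begin_at(
    results: Iterable[Mapping[str, str]],
    previous: datetime.datetime,
) -> datetime.datetime:
    """Return the newest server timestamp seen, or previous if none is newer."""
    # "created_at" is an assumed field holding an ISO 8601 string with offset.
    stamps = [datetime.datetime.fromisoformat(r["created_at"]) for r in results]
    return max([previous, *stamps])


previous = datetime.datetime(2025, 1, 1, tzinfo=datetime.timezone.utc)
batch = [
    {"file-path": "a.root", "created_at": "2025-01-01T00:00:05+00:00"},
    {"file-path": "b.root", "created_at": "2025-01-01T00:00:09+00:00"},
]
cursor = next_begin_at(batch, previous)
```

If the server filters with an inclusive comparison, the newest record may come back again on the next call, so a `files_seen` check is still needed alongside this cursor.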
Client side work for ssl-hep/ServiceX#1049