remove s3 bucket polling when waiting for transformation results #587
for more information, see https://pre-commit.ci
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
Coverage Diff | master | #587 | +/- |
---|---|---|---|
Coverage | 96.35% | 96.50% | +0.14% |
Files | 29 | 29 | |
Lines | 1892 | 1973 | +81 |
Hits | 1823 | 1904 | +81 |
Misses | 69 | 69 | |
…/ServiceX_frontend into remove-s3-bucket-polling
Pull Request Overview
Client-side removal of Minio bucket polling in favor of ServiceX’s direct results API, with corresponding adapter support and test updates.
- Added `get_transformation_results` method to the ServiceX adapter and wired it into download logic.
- Updated core query logic (`download_files`) to pass a `begin_at` timestamp and call the new API instead of polling Minio.
- Refactored tests to mock `servicex.get_transformation_results` and added unit tests covering its success and error responses.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
tests/test_servicex_dataset.py | Replaced mock_minio.list_bucket with servicex.get_transformation_results mocks and reset calls |
tests/test_servicex_adapter.py | Imported datetime and added async tests for get_transformation_results handling 200/403/404/500 statuses |
tests/test_dataset.py | Passed new begin_at argument to download_files and mocked the ServiceX results API |
servicex/servicex_adapter.py | Implemented get_transformation_results method with status‐code checks |
servicex/query_core.py | Updated transform_complete and download_files to accept and forward begin_at and call the new API |
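The status-code checks listed for `get_transformation_results` (200/403/404/500) can be illustrated with a minimal sketch. This is not the adapter code from the PR: the function name `parse_results_response`, both exception classes, and the `{"results": [...]}` payload shape are assumptions made only for illustration.

```python
import json
from typing import Any, Dict, List


class AuthorizationError(Exception):
    """Hypothetical error raised for a 403 response."""


class TransformNotFoundError(Exception):
    """Hypothetical error raised for a 404 response."""


def parse_results_response(
    status: int, body: str, request_id: str
) -> List[Dict[str, Any]]:
    """Map an HTTP status and body to result records, or raise on failure."""
    if status == 403:
        raise AuthorizationError(f"Not authorized to read results for {request_id}")
    if status == 404:
        raise TransformNotFoundError(f"Transform request {request_id} not found")
    if status != 200:
        # Covers 500 and any other unexpected status.
        raise RuntimeError(
            f"Results request for {request_id} failed with status {status}"
        )
    # Assumed payload shape: {"results": [ ... ]}
    return json.loads(body).get("results", [])
```

Checking the specific failure statuses before the generic one keeps the error raised for a missing or forbidden request distinct from a server fault.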
Comments suppressed due to low confidence (1)
tests/test_servicex_dataset.py:209
- The string literal for `request_id` has mismatched quotes and includes an extra trailing double quote; this will likely cause the test to fail. Use consistent quoting, for example `"123-456-789"`.
servicex.submit_transform.return_value = {"request_id": '123-456-789"'}
servicex/query_core.py
Outdated
for file in files:
    if file.filename not in files_seen:
        file_path = file.get("file-path", "").replace("/", ":")
This relies on being kept in line with the transformer sidecar, and I think it may fail in certain cases (for example, when the parquet output format is selected). Should we add a new field in the `transform_result` table that stores the `object_name` as uploaded to S3 (as determined by https://github.com/ssl-hep/ServiceX/blob/1991e6b2ea00dcbd8cdb9b9ed32fd44049f0dea3/transformer_sidecar/src/transformer_sidecar/transformer.py#L349 etc.)?
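A minimal sketch of what the suggestion could look like on the client, assuming the server returned an `object_name` field in each result record. That field is only proposed in the comment above, and the helper name `object_name_for` is hypothetical; the fallback mirrors the path replacement in the diff under review.

```python
from typing import Any, Mapping


def object_name_for(result: Mapping[str, Any]) -> str:
    """Pick the S3 object name for one transform result record."""
    # Prefer the name recorded by the server (hypothetical "object_name" field).
    name = result.get("object_name")
    if name:
        return name
    # Fall back to the client-side convention from the diff under review.
    return result.get("file-path", "").replace("/", ":")
```

With this shape, the client only reconstructs the name itself when talking to a server that does not report it.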
servicex/query_core.py
Outdated
@@ -557,15 +566,22 @@ async def get_signed_url(
if self.minio:
    # if self.minio exists, self.current_status will too
    if self.current_status.files_completed > len(files_seen):
        files = await self.minio.list_bucket()
        new_begin_at = datetime.datetime.now(tz=datetime.timezone.utc)
This is relying on synchronization of clocks between the client and the server. Wouldn't it be better to set `new_begin_at` to be the latest result timestamp we see in the `transform_results`?
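The idea in this comment can be sketched as follows: advance the cursor from timestamps the server reported, so the client clock is never consulted. The `created_at` field name, its ISO 8601 format, and the helper name `next_begin_at` are assumptions for illustration, not the actual result schema.

```python
import datetime
from typing import Any, Dict, List, Optional


def next_begin_at(
    results: List[Dict[str, Any]],
    current: Optional[datetime.datetime],
) -> Optional[datetime.datetime]:
    """Return the newest server-side timestamp seen so far."""
    # "created_at" is an assumed field holding an ISO 8601 timestamp.
    stamps = [datetime.datetime.fromisoformat(r["created_at"]) for r in results]
    if not stamps:
        # Nothing new arrived, so keep the previous cursor unchanged.
        return current
    newest = max(stamps)
    if current is None or newest > current:
        return newest
    return current
```

Because both sides of every comparison come from the server, clock skew between client and server cannot cause results to be skipped.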
servicex/query_core.py
Outdated
@@ -551,21 +557,27 @@ async def get_signed_url(
if progress:
    progress.advance(task_id=download_progress, task_type="Download")

later_than: datetime.datetime | None = None
Why not initialize `later_than` to `datetime.min` to avoid all of the extra `if later_than is None or` constructs?
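A small sketch of the sentinel approach, under one assumption: the result timestamps are timezone-aware. In that case the bare `datetime.datetime.min` is naive, and comparing it with an aware value raises `TypeError`, so the sentinel needs a `tzinfo` attached. The names `EARLIEST` and `newest_timestamp` are illustrative only.

```python
import datetime
from typing import Iterable

# Aware sentinel: the smallest datetime, tagged as UTC so that it can be
# compared with timezone-aware timestamps.
EARLIEST = datetime.datetime.min.replace(tzinfo=datetime.timezone.utc)


def newest_timestamp(timestamps: Iterable[datetime.datetime]) -> datetime.datetime:
    """Track the latest timestamp without any `is None` checks."""
    later_than = EARLIEST
    for ts in timestamps:
        if ts > later_than:
            later_than = ts
    return later_than
```

The trade-off is that callers must treat `EARLIEST` as "nothing seen yet" instead of checking for `None`.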
Added the feature flag and updated tests. All work here is done and ready for final review.
I assume this has been tested with a ServiceX deployment that doesn't yet have the `capabilities` property in the info?
Yes, I've tested this against ServiceX's …
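The backward-compatibility concern in this exchange can be sketched as a capability check that treats a missing `capabilities` property as an empty list, so older deployments fall back to bucket polling. The capability string `results_api` and the helper name `supports_results_api` are hypothetical; the real flag name is not shown in this thread.

```python
from typing import Any, Mapping

# Hypothetical capability name, used here only for illustration.
RESULTS_CAPABILITY = "results_api"


def supports_results_api(info: Mapping[str, Any]) -> bool:
    """Return True only when the server advertises the results API."""
    # Older deployments may omit "capabilities" entirely (or send null).
    capabilities = info.get("capabilities") or []
    return RESULTS_CAPABILITY in capabilities
```

A caller would then choose between the results API and the existing bucket-polling path based on this boolean.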
Looks pretty good, see the few comments.
    return self._servicex_info

async def get_servicex_capabilities(self) -> List[str]:
    return (await self.get_servicex_info()).capabilities
Add a test for this line?
Added
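For reference, a test of that line might look roughly like the sketch below. It is not the test added in this PR: `FakeInfo` and `FakeAdapter` are stand-ins that copy only the method shown in the diff, and the capability value is hypothetical.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List
from unittest.mock import AsyncMock


@dataclass
class FakeInfo:
    """Stand-in for the ServiceX info object (assumed shape)."""
    capabilities: List[str] = field(default_factory=list)


class FakeAdapter:
    """Stand-in adapter exposing the method under discussion."""

    async def get_servicex_info(self) -> FakeInfo:
        raise NotImplementedError  # replaced by a mock in the check below

    async def get_servicex_capabilities(self) -> List[str]:
        return (await self.get_servicex_info()).capabilities


def check_capabilities_passthrough() -> List[str]:
    """Run the coroutine with get_servicex_info mocked out."""
    adapter = FakeAdapter()
    adapter.get_servicex_info = AsyncMock(return_value=FakeInfo(["results_api"]))
    return asyncio.run(adapter.get_servicex_capabilities())
```

In the real test suite the same pattern would target the actual adapter class and whatever async test fixture the project already uses.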
Client side work for ssl-hep/ServiceX#1049