-
-
Notifications
You must be signed in to change notification settings - Fork 23
Refactor uproot-raw to chunk the processing #1059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the uproot-raw processing to chunk transformation outputs, reducing memory usage under Kubernetes limits.
- Updated test expectations to reflect the new processing output.
- Introduced timing wrappers and incremental processing in the transformation and query translation code.
- Modified the query translation to yield results on the fly instead of collecting them in dictionaries.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
code_generator_raw_uproot/tests/test_src.py | Updated expected hash in the code generation test to match the new output. |
code_generator_raw_uproot/servicex/templates/transform_single_file.py | Added timing functions and refactored transformation logic for chunked processing along with enhanced error messaging. |
code_generator_raw_uproot/servicex/raw_uproot_code_generator/request_translator.py | Changed query processing to yield results incrementally and improved variable naming for clarity. |
Comments suppressed due to low confidence (1)
code_generator_raw_uproot/servicex/raw_uproot_code_generator/request_translator.py:143
- [nitpick] Consider renaming 'arrfound' to a more descriptive name such as 'chunk_found' or 'has_found_chunk' for improved code readability.
arrfound = False
code_generator_raw_uproot/servicex/templates/transform_single_file.py
Outdated
Show resolved
Hide resolved
I should note this is vaguely urgent, Marc has seen workflows that fail without this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not qualified to review this code, but I appreciate the idea
* Refactor uproot-raw to write out results in chunks in order to keep memory use down
* Refactor uproot-raw to write out results in chunks in order to keep memory use down
Sufficiently large uproot-raw requests could allocate amounts of memory too large to fit in the default 1 GB Kubernetes limit (as they built the final results in-memory). We now instead write out the results of each
uproot.iterate
chunk as they arrive. Take the opportunity to refactor a few aspects of the writing code.Fixes #1045