Refactor uproot-raw to chunk the processing #1059

ponyisi · 2025-05-19T20:46:55Z

Sufficiently large uproot-raw requests could allocate more memory than the default 1 GB Kubernetes limit allows, because they built the final results in memory. We now write out the results of each uproot.iterate chunk as it arrives instead. This PR also takes the opportunity to refactor a few aspects of the writing code.
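A minimal sketch of the pattern described above, not the PR's actual code: `fake_iterate` is a hypothetical stand-in for `uproot.iterate`, and the JSON-lines file stands in for the real output format. The only point is that each chunk is written out before the next one is read, so peak memory should scale with the chunk size rather than with the full result.

```python
import json
import os
import tempfile
from typing import Iterator, List


def fake_iterate(n_events: int, step_size: int) -> Iterator[List[dict]]:
    """Hypothetical stand-in for uproot.iterate: yields events in chunks."""
    for start in range(0, n_events, step_size):
        stop = min(start + step_size, n_events)
        yield [{"event": i, "pt": float(i)} for i in range(start, stop)]


def write_chunked(path: str, n_events: int, step_size: int) -> int:
    """Write each chunk as soon as it arrives; return the number of chunks."""
    n_chunks = 0
    with open(path, "w") as out:
        for chunk in fake_iterate(n_events, step_size):
            for record in chunk:
                out.write(json.dumps(record) + "\n")
            # Nothing keeps a reference to this chunk after the loop body ends.
            n_chunks += 1
    return n_chunks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "out.jsonl")
        print(write_chunked(target, n_events=10, step_size=4))
```

The chunk size (`step_size` here) is the knob that trades memory for the number of write calls; the value used in the PR is not stated in this conversation.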

Fixes #1045

Copilot

Pull Request Overview

This PR refactors the uproot-raw processing to chunk transformation outputs, reducing memory usage under Kubernetes limits.

* Updated test expectations to reflect the new processing output.
* Introduced timing wrappers and incremental processing in the transformation and query translation code.
* Modified the query translation to yield results on the fly instead of collecting them in dictionaries.
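The last item above can be illustrated with a short sketch. The names `read_chunks`, `translate_collecting`, and `translate_streaming` are hypothetical and do not come from `request_translator.py`; they only contrast building a dictionary of all results with yielding each result as it is produced.

```python
from typing import Dict, Iterator, List, Tuple


def read_chunks(tree: str) -> Iterator[List[int]]:
    """Hypothetical data source: two small chunks per tree."""
    yield [1, 2]
    yield [3]


def translate_collecting(trees: List[str]) -> Dict[str, List[int]]:
    """Collecting pattern: everything is held in memory until the end."""
    results: Dict[str, List[int]] = {}
    for tree in trees:
        for chunk in read_chunks(tree):
            results.setdefault(tree, []).extend(chunk)
    return results


def translate_streaming(trees: List[str]) -> Iterator[Tuple[str, List[int]]]:
    """Streaming pattern: hand each (tree, chunk) pair to the caller at once."""
    for tree in trees:
        for chunk in read_chunks(tree):
            yield tree, chunk


if __name__ == "__main__":
    for name, chunk in translate_streaming(["a", "b"]):
        print(name, chunk)
```

With the streaming form, the caller decides what to do with each chunk (for example, write it out immediately), and no intermediate dictionary grows with the size of the request.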

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
code_generator_raw_uproot/tests/test_src.py	Updated expected hash in the code generation test to match the new output.
code_generator_raw_uproot/servicex/templates/transform_single_file.py	Added timing functions and refactored transformation logic for chunked processing along with enhanced error messaging.
code_generator_raw_uproot/servicex/raw_uproot_code_generator/request_translator.py	Changed query processing to yield results incrementally and improved variable naming for clarity.
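The timing functions mentioned for `transform_single_file.py` are not shown in this conversation. As one possible shape, assuming a context-manager approach (the name `timed` and the `totals` dictionary are hypothetical), timing can be accumulated per label across many chunks like this:

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, totals: Dict[str, float]) -> Iterator[None]:
    """Add the wall-clock time spent inside the block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate, so repeated use per chunk gives a total per stage.
        totals[label] = totals.get(label, 0.0) + (time.perf_counter() - start)


if __name__ == "__main__":
    totals: Dict[str, float] = {}
    for _ in range(3):
        with timed("process", totals):
            sum(range(10_000))
    print(sorted(totals))
```

Accumulating into a dictionary keeps the per-chunk overhead small and still gives one total per stage at the end of the file.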

Comments suppressed due to low confidence (1)

code_generator_raw_uproot/servicex/raw_uproot_code_generator/request_translator.py:143

[nitpick] Consider renaming 'arrfound' to a more descriptive name such as 'chunk_found' or 'has_found_chunk' for improved code readability.

arrfound = False

code_generator_raw_uproot/tests/test_src.py

code_generator_raw_uproot/servicex/templates/transform_single_file.py

ponyisi · 2025-05-19T21:08:33Z

I should note this is somewhat urgent: Marc has seen workflows that fail without this.

BenGalewsky

I'm not qualified to review this code, but I appreciate the idea.

* Refactor uproot-raw to write out results in chunks in order to keep memory use down

ponyisi added 2 commits May 19, 2025 15:40

Refactor uproot-raw to chunk processing

96c90e0

Fix test hash

559eada

ponyisi requested review from Copilot, gordonwatts, MattShirley, BenGalewsky and kyungeonchoi May 19, 2025 20:49

Copilot AI reviewed May 19, 2025

code_generator_raw_uproot/tests/test_src.py (resolved)

code_generator_raw_uproot/servicex/templates/transform_single_file.py (outdated, resolved)

Ensure parquet writer closed even if exception thrown

5d33510
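As a sketch of what this commit message describes, assuming a writer object with `write` and `close` methods (the `ChunkWriter` class below is a hypothetical stand-in, not the Parquet writer used in the PR), a `try`/`finally` block guarantees the writer is closed whether or not processing raises:

```python
from typing import Iterable, Iterator, List


class ChunkWriter:
    """Hypothetical stand-in for a file writer that must be closed."""

    def __init__(self) -> None:
        self.rows: List[int] = []
        self.closed = False

    def write(self, chunk: List[int]) -> None:
        if self.closed:
            raise ValueError("write to closed writer")
        self.rows.extend(chunk)

    def close(self) -> None:
        self.closed = True


def write_all(writer: ChunkWriter, chunks: Iterable[List[int]]) -> None:
    """Write every chunk; close the writer even if a chunk raises."""
    try:
        for chunk in chunks:
            writer.write(chunk)
    finally:
        writer.close()


def failing_chunks() -> Iterator[List[int]]:
    """Yield one good chunk, then fail, to simulate a mid-stream error."""
    yield [1, 2]
    raise RuntimeError("simulated read failure")


if __name__ == "__main__":
    w = ChunkWriter()
    try:
        write_all(w, failing_chunks())
    except RuntimeError:
        pass
    print(w.closed, w.rows)  # prints: True [1, 2]
```

The exception still propagates to the caller; the `finally` clause only ensures the writer is not left open, which matters more once output is written incrementally and a failure can occur partway through a file.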

BenGalewsky approved these changes May 21, 2025

ponyisi merged commit d436e73 into develop May 28, 2025
75 checks passed

ponyisi deleted the uproot-raw-iterate branch May 28, 2025 01:22

ponyisi added a commit that referenced this pull request Jun 20, 2025

Refactor uproot-raw to chunk the processing (#1059)

8642056

* Refactor uproot-raw to write out results in chunks in order to keep memory use down

ponyisi added a commit that referenced this pull request Jun 20, 2025

Refactor uproot-raw to chunk the processing (#1059)

f8fa1c4

* Refactor uproot-raw to write out results in chunks in order to keep memory use down
