Remove some unnecessary copies during iteration over splitter that's given a file-based dataset source #544

Merged
merged 1 commit into from
Feb 13, 2025

yousefmoazzam
Collaborator

@yousefmoazzam yousefmoazzam commented Feb 12, 2025

Fixes #543

With this simple change (the loop is now defined as `for block in progress` rather than as it was posted in that issue), profiling the block splitter script mentioned in #543 shows the same behaviour as not using the `tqdm` progress bar iterable at all:
[Image: with-tqdm-no-splitter-iterable]

Furthermore, the original pipeline run on ws582 at DLS, which previously hit a RAM OOM error, now runs without issue.

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have made corresponding changes to the documentation

As the [`tqdm` docs](https://tqdm.github.io/docs/tqdm/#tqdm-objects)
point out, the wrapper iterable behaves exactly like the wrapped iterable
but with extra progress bar functionality. Therefore, the splitter
iterable shouldn't be used directly to get blocks; blocks should instead
come from the progress bar iterable that wraps the splitter iterable.
See issue #543 for more details.
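
The effect described in the commit message can be illustrated with a minimal sketch. It is not the project's code: `CountingSplitter`, `wasteful_loop`, and `direct_loop` are hypothetical names, and the "before" loop is only an assumption about the kind of pattern that reads each block twice (the actual loop is in issue #543). A pass-through fallback is defined so the sketch also runs where `tqdm` is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch runs without tqdm: just iterate the wrapped object
    def tqdm(iterable, **kwargs):
        return iter(iterable)


class CountingSplitter:
    """Hypothetical stand-in for a block splitter over a file-based source."""

    def __init__(self, n_blocks, block_size):
        self.n_blocks = n_blocks
        self.block_size = block_size
        self.reads = 0  # how many times a block has been materialised

    def __len__(self):
        return self.n_blocks

    def __getitem__(self, idx):
        if not 0 <= idx < self.n_blocks:
            raise IndexError(idx)
        self.reads += 1
        # Stands in for reading one block from file into a fresh buffer
        return bytes(self.block_size)

    def __iter__(self):
        for idx in range(self.n_blocks):
            yield self[idx]


def wasteful_loop(splitter):
    # Assumed "before" pattern: the progress bar iterates the splitter
    # (one read per block) and the body indexes the splitter again.
    progress = tqdm(splitter, disable=True)
    total = 0
    for idx, _ in enumerate(progress):
        block = splitter[idx]  # second read of the same block
        total += len(block)
    return total


def direct_loop(splitter):
    # Pattern named in this PR: take blocks from the progress bar iterable.
    progress = tqdm(splitter, disable=True)
    total = 0
    for block in progress:
        total += len(block)
    return total


before = CountingSplitter(n_blocks=4, block_size=8)
wasteful_loop(before)

after = CountingSplitter(n_blocks=4, block_size=8)
direct_loop(after)

print(before.reads, after.reads)  # 8 4
```

In this sketch the first loop materialises every block twice, while iterating over the progress bar iterable materialises each block once. With large blocks read from file, the extra read is the kind of unnecessary copy that the PR title refers to.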
@yousefmoazzam yousefmoazzam merged commit 74b94c9 into main Feb 13, 2025
9 checks passed
@yousefmoazzam yousefmoazzam deleted the issue-543 branch February 13, 2025 09:42
Development

Successfully merging this pull request may close these issues.

Iteration over block splitter and tqdm iterator causing unncessary copies of block data