
{2023.06}[2023a,a64fx] add TensorFlow 2.13 #1034


Merged
@@ -45,13 +45,13 @@ easyconfigs:
 - nsync-1.26.0-GCCcore-12.3.0.eb
 - RE2-2023-08-01-GCCcore-12.3.0.eb
 - protobuf-python-4.24.0-GCCcore-12.3.0.eb
-## originally built with EB 4.8.2; PR 19268 included since EB 4.9.0
-## - TensorFlow-2.13.0-foss-2023a.eb:
-##   # patch setup.py for grpcio extension in TensorFlow 2.13.0 easyconfigs to take into account alternate sysroot;
-##   # see https://github.com/easybuilders/easybuild-easyconfigs/pull/19268
-##   options:
-##     from-pr: 19268
-# - TensorFlow-2.13.0-foss-2023a.eb
+# originally built with EB 4.8.2; PR 19268 included since EB 4.9.0
+# - TensorFlow-2.13.0-foss-2023a.eb:
+#   # patch setup.py for grpcio extension in TensorFlow 2.13.0 easyconfigs to take into account alternate sysroot;
+#   # see https://github.com/easybuilders/easybuild-easyconfigs/pull/19268
+#   options:
+#     from-pr: 19268
+- TensorFlow-2.13.0-foss-2023a.eb
 - X11-20230603-GCCcore-12.3.0.eb
 # originally built with EB 4.8.2; PR 19339 included since EB 4.9.0
 # - HarfBuzz-5.3.1-GCCcore-12.3.0.eb:
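For readers less familiar with the easystack format touched above: entries under easyconfigs: are either plain easyconfig file names or single-key mappings that attach extra EasyBuild options such as from-pr (as in the commented-out block being replaced). Below is a minimal sketch, assuming PyYAML and a placeholder file name, of how such a file could be inspected for remaining from-pr entries; none of this code is part of the PR.

```python
# Illustrative only: list easystack entries that still pull easyconfigs from a PR.
# Assumes PyYAML is installed; 'example-easystack.yml' is a placeholder path.
import yaml


def entries_with_from_pr(easystack_path):
    """Return (easyconfig, PR number) pairs for entries that use a 'from-pr' option."""
    with open(easystack_path) as fp:
        stack = yaml.safe_load(fp)

    hits = []
    for entry in stack.get('easyconfigs', []):
        # Plain entries are strings; entries with options are single-key mappings like
        # {'TensorFlow-2.13.0-foss-2023a.eb': {'options': {'from-pr': 19268}}}
        if isinstance(entry, dict):
            for name, spec in entry.items():
                from_pr = ((spec or {}).get('options') or {}).get('from-pr')
                if from_pr is not None:
                    hits.append((name, from_pr))
    return hits


if __name__ == '__main__':
    for name, pr in entries_with_from_pr('example-easystack.yml'):
        print('%s uses easyconfigs from PR #%s' % (name, pr))
```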
eb_hooks.py: 8 changes (7 additions, 1 deletion)
@@ -132,7 +132,13 @@ def post_ready_hook(self, *args, **kwargs):
     memory_hungry_build_a64fx = cpu_target == CPU_TARGET_A64FX and self.name in ['Qt5', 'ROOT']
     if memory_hungry_build or memory_hungry_build_a64fx:
         parallel = self.cfg['parallel']
-        if parallel > 1:
+        if cpu_target == CPU_TARGET_A64FX and self.name in ['TensorFlow']:
+            # limit parallelism to 8, builds with 12 and 16 failed on Deucalion
+            if parallel > 8:
+                self.cfg['parallel'] = 8
+                msg = "limiting parallelism to %s (was %s) for %s on %s to avoid out-of-memory failures during building/testing"
+                print_msg(msg % (self.cfg['parallel'], parallel, self.name, cpu_target), log=self.log)
+        elif parallel > 1:
             self.cfg['parallel'] = parallel // 2
             msg = "limiting parallelism to %s (was %s) for %s to avoid out-of-memory failures during building/testing"
             print_msg(msg % (self.cfg['parallel'], parallel, self.name), log=self.log)

Review comments, attached to the added line self.cfg['parallel'] = 8:

Contributor: @trz42 Why don't we simply use a factor of 4 when building for A64FX, rather than a factor of 2 like we do below? In theory, we could have smaller build jobs (say with 4 cores) at some point, so really hardcoding to 8 seems wrong to me...

Contributor: Ah, I overlooked the > 8 condition, sorry.
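To make the control flow of the new hook logic easier to follow in isolation, here is a minimal standalone sketch; cap_parallelism, on_a64fx and the constant below are illustrative names, not part of eb_hooks.py.

```python
# Standalone sketch of the decision logic added in this diff, outside the
# EasyBuild hook machinery. Names here are illustrative only.

A64FX_TENSORFLOW_MAX_PARALLEL = 8  # builds with parallelism 12 and 16 failed on Deucalion


def cap_parallelism(name, parallel, on_a64fx):
    """Return the parallelism the hook would configure for a memory-hungry build."""
    if on_a64fx and name == 'TensorFlow':
        # Hard cap for TensorFlow on A64FX; settings at or below the cap are left alone.
        return min(parallel, A64FX_TENSORFLOW_MAX_PARALLEL)
    if parallel > 1:
        # Generic memory-hungry case: halve whatever was configured.
        return parallel // 2
    return parallel


# Behaviour discussed in the review comments:
assert cap_parallelism('TensorFlow', 16, on_a64fx=True) == 8  # capped to 8
assert cap_parallelism('TensorFlow', 4, on_a64fx=True) == 4   # below the cap, unchanged
assert cap_parallelism('Qt5', 16, on_a64fx=True) == 8         # falls through to halving
```

The reviewer's factor-of-4 suggestion would correspond to returning parallel // 4 in the A64FX TensorFlow branch instead of a hard cap; as the follow-up comment notes, the merged version's > 8 guard already leaves smaller parallelism settings untouched.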