{2023.06}[2023a,a64fx] add TensorFlow 2.13 #1034

Merged

Collaborator

@trz42 trz42 commented Apr 20, 2025

Adds TensorFlow 2.13. Limits parallelism to 8 (via eb_hooks.py) to work around an out-of-memory issue.

@trz42 trz42 added 2023.06-software.eessi.io 2023.06 version of software.eessi.io a64fx labels Apr 20, 2025

eessi-bot bot commented Apr 20, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

@eessi-bot-deucalion

Instance eessi-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

eessi-bot bot commented Apr 20, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-trz42

Instance trz42-GH200-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 20, 2025

Instance eessi-bot-vsc-ugent is configured to build for:

  • architectures: x86_64/amd/zen3
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat

@trz42
Collaborator Author

trz42 commented Apr 20, 2025

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

eessi-bot bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted
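
The pattern in these replies, where only eessi-bot-deucalion goes on to create a job, is consistent with each bot instance comparing the filters in the command against its own configuration. A minimal sketch of that idea follows. It is not the bot's actual code: `parse_bot_command` and `instance_matches` are hypothetical helper names, and the matching rule (every filter must agree with the instance's name, repositories and architectures) is an assumption inferred from this thread.

```python
def parse_bot_command(line):
    """Split 'bot: build key:value ...' into (action, filters)."""
    prefix = "bot:"
    if not line.startswith(prefix):
        raise ValueError("not a bot command")
    parts = line[len(prefix):].split()
    if not parts:
        raise ValueError("missing action")
    action, filters = parts[0], {}
    for token in parts[1:]:
        # Split on the first ':' only, so values may contain further colons.
        key, _, value = token.partition(":")
        filters[key] = value
    return action, filters


def instance_matches(filters, name, repositories, architectures):
    """Assumed rule: every filter that is given must match this instance."""
    if "instance" in filters and filters["instance"] != name:
        return False
    if "repository" in filters and filters["repository"] not in repositories:
        return False
    if "architecture" in filters and filters["architecture"] not in architectures:
        return False
    return True


command = ("bot: build instance:eessi-bot-deucalion "
           "repository:eessi.io-2023.06-software architecture:aarch64/a64fx")
action, filters = parse_bot_command(command)

# Instance configurations copied from the bot comments earlier in this thread.
deucalion = instance_matches(filters, "eessi-bot-deucalion",
                             ["eessi.io-2023.06-software"], ["aarch64/a64fx"])
mc_azure = instance_matches(filters, "eessi-bot-mc-azure",
                            ["eessi.io-2023.06-compat", "eessi.io-2023.06-software"],
                            ["x86_64/amd/zen4"])
print(action, deucalion, mc_azure)
```

Under this assumed rule, only the Deucalion instance matches all three filters, which agrees with the replies in this thread.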

eessi-bot bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

@eessi-bot-surf

eessi-bot-surf bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-trz42

eessi-bot-trz42 bot commented Apr 20, 2025

Updates by the bot instance trz42-GH200-jr (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 20, 2025

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.04/pr_1034/406938

date | job status | comment
Apr 20 06:33:05 UTC 2025 | submitted | job id 406938 awaits release by job manager
Apr 20 06:33:55 UTC 2025 | released | job awaits launch by Slurm scheduler
Apr 20 06:34:57 UTC 2025 | running | job 406938 is running
Apr 21 06:24:11 UTC 2025 | finished | 🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job406938.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Apr 21 06:24:11 UTC 2025 | test result | 🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job406938.test does not exist in job directory, or parsing it failed.
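
For reference, the elapsed run time of job 406938 can be worked out from the "running" and "finished" timestamps reported above. The snippet below is only an illustration: `parse_ts` is a hypothetical helper, and the thread does not state which time limit, if any, the job ran into.

```python
from datetime import datetime, timezone

MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def parse_ts(text):
    """Parse a bot timestamp such as 'Apr 20 06:34:57 UTC 2025'."""
    month, day, clock, tz, year = text.split()
    if tz != "UTC":
        raise ValueError("expected a UTC timestamp")
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(int(year), MONTHS[month], int(day),
                    hour, minute, second, tzinfo=timezone.utc)


started = parse_ts("Apr 20 06:34:57 UTC 2025")
finished = parse_ts("Apr 21 06:24:11 UTC 2025")
elapsed = finished - started
print(elapsed)  # 23:49:14
```

The job therefore ran for a little under 24 hours before finishing without a results file.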

@trz42
Collaborator Author

trz42 commented Apr 20, 2025

First job seems a bit slow. Launch one with parallel = 8...
bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

eessi-bot bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

eessi-bot bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-surf

eessi-bot-surf bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-trz42

eessi-bot-trz42 bot commented Apr 20, 2025

Updates by the bot instance trz42-GH200-jr (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 20, 2025

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.04/pr_1034/406966

date | job status | comment
Apr 20 17:16:07 UTC 2025 | submitted | job id 406966 awaits release by job manager
Apr 20 17:17:06 UTC 2025 | released | job awaits launch by Slurm scheduler
Apr 20 17:18:12 UTC 2025 | running | job 406966 is running
Apr 21 06:13:53 UTC 2025 | finished | 😁 SUCCESS (click triangle for details)
  Details
  ✅ job output file slurm-406966.out
  ✅ no message matching FATAL:
  ✅ no message matching ERROR:
  ✅ no message matching FAILED:
  ✅ no message matching required modules missing:
  ✅ found message(s) matching No missing installations
  ✅ found message matching .tar.gz created!
  Artefacts
  eessi-2023.06-software-linux-aarch64-a64fx-1745215393.tar.gz
    size: 293 MiB (307684455 bytes)
    entries: 17257
    modules under 2023.06/software/linux/aarch64/a64fx/modules/all
      TensorFlow/2.13.0-foss-2023a.lua
    software under 2023.06/software/linux/aarch64/a64fx/software
      TensorFlow/2.13.0-foss-2023a
    other under 2023.06/software/linux/aarch64/a64fx
      2023.06/init/easybuild/eb_hooks.py
Apr 21 06:13:53 UTC 2025 | test result | 😁 SUCCESS (click triangle for details)
  ReFrame Summary
  [ OK ] (1/1) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:aarch64_a64fx+default
  P: perf: 12.815 timesteps/s (r:0, l:None, u:None)
  [ PASSED ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 0 skipped, 0 aborted)
  Details
  ✅ job output file slurm-406966.out
  ✅ no message matching ERROR:
  ✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 22 14:15:06 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-aarch64-a64fx-1745215393.tar.gz to S3 bucket succeeded
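
The Details listed for the finished job read like a set of pattern checks on the Slurm output file: some strings must be absent and others must be present. A rough sketch of that kind of check is shown below. It is not the bot's implementation: `check_build_log` is a hypothetical name, the log excerpts are made up, and treating the listed strings as plain regular expressions is an assumption.

```python
import re

# Patterns taken from the Details lists in this thread.
MUST_BE_ABSENT = [r"FATAL:", r"ERROR:", r"FAILED:", r"required modules missing:"]
MUST_BE_PRESENT = [r"No missing installations", r"\.tar\.gz created!"]


def check_build_log(text):
    """Return (ok, report); report maps each pattern to True if its check passed."""
    report = {}
    for pattern in MUST_BE_ABSENT:
        report[pattern] = re.search(pattern, text) is None
    for pattern in MUST_BE_PRESENT:
        report[pattern] = re.search(pattern, text) is not None
    return all(report.values()), report


# Made-up log excerpts, for illustration only.
good_log = "No missing installations\n/tmp/example.tar.gz created!\n"
bad_log = "ERROR: build failed\n/tmp/example.tar.gz created!\n"

print(check_build_log(good_log)[0])
print(check_build_log(bad_log)[0])
```

Note that a tarball can be reported as created even when the build fails, so the presence of `.tar.gz created!` alone does not indicate success.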

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-surf

eessi-bot-surf bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 20, 2025

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.04/pr_1034/406967

date | job status | comment
Apr 20 17:17:49 UTC 2025 | submitted | job id 406967 awaits release by job manager
Apr 20 17:18:10 UTC 2025 | released | job awaits launch by Slurm scheduler
Apr 20 17:19:19 UTC 2025 | running | job 406967 is running
Apr 21 02:00:07 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
  Details
  ✅ job output file slurm-406967.out
  ✅ no message matching FATAL:
  ❌ found message matching ERROR:
  ❌ found message matching FAILED:
  ❌ found message matching required modules missing:
  ❌ no message matching No missing installations
  ✅ found message matching .tar.gz created!
  Artefacts
  eessi-2023.06-software-linux-aarch64-a64fx-1745199692.tar.gz
    size: 0 MiB (15594 bytes)
    entries: 1
    modules under 2023.06/software/linux/aarch64/a64fx/modules/all
      no module files in tarball
    software under 2023.06/software/linux/aarch64/a64fx/software
      no software packages in tarball
    other under 2023.06/software/linux/aarch64/a64fx
      2023.06/init/easybuild/eb_hooks.py
Apr 21 02:00:07 UTC 2025 | test result | 😁 SUCCESS (click triangle for details)
  ReFrame Summary
  [ OK ] (1/1) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:aarch64_a64fx+default
  P: perf: 15.758 timesteps/s (r:0, l:None, u:None)
  [ PASSED ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 0 skipped, 0 aborted)
  Details
  ✅ job output file slurm-406967.out
  ❌ found message matching ERROR:
  ✅ no message matching [\s*FAILED\s*].*Ran .* test case

@trz42
Collaborator Author

trz42 commented Apr 20, 2025

One more with parallel = 16...
bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

eessi-bot bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

eessi-bot bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-surf

eessi-bot-surf bot commented Apr 20, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-trz42

eessi-bot-trz42 bot commented Apr 20, 2025

Updates by the bot instance trz42-GH200-jr (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 20, 2025

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.04/pr_1034/406968

date | job status | comment
Apr 20 17:18:48 UTC 2025 | submitted | job id 406968 awaits release by job manager
Apr 20 17:19:17 UTC 2025 | released | job awaits launch by Slurm scheduler
Apr 20 17:20:25 UTC 2025 | running | job 406968 is running
Apr 21 00:48:06 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
  Details
  ✅ job output file slurm-406968.out
  ✅ no message matching FATAL:
  ❌ found message matching ERROR:
  ❌ found message matching FAILED:
  ❌ found message matching required modules missing:
  ❌ no message matching No missing installations
  ✅ found message matching .tar.gz created!
  Artefacts
  eessi-2023.06-software-linux-aarch64-a64fx-1745195417.tar.gz
    size: 0 MiB (15597 bytes)
    entries: 1
    modules under 2023.06/software/linux/aarch64/a64fx/modules/all
      no module files in tarball
    software under 2023.06/software/linux/aarch64/a64fx/software
      no software packages in tarball
    other under 2023.06/software/linux/aarch64/a64fx
      2023.06/init/easybuild/eb_hooks.py
Apr 21 00:48:06 UTC 2025 | test result | 😁 SUCCESS (click triangle for details)
  ReFrame Summary
  [ OK ] (1/1) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:aarch64_a64fx+default
  P: perf: 13.684 timesteps/s (r:0, l:None, u:None)
  [ PASSED ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 0 skipped, 0 aborted)
  Details
  ✅ job output file slurm-406968.out
  ❌ found message matching ERROR:
  ✅ no message matching [\s*FAILED\s*].*Ran .* test case

@trz42 trz42 added ready-to-deploy Mark a PR as ready to deploy ready-to-review labels Apr 21, 2025
…-layer into 2023.06-a64fx-2023a-eb482-apps-tf

kept original order to list TensorFlow right after all its dependencies
if cpu_target == CPU_TARGET_A64FX and self.name in ['TensorFlow']:
    # limit parallelism to 8, builds with 12 and 16 failed on Deucalion
    if parallel > 8:
        self.cfg['parallel'] = 8
Contributor

@trz42 Why don't we simply use a factor of 4 when building for A64FX, rather than a factor of 2 like we do below?

In theory, we could have smaller build jobs (say with 4 cores) at some point, so really hardcoding to 8 seems wrong to me...

Contributor

Ah, I overlooked the > 8 condition, sorry
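
The point about the `> 8` condition can be illustrated with a small standalone sketch. It mirrors the logic of the snippet under review but is not the code from eb_hooks.py: `capped_parallel` is a hypothetical function, the value given to `CPU_TARGET_A64FX` is a placeholder, and the EasyBuild hook machinery (`self.cfg`) is left out.

```python
CPU_TARGET_A64FX = "aarch64/a64fx"  # placeholder value, assumed for this sketch
MAX_PARALLEL_TENSORFLOW_A64FX = 8   # limit described in this PR


def capped_parallel(name, cpu_target, parallel):
    """Return the parallelism to use: capped at 8 for TensorFlow on A64FX."""
    if cpu_target == CPU_TARGET_A64FX and name in ["TensorFlow"]:
        if parallel > MAX_PARALLEL_TENSORFLOW_A64FX:
            return MAX_PARALLEL_TENSORFLOW_A64FX
    # Smaller requests and all other builds are left unchanged.
    return parallel


print(capped_parallel("TensorFlow", CPU_TARGET_A64FX, 48))  # 8
print(capped_parallel("TensorFlow", CPU_TARGET_A64FX, 4))   # 4
print(capped_parallel("LAMMPS", CPU_TARGET_A64FX, 48))      # 48
```

A build job with only 4 cores would therefore keep parallel = 4; the cap only applies to larger allocations.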

@boegel boegel added bot:deploy Ask bot to deploy missing software installations to EESSI and removed ready-to-deploy Mark a PR as ready to deploy labels Apr 22, 2025
@eessi-bot-toprichard

Label bot:deploy has been set by user boegel, but this person does not have permission to trigger deployments

@boegel
Contributor

boegel commented Apr 22, 2025

staging PR merged, so merging this too...

@boegel boegel merged commit 77651f1 into EESSI:2023.06-software.eessi.io Apr 22, 2025
64 of 66 checks passed

eessi-bot bot commented Apr 22, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.04.22

@eessi-bot-deucalion

PR merged! Moved ['/home/eessibot/new-bot/jobs/2025.04/pr_1034/406968', '/home/eessibot/new-bot/jobs/2025.04/pr_1034/406938', '/home/eessibot/new-bot/jobs/2025.04/pr_1034/406967', '/home/eessibot/new-bot/jobs/2025.04/pr_1034/406966'] to /home/eessibot/new-bot/trash-bin/EESSI/software-layer/2025.04.22

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 22, 2025

PR merged! Moved [] to /scratch/gent/vo/002/gvo00211/SHARED/trash_bin/EESSI/software-layer/2025.04.22

Labels
2023.06-software.eessi.io 2023.06 version of software.eessi.io a64fx bot:deploy Ask bot to deploy missing software installations to EESSI
3 participants