fix cudnn hook for non eessi installation #1099

Merged
ocaisa merged 2 commits into EESSI:2023.06-software.eessi.io on May 28, 2025

Conversation

pescobar
Contributor

While doing local installations, we got this error when installing cuDNN:

== 2025-05-28 11:41:04,652 hooks.py:249 INFO Running 'post_postproc_hook' hook function (args: [<easybuild.easyblocks.cudnn.EB_cuDNN object at 0x7fe233fd2450>], keyword args: {})...
== 2025-05-28 11:41:04,816 build_log.py:226 ERROR EasyBuild encountered an error (at easybuild/tools/build_log.py:166 in caller_info): cuDNN-specific hook triggered for non-cuDNN easyconfig?! (at easybuild/eb_hooks.py:1073 in post_postproc_cudnn)

Reviewing the hook, we noticed that when the condition if self.name == 'cuDNN' and eessi_installation: evaluates to False because the installation is not an EESSI one (as in our case), you always end up in raise EasyBuildError("cuDNN-specific hook triggered for non-cuDNN easyconfig?!"), even though the easyconfig really is cuDNN.

I modified the cuDNN hook, taking the CUDA hook above it as a reference, so the logic is now:

if eessi_install
    run_hook
else
    print_msg(f"EESSI hook to respect cuDDN license not triggered for installation path {self.installdir}")
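
To make the difference concrete, here is a minimal, self-contained sketch of the old and new control flow. It is not the code from eb_hooks.py: EasyBuildError and print_msg are local stand-ins for the EasyBuild helpers of the same name, while FakeEasyBlock, ASSUMED_EESSI_PREFIX, the prefix-based check, and the returned status strings are assumptions made up for illustration. It also assumes the error for genuinely non-cuDNN easyconfigs is kept.

```python
class EasyBuildError(Exception):
    """Local stand-in for easybuild.tools.build_log.EasyBuildError."""


def print_msg(msg):
    """Local stand-in for easybuild.tools.build_log.print_msg."""
    print(f"== {msg}")


# Assumption: an EESSI installation is recognised by its install prefix.
# The real hook may use a different check; this value is illustrative only.
ASSUMED_EESSI_PREFIX = "/cvmfs/software.eessi.io/"


class FakeEasyBlock:
    """Hypothetical object exposing only the attributes the hook reads."""

    def __init__(self, name, installdir):
        self.name = name
        self.installdir = installdir


def old_cudnn_hook(eb):
    """Old logic: one combined condition, so a local cuDNN build hits the error."""
    eessi_installation = eb.installdir.startswith(ASSUMED_EESSI_PREFIX)
    if eb.name == 'cuDNN' and eessi_installation:
        return "license handling ran"
    raise EasyBuildError("cuDNN-specific hook triggered for non-cuDNN easyconfig?!")


def new_cudnn_hook(eb):
    """New logic: check the name first, then branch on the installation type."""
    eessi_installation = eb.installdir.startswith(ASSUMED_EESSI_PREFIX)
    if eb.name != 'cuDNN':
        raise EasyBuildError("cuDNN-specific hook triggered for non-cuDNN easyconfig?!")
    if eessi_installation:
        return "license handling ran"
    # Message text copied verbatim from the hook output, including its spelling.
    print_msg(f"EESSI hook to respect cuDDN license not triggered for installation path {eb.installdir}")
    return "skipped"


if __name__ == "__main__":
    local_build = FakeEasyBlock("cuDNN", "/home/user/software/cuDNN/8.9.2.26-CUDA-12.1.1")
    try:
        old_cudnn_hook(local_build)
    except EasyBuildError as err:
        print(f"old hook failed: {err}")
    print(f"new hook result: {new_cudnn_hook(local_build)}")
```

With a cuDNN build outside the assumed EESSI prefix, the old function raises the misleading error, while the new one prints the informational message and returns without failing the build.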

@ocaisa
Member

ocaisa commented May 28, 2025

@pescobar Can you sync this with the default branch? We just merged an update in #1096.

@pescobar
Contributor Author

@ocaisa done

@ocaisa
Member

ocaisa commented May 28, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2

@eessi-bot-surf

eessi-bot-surf bot commented May 28, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen2
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen2 resulted in:

    • no jobs were submitted

@eessi-bot-aws

eessi-bot-aws bot commented May 28, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.05/pr_1099/65833

date | job status | comment
May 28 14:02:48 UTC 2025 | submitted | job id 65833 awaits release by job manager
May 28 14:03:38 UTC 2025 | released | job awaits launch by Slurm scheduler
May 28 15:04:10 UTC 2025 | running | job 65833 is running
May 28 15:12:11 UTC 2025 | finished | 😁 SUCCESS
Details
✅ job output file slurm-65833.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-17484446830.tar.gz
size: 0 MiB (16287 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen2/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
May 28 15:12:11 UTC 2025 | test result | 😁 SUCCESS
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:x86_64_amd_zen2+default
P: perf: 442.109 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:x86_64_amd_zen2+default
P: perf: 448.028 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:x86_64_amd_zen2+default
P: latency: 1.78 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 4.55 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 4.05 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 4.26 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:x86_64_amd_zen2+default
P: latency: 0.56 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:x86_64_amd_zen2+default
P: latency: 0.6 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:x86_64_amd_zen2+default
P: bandwidth: 7337.91 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:x86_64_amd_zen2+default
P: bandwidth: 7333.27 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-65833.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
May 28 15:15:02 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-17484446830.tar.gz to S3 bucket succeeded

@ocaisa
Member

ocaisa commented May 28, 2025

Verified with a build

$ eb cuDNN-8.9.2.26-CUDA-12.1.1.eb  --accept-eula-for=cuDNN
...
== Running post-postproc hook...
== ... (took 1 secs)
== FAILED: Installation ended unsuccessfully: cuDNN-specific hook triggered for non-cuDNN easyconfig?! (took 1 min 32 secs)
== Results of the build can be found in the log file(s) /tmp/eb-5h10jcrv/easybuild-cuDNN-8.9.2.26-20250528.160434.iebeC.log
== Summary:
   * [FAILED]  cuDNN/8.9.2.26-CUDA-12.1.1
...

and

$ eb cuDNN-8.9.2.26-CUDA-12.1.1.eb --accept-eula-for=cuDNN --hooks=./eb_hooks.py
...
== Running post-postproc hook...
== EESSI hook to respect cuDDN license not triggered for installation path
/home/ocaisa/eessi/versions/2023.06/software/linux/x86_64/intel/skylake_avx512/software/cuDNN/8.9.2.26-CUDA-12.1.1
== ... (took < 1 sec)
...

@ocaisa ocaisa added the bot:deploy Ask bot to deploy missing software installations to EESSI label May 28, 2025
@ocaisa
Member

ocaisa commented May 28, 2025

Going in, thanks @pescobar

@ocaisa ocaisa merged commit 63d20d4 into EESSI:2023.06-software.eessi.io May 28, 2025
67 of 80 checks passed