{2025.06}[foss/2024a] LAMMPS 22Jul2025 with CUDA #1461

laraPPr wants to merge 6 commits into EESSI:main from
Conversation
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/amd/zen3,accel=nvidia/cc80

New job on instance

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90

New job on instance

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90

New job on instance
@casparvl why are the CUDA compute capabilities set like this? LAMMPS does not like it.
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/amd/zen3,accel=nvidia/cc80

New job on instance
Because that's the target for which we want CUDA code to be compiled :D If a particular package doesn't support the suffixes in the targets, we should make sure they get stripped. We could do this in an EESSI hook, but it would be better to do it upstream in EasyBuild.
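For illustration, a suffix-stripping step could be sketched like this (a hypothetical, minimal sketch: the function name and how it would be wired into an EESSI hook or into EasyBuild's handling of CUDA compute capabilities are assumptions, not the actual implementation):

```python
import re


def strip_cuda_cc_suffixes(cuda_ccs):
    """Strip architecture-specific suffixes (e.g. '9.0a' -> '9.0') from
    CUDA compute capability strings, for packages that only accept the
    plain '<major>.<minor>' form.
    """
    stripped = []
    for cc in cuda_ccs:
        # keep only the leading '<major>.<minor>' part, drop any suffix
        match = re.match(r'^(\d+\.\d+)', cc)
        stripped.append(match.group(1) if match else cc)
    return stripped


print(strip_cuda_cc_suffixes(['8.0', '9.0a']))  # ['8.0', '9.0']
```

A hook would then pass the build tool the stripped list while the original, suffixed list stays available for packages that do understand it.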
It is because it is not in this mapping: https://github.com/easybuilders/easybuild-easyblocks/blob/ad5538e0d532f06ecdc801794e390db49aa5c350/easybuild/easyblocks/l/lammps.py#L158-L177. From your explanation, I see adding 9.0a as an option if building with
We do not build LAMMPS with it. Two options:

1. We add the mapping in the easyblock.
2. We overwrite cuda_cc in the hook.

I prefer option 1.
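Option 1 could look roughly like this (a hypothetical sketch: the table name and entries below are illustrative stand-ins, not copied from lammps.py, which should be checked for the real contents):

```python
# Illustrative stand-in for the compute-capability -> Kokkos GPU arch
# table in the LAMMPS easyblock; these entries are assumptions.
KOKKOS_GPU_ARCH_TABLE = {
    '8.0': 'AMPERE80',
    '9.0': 'HOPPER90',
}

# Option 1: map the suffixed target onto the same Kokkos architecture,
# so a cuda_compute_capabilities value of '9.0a' resolves as well.
KOKKOS_GPU_ARCH_TABLE['9.0a'] = KOKKOS_GPU_ARCH_TABLE['9.0']


def kokkos_gpu_arch(cuda_cc):
    """Look up the Kokkos GPU architecture for a compute capability,
    failing loudly for unmapped targets instead of silently misbuilding.
    """
    if cuda_cc not in KOKKOS_GPU_ARCH_TABLE:
        raise ValueError(f"no Kokkos GPU arch mapping for CUDA cc {cuda_cc}")
    return KOKKOS_GPU_ARCH_TABLE[cuda_cc]


print(kokkos_gpu_arch('9.0a'))  # HOPPER90
```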
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90

New job on instance

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc90

New job on instance
Never mind, I have access.

I can't get to the build logs, so I need this before I can continue.
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc90

New job on instance
Probably hitting the ARM NEON issues on aarch64 with Kokkos, due to CUDA being older than 13.1 (or 13.2, which officially fixes the issue)?
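To check whether a given toolchain is affected, a simple version gate could be sketched like this (hypothetical helper; the 13.2 threshold is taken from the comment above, which names it as the release that officially fixes the issue):

```python
def cuda_affected_by_kokkos_neon_issue(cuda_version):
    """Return True if this CUDA version predates the official fix (13.2)
    for the aarch64 ARM NEON issue with Kokkos.
    """
    # compare only the major.minor components of the version string
    major, minor = (int(part) for part in cuda_version.split('.')[:2])
    return (major, minor) < (13, 2)


print(cuda_affected_by_kokkos_neon_issue('12.8'))  # True
print(cuda_affected_by_kokkos_neon_issue('13.2'))  # False
```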
bot:status last_build
This is the status of all the
bot:status last_build
This is the status of all the
@casparvl I tested it locally to debug the failing ctest, but it does not fail on the zen4-cc90 cluster we have in Ghent. That build does, however, also fail the CUDA sanity check in the end, but I have not looked into that yet.
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90

New job on instance
The last one went out of memory... That surprises me, because the memory is quite sizeable on these nodes. These build jobs get 180G of memory: |