RHEL support #284
base: master
Thanks for this work, @hmeiland! What is the state of this PR? Is there still something to be done? We need to build a rhel-9.2-hpc image; I guess your branch would be the best starting point for this?
@@ -0,0 +1,3 @@
#!/bin/bash

$ALMA_COMMON_DIR/install_intel_libs.sh
I see you copied install_intel_libs.sh into rhel/common (replacing the distribution names), but if I understand correctly, this line means that the version from the alma directory will be used in the end?
@@ -0,0 +1,3 @@
#!/bin/bash

$ALMA_COMMON_DIR/install_nccl.sh
Same here, won't this use the file from the alma directory?
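Both comments point at the same risk: a wrapper under the RHEL tree that silently runs the Alma copy. A minimal sketch of a lint check, assuming the RHEL scripts live under one directory passed as an argument (the helper name find_alma_refs is hypothetical, not part of the PR), lists every file that still references $ALMA_COMMON_DIR:

```bash
#!/bin/bash
# Hypothetical lint helper (not part of the PR): list files under a directory
# that still invoke scripts through $ALMA_COMMON_DIR.
find_alma_refs() {
    local dir="$1"
    # -r: recurse, -l: print matching file names only,
    # -F: fixed-string match so "$" is not treated as a regex anchor.
    # grep exits 1 when nothing matches (fine), 2 on a real error (propagated).
    grep -rlF '$ALMA_COMMON_DIR' "$dir" || [ $? -eq 1 ]
}
```

Run against the RHEL directory (for example find_alma_refs rhel/), any output means a wrapper still reaches into the Alma tree instead of its own common directory.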
Can we add a README as well?
set -ex

case ${DISTRIBUTION} in
"rhel8.6") NCCL_VERSION="2.14.3-1";
Any reason we are using older NCCL and CUDA versions?
"rhel8.6") NCCL_VERSION="2.14.3-1"; | ||
CUDA_VERSION="11.6"; | ||
;; | ||
"rhel8.7") NCCL_VERSION="2.18.1-1"; |
Please switch to NCCL 2.19.3; it has some significant performance fixes. Also, please switch to CUDA 12.2.
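A minimal sketch of how the version selection could look with the suggested bump for rhel8.7, keeping the rhel8.6 pins from the PR unchanged. The function name set_nccl_cuda_versions is hypothetical, and the exact package string for the suggested NCCL 2.19 release (written here as 2.19.3-1) is an assumption to check against NVIDIA's package repository:

```bash
#!/bin/bash
# Sketch only: per-distribution NCCL/CUDA pins. The rhel8.7 values follow the
# review suggestion; the "2.19.3-1" package release suffix is an assumption.
set_nccl_cuda_versions() {
    case "${DISTRIBUTION}" in
        "rhel8.6") NCCL_VERSION="2.14.3-1"; CUDA_VERSION="11.6" ;;  # unchanged from the PR
        "rhel8.7") NCCL_VERSION="2.19.3-1"; CUDA_VERSION="12.2" ;;  # suggested in review
        *)
            echo "Unsupported DISTRIBUTION: '${DISTRIBUTION}'" >&2
            return 1
            ;;
    esac
    export NCCL_VERSION CUDA_VERSION
}
```

Failing loudly on an unknown DISTRIBUTION (rather than falling through with empty versions) keeps a new target such as rhel9.2 from building silently with no NCCL pin.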
#!/bin/bash
set -ex

case ${DISTRIBUTION} in
Curious why we have to use an older version for 8.6.
#!/bin/bash
set -ex

case ${DISTRIBUTION} in
Latest supported driver: 535.86.10; CUDA: 12.2
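One way to pin the suggested driver, sketched under the assumption that the script installs NVIDIA's data center runfile. The URL layout follows NVIDIA's public download server for data center (Tesla) drivers but is an assumption here, not taken from the PR, and driver_runfile_url is a hypothetical helper:

```bash
#!/bin/bash
# Versions suggested in review; the install mechanism below is an assumption.
NVIDIA_DRIVER_VERSION="535.86.10"
CUDA_VERSION="12.2"

# Hypothetical helper: build the data center driver runfile URL for a version.
driver_runfile_url() {
    local version="$1"
    printf 'https://us.download.nvidia.com/tesla/%s/NVIDIA-Linux-x86_64-%s.run\n' \
        "$version" "$version"
}

# Example use (not executed here): download and run the installer non-interactively.
#   curl -fsSLO "$(driver_runfile_url "$NVIDIA_DRIVER_VERSION")"
#   sh "NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}.run" --silent
```

Keeping the driver and CUDA pins next to each other makes it easier to confirm in review that the driver branch actually supports the CUDA version being installed.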
set -ex

# Install DCGM
DCGM_VERSION=2.4.4
Latest supported DCGM_VERSION=3.1.8
#!/bin/bash
set -ex

VERSION="5.8-1.0.1.1"
Latest supported VERSION="23.07-0.5.1.2"
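Bumping VERSION also changes the tarball the script downloads. A minimal sketch, assuming the script fetches MLNX_OFED from content.mellanox.com using the usual MLNX_OFED_LINUX-<version>-<distro>-x86_64.tgz naming, and that the distro tag matches the DISTRIBUTION values used elsewhere in the PR (e.g. rhel8.7); mofed_tarball_url is a hypothetical helper:

```bash
#!/bin/bash
# Version suggested in review.
VERSION="23.07-0.5.1.2"

# Hypothetical helper: print the MLNX_OFED tarball URL for a version and distro tag.
# The URL layout is an assumption; verify it before relying on it.
mofed_tarball_url() {
    local version="$1" distro="$2"
    local name="MLNX_OFED_LINUX-${version}-${distro}-x86_64"
    printf 'https://content.mellanox.com/ofed/MLNX_OFED-%s/%s.tgz\n' "$version" "$name"
}
```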

INSTALL_PREFIX=/opt

# HPC-X v2.14
Please update to 2.16 or higher; it has some significant fixes.
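Several review comments set a minimum version rather than an exact one (HPC-X 2.16 or higher here), so a small guard could fail the build early when a pin falls below the floor. A minimal sketch using GNU sort's version ordering; the function names and the HPCX_VERSION variable are hypothetical:

```bash
#!/bin/bash
# Return 0 if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
    # sort -C -V exits 0 only if its input is already in version order,
    # i.e. $2 <= $1.
    printf '%s\n' "$2" "$1" | sort -C -V
}

# Fail with a message if a pinned version is below the required minimum.
require_min_version() {
    local name="$1" have="$2" want="$3"
    if ! version_ge "$have" "$want"; then
        echo "${name} ${have} is older than the required ${want}" >&2
        return 1
    fi
}

HPCX_VERSION="2.16"   # hypothetical variable; the PR currently pins v2.14
require_min_version "HPC-X" "$HPCX_VERSION" "2.16"
```

Version sort matters here: a plain string comparison would rank 2.9 above 2.16.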
sed -i "$ s/$/ openmpi perftest/" /etc/dnf/dnf.conf

# Intel MPI 2021 (Update 7)
IMPI_2021_VERSION="2021.7.0"
Please update to 2021 Update 9 or higher.
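The existing pin encodes "2021 (Update 7)" as 2021.7.0. Assuming later updates follow the same 2021.<update>.0 pattern (so Update 9 would be 2021.9.0; worth confirming against Intel's release notes), a hypothetical helper could derive the pin from the update number and reject anything below Update 9:

```bash
#!/bin/bash
# Hypothetical helper: map an Intel MPI 2021 update number to the version
# string format used by IMPI_2021_VERSION (assumed pattern 2021.<update>.0).
impi_2021_version() {
    local update="$1" min_update=9   # reviewer asked for Update 9 or higher
    if ! [[ "$update" =~ ^[0-9]+$ ]]; then
        echo "Update must be a number, got '${update}'" >&2
        return 1
    fi
    local n=$((10#$update))          # force base 10 so "09" is not read as octal
    if (( n < min_update )); then
        echo "Intel MPI 2021 Update ${n} is below the required Update ${min_update}" >&2
        return 1
    fi
    printf '2021.%s.0\n' "$n"
}

IMPI_2021_VERSION="$(impi_2021_version 9)"
```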
Due to changing RHEL policies, more customers run directly on RHEL. These scripts allow them to build RHEL HPC images themselves, which can be kept in sync with, e.g., the Alma-based images.