RHEL support #284

Open: wants to merge 69 commits into master

hmeiland (Contributor)

Due to changing RHEL policies, more customers are running directly on RHEL. These scripts let them build RHEL HPC images themselves, which can be kept in sync with, e.g., the AlmaLinux-based images.

ltalirz (Contributor) commented Jan 10, 2024

Thanks for this work, @hmeiland! What is the state of this PR? Is there still something to be done?

We need to build a rhel-9.2-hpc image; I guess your branch would be the best starting point for this?

ltalirz mentioned this pull request Jan 10, 2024
@@ -0,0 +1,3 @@
#!/bin/bash

$ALMA_COMMON_DIR/install_intel_libs.sh

ltalirz (Contributor), Jan 10, 2024

I see you copied install_intel_libs.sh into rhel/common (replacing the distribution names), but if I understand correctly, this line means that the version from the alma directory will be used in the end?
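One way to make each RHEL wrapper pick up its own copy is sketched below. It assumes a `RHEL_COMMON_DIR` variable pointing at rhel/common, which is hypothetical (only `ALMA_COMMON_DIR` appears in this diff), and the `resolve_common_script` helper is likewise invented for illustration: it prefers the RHEL copy when one exists and otherwise falls back to the alma version.

```shell
#!/bin/bash
# Hypothetical helper, not part of the PR: print the path of a common installer
# script, preferring the copy under RHEL_COMMON_DIR (an assumed variable) and
# falling back to the one under ALMA_COMMON_DIR.
resolve_common_script() {
    local name="$1"
    if [[ -n "${RHEL_COMMON_DIR:-}" && -f "${RHEL_COMMON_DIR}/${name}" ]]; then
        printf '%s\n' "${RHEL_COMMON_DIR}/${name}"
    else
        printf '%s\n' "${ALMA_COMMON_DIR}/${name}"
    fi
}

# Usage inside a wrapper script:
#   "$(resolve_common_script install_intel_libs.sh)"
```

The fallback keeps wrappers working for scripts that were never copied into rhel/common; if every script is meant to be RHEL-specific, calling the RHEL copy directly is simpler and makes a missing copy fail loudly.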

@@ -0,0 +1,3 @@
#!/bin/bash

$ALMA_COMMON_DIR/install_nccl.sh

Contributor

Same here, won't this use the file from the alma directory?

Contributor

Can we add a README as well?

set -ex

case ${DISTRIBUTION} in
"rhel8.6") NCCL_VERSION="2.14.3-1";

Contributor

Is there any reason we are using older NCCL and CUDA versions?

"rhel8.6") NCCL_VERSION="2.14.3-1";
CUDA_VERSION="11.6";
;;
"rhel8.7") NCCL_VERSION="2.18.1-1";

Contributor

Please switch to NCCL 2.19.3; it has some significant performance fixes.
Also, please switch to CUDA 12.2.
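For reference, the per-distribution pinning in the diff can be sketched as a small function. Only the rhel8.6 pins visible in this review are filled in; the function name `select_versions` and the `*)` branch that rejects unknown distributions are assumptions for illustration, since the diff does not show how unlisted distributions are handled.

```shell
#!/bin/bash
# Sketch of the DISTRIBUTION -> version pinning discussed in this review.
# select_versions is a hypothetical name; only the rhel8.6 pins shown in the
# diff are filled in, and other distributions would be added as more branches.
select_versions() {
    case "$1" in
        "rhel8.6")
            NCCL_VERSION="2.14.3-1"
            CUDA_VERSION="11.6"
            ;;
        *)
            # Fail loudly instead of continuing with unset versions.
            echo "unsupported distribution: $1" >&2
            return 1
            ;;
    esac
}
```

A caller would run `select_versions "${DISTRIBUTION}"` before installing; with the pins in one place, moving to the versions requested here is a one-line change per branch.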

#!/bin/bash
set -ex

case ${DISTRIBUTION} in

Contributor

Curious why we have to use an older version for 8.6.

#!/bin/bash
set -ex

case ${DISTRIBUTION} in

Contributor

Latest supported driver: 535.86.10; CUDA: 12.2.

set -ex

# Install DCGM
DCGM_VERSION=2.4.4

Contributor

Latest supported DCGM_VERSION=3.1.8

#!/bin/bash
set -ex

VERSION="5.8-1.0.1.1"

Contributor

Latest supported VERSION="23.07-0.5.1.2"


INSTALL_PREFIX=/opt

# HPC-X v2.14

Contributor

Please update to 2.16 or higher; it has some significant fixes.

sed -i "$ s/$/ openmpi perftest/" /etc/dnf/dnf.conf

# Intel MPI 2021 (Update 7)
IMPI_2021_VERSION="2021.7.0"

Contributor

Please update to 2021 Update 9 or higher.
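The hunk above also edits dnf.conf with `sed -i "$ s/$/ openmpi perftest/"`: the `$` address selects the last line of the file, and `s/$/ .../` appends the text to the end of that line. Presumably that last line is an `exclude=` entry, so the two packages get added to it, but the diff does not show the file's contents. The `append_to_last_line` helper below is a hypothetical wrapper around the same idiom, written so it can be tried on a scratch file instead of /etc/dnf/dnf.conf.

```shell
#!/bin/bash
# Hypothetical wrapper around the sed idiom from the diff (GNU sed, as on RHEL).
# Appends WORDS to the end of the last line of FILE.
# WORDS must not contain "/" or "&", which sed would interpret specially.
append_to_last_line() {
    local file="$1" words="$2"
    sed -i "$ s/$/ ${words}/" "$file"
}
```

On a file whose last line is `exclude=kernel*`, calling `append_to_last_line "$f" "openmpi perftest"` turns that line into `exclude=kernel* openmpi perftest` and leaves earlier lines untouched.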

3 participants