For pm-cpu, upgrade Intel compiler to 2023.2.0 as well as other modules for Intel only #6596

Merged (1 commit) on Sep 19, 2024


ndkeen (Contributor) commented Sep 7, 2024

For pm-cpu, move from intel/2023.1.0 to intel/2023.2.0.
Updating to this version allows us to also update several other module versions.
These are the updates to other modules we are making at the same time, for the Intel compiler only for now:

Old version                          New version
PrgEnv-intel/8.3.3                   PrgEnv-intel/8.5.0
craype/2.7.20                        craype/2.7.30
cray-mpich/8.1.25                    cray-mpich/8.1.28
cray-hdf5-parallel/1.12.2.3          cray-hdf5-parallel/1.12.2.9
cray-netcdf-hdf5parallel/4.9.0.3     cray-netcdf-hdf5parallel/4.9.0.9
cray-parallel-netcdf/1.12.3.3        cray-parallel-netcdf/1.12.3.9
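For anyone who wants to sanity-check the version pairs above, they can be captured as data. This is a hypothetical Python sketch: the names MODULE_UPDATES, parse_version, and is_upgrade are illustrative only and are not part of E3SM or its machine files. It simply confirms that every listed change, including the intel/2023.1.0 to intel/2023.2.0 move, is a version increase.

```python
# Hypothetical sanity check (not part of E3SM): the (module, old, new) triples
# are copied from the PR description; the comparison logic is only a sketch.
MODULE_UPDATES = [
    ("intel", "2023.1.0", "2023.2.0"),
    ("PrgEnv-intel", "8.3.3", "8.5.0"),
    ("craype", "2.7.20", "2.7.30"),
    ("cray-mpich", "8.1.25", "8.1.28"),
    ("cray-hdf5-parallel", "1.12.2.3", "1.12.2.9"),
    ("cray-netcdf-hdf5parallel", "4.9.0.3", "4.9.0.9"),
    ("cray-parallel-netcdf", "1.12.3.3", "1.12.3.9"),
]


def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version string like '8.1.28' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_upgrade(old: str, new: str) -> bool:
    """True if `new` is strictly newer than `old`, compared numerically."""
    return parse_version(new) > parse_version(old)


if __name__ == "__main__":
    # Print each module change and whether it is an upgrade.
    for name, old, new in MODULE_UPDATES:
        print(f"{name}: {old} -> {new} (upgrade: {is_upgrade(old, new)})")
```

Comparing integer tuples rather than raw strings avoids the lexicographic pitfall where "8.1.9" would sort after "8.1.25".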

This change does not address a known issue, and the new versions are higher than the machine defaults; it is in preparation for upcoming software changes. I also do not expect any significant performance changes, but more testing is warranted.

So far, testing shows the results are bit-for-bit (BFB), but I would rather not assume the PR is BFB, since it changes the compiler version.

ndkeen self-assigned this Sep 7, 2024
ndkeen added the labels pm-cpu (Perlmutter at NERSC, CPU-only nodes) and Machine Files Sep 7, 2024
github-actions bot commented Sep 7, 2024

PR Preview Action v1.4.7
🚀 Deployed preview to https://E3SM-Project.github.io/E3SM/pr-preview/pr-6596/
on branch gh-pages at 2024-09-07 18:59 UTC

mahf708 (Contributor) left a comment


Looks good!

Is there a reason for not updating the versions for the other compilers? Is that planned for later?

rljacob (Member) commented Sep 9, 2024

Is this BFB?

ndkeen (Contributor, Author) commented Sep 9, 2024

I ran e3sm_prod and compared against baselines. I also ran e3sm_integration with an older version of the repo; both showed all tests passing. However, I would not necessarily expect all cases to be BFB.

Should we:
a) verify again that all tests are BFB with e3sm_integration,
b) look for more example cases (perhaps current production-like cases on pm-cpu) and run some tests,
c) go ahead with this as-is, where I am only updating Intel, or
d) first try to fix the issues I have updating GCC to the latest version, so that we can move all module versions up at the same time?

rljacob (Member) commented Sep 10, 2024

Yes, run the production suite with a comparison against the existing baselines.

rljacob added this to the v3.0.1 milestone Sep 12, 2024
ndkeen (Contributor, Author) commented Sep 13, 2024

I just remembered that I had already done that: running e3sm_prod is BFB with this change.

ndkeen added a commit that referenced this pull request Sep 17, 2024
…t (PR #6596)

ndkeen (Contributor, Author) commented Sep 17, 2024

Merged to next.

rljacob (Member) commented Sep 19, 2024

I think you can merge this to master.

ndkeen (Contributor, Author) commented Sep 19, 2024

OK. Yes, I was trying to test more cases with it. In fact, I think a SCREAM case was not BFB after this change, but we can cross that bridge when we merge.

ndkeen (Contributor, Author) commented Sep 19, 2024

OK, for a basic ne30pg2_ne30pg2.F2010-SCREAMv1 test case using the scream repo (as of Sep 10) on pm-cpu, I see that this change is not BFB.

Can I merge this as-is and continue testing?

rljacob (Member) commented Sep 19, 2024

I think so, since probably no one is using pm-cpu for SCREAM v1 simulations. If anyone is, they are probably using the scream repo.

ndkeen (Contributor, Author) commented Sep 19, 2024

I tried a recent high-resolution (HR) case, piCtl.ne120pg2_r025_RRSwISC6to18E3r5, for 2 days, and it is BFB after this change.

I will merge, and we will have to remember to address the non-BFB issues with the scream repo.

ndkeen merged commit 3eb6333 into master Sep 19, 2024
3 checks passed
ndkeen deleted the ndk/machinefiles/pm-cpu-update-intel-compiler branch September 19, 2024 23:51