Get SOCA vrfy job working on Hera again #1045

Merged
merged 5 commits into from
Apr 17, 2024

Conversation

CoryMartin-NOAA
Contributor

What the title says. Fixes #994

@CoryMartin-NOAA added the hera-GW-RT (Queue for automated testing with global-workflow on Hera) and orion-GW-RT (Queue for automated testing with global-workflow on Orion) labels on Apr 16, 2024
@CoryMartin-NOAA self-assigned this on Apr 16, 2024
@emcbot added the hera-GW-RT-Running (Automated testing with global-workflow running on Hera) and orion-GW-RT-Running (Automated testing with global-workflow running on Orion) labels and removed the hera-GW-RT (Queue for automated testing with global-workflow on Hera) and orion-GW-RT (Queue for automated testing with global-workflow on Orion) labels on Apr 16, 2024
@CoryMartin-NOAA
Contributor Author

unit tests fail because EMC RZDM is down...

Contributor

@guillaumevernieres left a comment

@RussTreadon-NOAA
Contributor

> unit tests fail because EMC RZDM is down...

I am unable to build GDASApp from PR #1033. I no longer think PR #1033 is the issue. I'm beginning to suspect the failure is related to EMC RZDM being offline. Attempts to build GDASApp develop at several hashes fail as documented in PR #1033.

@CoryMartin-NOAA
Contributor Author

I'm not sure what to do to fix this logjam. We have no idea when RZDM will be back. Do we disable all unit tests in the interim? Find an alternate place to store the files? Other options?

@emcbot

emcbot commented Apr 16, 2024

Automated Global-Workflow GDASApp Testing Results:
Machine: orion

Start: Tue Apr 16 09:37:02 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build:                                  *FAILED*
Build: Failed at Tue Apr 16 09:57:11 CDT 2024
Build: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/log.build

@emcbot added the orion-GW-RT-Failed (Automated testing with global-workflow failed on Orion) label and removed the orion-GW-RT-Running (Automated testing with global-workflow running on Orion) label on Apr 16, 2024
@emcbot

emcbot commented Apr 16, 2024

Automated Global-Workflow GDASApp Testing Results:
Machine: hera

Start: Tue Apr 16 14:44:36 UTC 2024 on hfe11
---------------------------------------------------
Build:                                  *FAILED*
Build: Failed at Tue Apr 16 15:02:40 UTC 2024
Build: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/log.build

@emcbot added the hera-GW-RT-Failed (Automated testing with global-workflow failed on Hera) label and removed the hera-GW-RT-Running (Automated testing with global-workflow running on Hera) label on Apr 16, 2024
@guillaumevernieres
Contributor

> I'm not sure what to do to fix this logjam. We have no idea when RZDM will be back. Do we disable all unit tests in the interim? Find an alternate place to store the files? Other options?

Should we put the tarball on HPC instead?

@RussTreadon-NOAA
Contributor

Why does our not being able to download test files from EMC RZDM break the GDASApp build? It should only break testing, not the build, right? Does cmake intertwine application testing with the application build in such a way that if we can't build the tests, we can't build GDASApp?

@RussTreadon-NOAA
Contributor

Do we have tarball gdasapp-fix-${SHORTSHA}.tgz available from somewhere besides https://ftp.emc.ncep.noaa.gov/static_files/public/GDASApp?

@CoryMartin-NOAA
Contributor Author

> Why does our not being able to download test files from EMC RZDM break the GDASApp build? It should only break testing, not the build, right? Does cmake intertwine application testing with the application build in such a way that if we can't build the tests, we can't build GDASApp?

It's because CMake is doing the downloading. The download fails, so CMake can't finish. Perhaps we should instead have a test that downloads or links the files when it runs. This is how JCSDA does it for JEDI.

@RussTreadon-NOAA
Contributor

Placing the tarball on HPC is an option, but when the given machine is offline, we're stuck. We could place the tarball on multiple HPC platforms, but even this approach has potential pitfalls. For example, suppose we place the tarball in /work2, but when we want to build on Orion or Hercules, /work2 is offline for some reason and only /work is available.
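
One way to soften the pitfall described above is to check several staged locations in order and take the first one that responds. The following is only a minimal Python sketch of that idea, not anything taken from GDASApp: the function name `first_available` is hypothetical, and the caller would have to supply the candidate directories and the tarball name.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_available(candidates: Iterable[Path], filename: str) -> Optional[Path]:
    """Return the first existing copy of `filename` among `candidates`, else None.

    `candidates` is an ordered list of directories (hypothetical staging
    areas, e.g. one per filesystem) that may or may not be reachable.
    """
    for directory in candidates:
        tarball = Path(directory) / filename
        try:
            if tarball.is_file():
                return tarball
        except OSError:
            # Some filesystem errors (e.g. permission denied) are raised
            # rather than reported as "not a file"; treat them as unavailable.
            continue
    return None
```

A caller could pass a /work2 location first and a /work location second, then fall back to a remote download only if the result is `None`. Which directories to list is a site-specific decision that this sketch does not settle.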

@RussTreadon-NOAA
Contributor

> > Why does our not being able to download test files from EMC RZDM break the GDASApp build? It should only break testing, not the build, right? Does cmake intertwine application testing with the application build in such a way that if we can't build the tests, we can't build GDASApp?
>
> It's because CMake is doing the downloading. The download fails, so CMake can't finish. Perhaps we should instead have a test that downloads or links the files when it runs. This is how JCSDA does it for JEDI.

This is a good option. It separates the application build from the application tests. We should be able to compile GDASApp whether or not data for ctests is available.
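
To illustrate the separation discussed here, below is a minimal Python sketch of a download step that runs at test time and reports failure instead of aborting. It is not the GDASApp or JEDI implementation. The function name `fetch_test_data`, the `.part` suffix, and the default timeout are all assumptions made for illustration.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_test_data(url: str, dest: Path, timeout: float = 30.0) -> bool:
    """Download `url` to `dest`; return True on success, False on failure.

    Intended to be called from a test setup step, so an unreachable
    server makes the dependent tests fail or skip but never blocks the build.
    """
    dest = Path(dest)
    if dest.is_file():
        return True  # already staged, nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    partial = dest.with_name(dest.name + ".part")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response, \
                open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        partial.unlink(missing_ok=True)
        return False
    partial.replace(dest)  # only expose the file once it is complete
    return True
```

A test driver could call this first and mark the data-dependent tests as skipped when it returns `False`, leaving compilation unaffected. Writing to a temporary name and renaming at the end avoids leaving a truncated tarball behind if the connection drops mid-transfer.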

@guillaumevernieres added the hera-GW-RT (Queue for automated testing with global-workflow on Hera) label and removed the hera-GW-RT-Failed (Automated testing with global-workflow failed on Hera) and orion-GW-RT-Failed (Automated testing with global-workflow failed on Orion) labels on Apr 16, 2024
@guillaumevernieres added the orion-GW-RT (Queue for automated testing with global-workflow on Orion) label on Apr 16, 2024
@emcbot added the hera-GW-RT-Running (Automated testing with global-workflow running on Hera) and orion-GW-RT-Running (Automated testing with global-workflow running on Orion) labels and removed the hera-GW-RT (Queue for automated testing with global-workflow on Hera) and orion-GW-RT (Queue for automated testing with global-workflow on Orion) labels on Apr 16, 2024
@emcbot

emcbot commented Apr 17, 2024

Automated Global-Workflow GDASApp Testing Results:
Machine: hera

Start: Wed Apr 17 00:36:44 UTC 2024 on hfe11
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Wed Apr 17 00:37:30 UTC 2024
Tests: 
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

@emcbot added the hera-GW-RT-Failed (Automated testing with global-workflow failed on Hera) label and removed the hera-GW-RT-Running (Automated testing with global-workflow running on Hera) label on Apr 17, 2024
@emcbot

emcbot commented Apr 17, 2024

Automated Global-Workflow GDASApp Testing Results:
Machine: orion

Start: Tue Apr 16 19:46:56 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Tue Apr 16 19:47:20 CDT 2024
Tests: 
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

@emcbot added the orion-GW-RT-Failed (Automated testing with global-workflow failed on Orion) label and removed the orion-GW-RT-Running (Automated testing with global-workflow running on Orion) label on Apr 17, 2024
@emcbot

emcbot commented Apr 17, 2024

Automated Global-Workflow GDASApp Testing Results:
Machine: hera

Start: Wed Apr 17 00:36:44 UTC 2024 on hfe11
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Wed Apr 17 00:37:30 UTC 2024
Tests: 
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

Build: SUCCESS
Build: Completed at Wed Apr 17 01:23:34 UTC 2024

Tests: Failed
Tests: Failed at Wed Apr 17 01:39:40 UTC 2024
Tests: 91% tests passed, 5 tests failed out of 54
1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
1776 - test_gdasapp_atm_jjob_var_run (Failed)
1777 - test_gdasapp_atm_jjob_var_inc (Failed)
1778 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

@emcbot added the hera-GW-RT-Running (Automated testing with global-workflow running on Hera) label on Apr 17, 2024
@emcbot

emcbot commented Apr 17, 2024

Automated Global-Workflow GDASApp Testing Results:
Machine: orion

Start: Tue Apr 16 19:46:56 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Tue Apr 16 19:47:20 CDT 2024
Tests: 
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

Build: SUCCESS
Build: Completed at Tue Apr 16 20:42:38 CDT 2024

Tests: Failed
Tests: Failed at Tue Apr 16 21:02:46 CDT 2024
Tests: 93% tests passed, 4 tests failed out of 54
1777 - test_gdasapp_atm_jjob_var_run (Failed)
1778 - test_gdasapp_atm_jjob_var_inc (Failed)
1779 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

@emcbot added the orion-GW-RT-Running (Automated testing with global-workflow running on Orion) label on Apr 17, 2024
@emcbot

emcbot commented Apr 17, 2024

Automated Global-Workflow GDASApp Testing Results:
Machine: hera

Start: Wed Apr 17 01:42:32 UTC 2024 on hfe06
---------------------------------------------------
Build:                                 *SUCCESS*
Build: Completed at Wed Apr 17 02:33:04 UTC 2024
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Wed Apr 17 02:49:26 UTC 2024
Tests: 91% tests passed, 5 tests failed out of 54
	1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
	1776 - test_gdasapp_atm_jjob_var_run (Failed)
	1777 - test_gdasapp_atm_jjob_var_inc (Failed)
	1778 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

@emcbot removed the hera-GW-RT-Running (Automated testing with global-workflow running on Hera) label on Apr 17, 2024
@emcbot

emcbot commented Apr 17, 2024

Automated Global-Workflow GDASApp Testing Results:
Machine: orion

Start: Tue Apr 16 21:03:53 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build:                                 *SUCCESS*
Build: Completed at Tue Apr 16 21:58:27 CDT 2024
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Tue Apr 16 22:30:14 CDT 2024
Tests: 91% tests passed, 5 tests failed out of 54
	1764 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
	1777 - test_gdasapp_atm_jjob_var_run (Failed)
	1778 - test_gdasapp_atm_jjob_var_inc (Failed)
	1779 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest
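
The report above states that 5 tests failed but names only 4. A small check along the following lines could flag that kind of mismatch automatically. This is a hedged Python sketch: the regular expressions are inferred solely from the output pasted in this thread, and the function name `summarize_ctest` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Patterns inferred from the bot output in this thread (assumption, not a spec).
SUMMARY = re.compile(r"(\d+)% tests passed, (\d+) tests failed out of (\d+)")
FAILED = re.compile(r"^\s*(\d+) - (\S+) \(Failed\)")


def summarize_ctest(log_text: str) -> Tuple[Optional[int], List[str]]:
    """Return (declared failure count, names of tests listed as failed)."""
    declared = None
    listed = []
    for line in log_text.splitlines():
        summary = SUMMARY.search(line)
        if summary:
            declared = int(summary.group(2))
            continue
        failed = FAILED.match(line)
        if failed:
            listed.append(failed.group(2))
    return declared, listed
```

If the declared count differs from `len(listed)`, at least one failure is missing from the named list and the full `log.ctest` file would need to be consulted to find it.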

@emcbot removed the orion-GW-RT-Running (Automated testing with global-workflow running on Orion) label on Apr 17, 2024
Contributor

@guillaumevernieres left a comment

The test_gdasapp_util_ghrsst2ioda job is failing too but isn't reported as failed. Can you update the references, @apchoiCMD?
The vrfy task is failing with an issue reading the layer thicknesses; not sure why...

This PR fixes the previous error, which is good enough to merge. We'll comment out the verify test until one of us water people has time to look into it and fix the issue.

@apchoiCMD
Collaborator

> The test_gdasapp_util_ghrsst2ioda job is failing too but isn't reported as failed. Can you update the references, @apchoiCMD? The vrfy task is failing with an issue reading the layer thicknesses; not sure why...
>
> This PR fixes the previous error, which is good enough to merge. We'll comment out the verify test until one of us water people has time to look into it and fix the issue.

My bad, but your PR #1050 already includes a modified test reference file: https://github.com/NOAA-EMC/GDASApp/pull/1050/files#diff-68603b0771f7acb935fa0d121599ea74718ba8012d0e7dc7c1ee1a192150d93e. Do you want me to update it before merging your PR? Thanks, @guillaumevernieres.

I fixed it!!!!
@guillaumevernieres marked this pull request as ready for review on April 17, 2024, 12:41
Contributor

@guillaumevernieres left a comment

👍

@guillaumevernieres merged commit 1389383 into develop on Apr 17, 2024
4 of 5 checks passed
@guillaumevernieres deleted the bugfix/soca_vrfy branch on April 17, 2024, 12:42
danholdaway added a commit that referenced this pull request Apr 17, 2024
* upstream/develop:
  Get SOCA vrfy job working on Hera again (#1045)
  Get test data from a staged location on supported HPC Part Deux (#1052)
  Prepare observations for snow DA updates to the ensemble members (#998)
  Save basic stats in csv at each cycle (#1040)
  Fix bug for datetime in GHRSST Ioda Converter (#1027)
Labels
hera-GW-RT-Failed (Automated testing with global-workflow failed on Hera), orion-GW-RT-Failed (Automated testing with global-workflow failed on Orion)
Development

Successfully merging this pull request may close these issues.

test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY fails on Hera Rocky 8
5 participants