Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Example CI + Fixes to some broken example! #525

Draft
wants to merge 45 commits into
base: master
Choose a base branch
from

Conversation

okBrian
Copy link
Contributor

@okBrian okBrian commented Jul 17, 2024

Description

Added New Example CI, - Average runtime - 30 minutes

Why MacOS runner?
From my initial testing its much faster than the ubuntu images. Some examples ran over two times faster on the MacOS runner

Why 15625, (125, 125), or (25, 25, 25) for cell boundaries?
Some 3D examples break if m < 25, and this seems like a good spot to put examples so they don't take too long

Fixes #474

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Scope

  • This PR comprises a set of related changes with a common goal

How Has This Been Tested?

  • Github Actions Run all example with CI
  • Run all examples locally

Test Configuration:
gcc 14, Ubuntu 22,04.4 LTS
Github Action Macos Runner

Checklist

  • I have added comments for the new code
  • I ran ./mfc.sh format before committing my code
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

@okBrian
Copy link
Contributor Author

okBrian commented Jul 17, 2024

I'm not exactly sure whats wrong with the test suite, but I think if I add the --generate flag they should be okay. Do we want the test suite to even test for examples because that would add another 30 minutes or more to them. Also I need to remove the -r flag from the example suites but it looks like it worked fine.

@sbryngelson
Copy link
Member

@okBrian we can add this as a separate GitHub workflow (Examples-test.yml, Example Smoke Test).

@sbryngelson
Copy link
Member

the -r --generate things I don't understand at the moment

@sbryngelson
Copy link
Member

@okBrian It's important that these cases run and produce output that isn't nonsense. So you can run the "example suite" with ./mfc.sh test —a (+ your flags, I guess?), where -a already exists. ' This runs the files through post_process and then checks for NaNs/Infinity.

If parallel_IO is not true then I'm not sure if -a works or not. You may need to turn it on via your toolchain tricks.

@henryleberre
Copy link
Member

I'm not exactly sure whats wrong with the test suite, but I think if I add the --generate flag they should be okay. Do we want the test suite to even test for examples because that would add another 30 minutes or more to them. Also I need to remove the -r flag from the example suites but it looks like it worked fine.

If you specify --generate it will compare the pre-process and simulation results against itself (not with the reference values from this PR). So you shouldn't pass --generate.

Whether we should or should not use the -r (--relentless) flag is debatable but we have currently opted not to use it so we should probably keep it that way. The main advantage is that we don't spend CI resources running more tests if one already failed and the workflow is marked as failed right away when the first test case fails.

@@ -180,7 +181,7 @@ def _handle_case(case: TestCase, devices: typing.Set[int]):
raise MFCException(f"Test {case}: {msg}")

if ARG("test_all"):
case.delete_output()
# case.delete_output()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want to put this back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It breaks the 1D bubble cases. Not sure why maybe its deleting the D folder I'm creating.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps associated with current bug in CI that if it has to 'retry' a simulation, it will try to create a directory that already exists

@okBrian
Copy link
Contributor Author

okBrian commented Jul 27, 2024

I am expecting the code to not work with --no-mpi so thats another thing to debug... I think everything with mpi should pass hopefully.

@okBrian okBrian marked this pull request as draft July 28, 2024 01:48
@sbryngelson
Copy link
Member

this won't work on Phoenix until @henryleberre merges #581 and you pull in the module changes for Phoenix (we moved to RHEL9 nodes).

@okBrian
Copy link
Contributor Author

okBrian commented Aug 22, 2024

Oh my bad I thought they were done based on the most recent commit.

@henryleberre
Copy link
Member

Well, they should be working but there's an issue with @sbryngelson's Python installation. I'm pretty sure it's because he deleted bits from his ~/.local folder. The CI is complaining that a "pip" executable cannot be found in ~/.local/bin/pip. @sbryngelson I put my .local in /storage/coda1/p-sbryngelson3/0/shared/local-for-spencer so you can copy what's missing.

@sbryngelson
Copy link
Member

Well, they should be working but there's an issue with @sbryngelson's Python installation. I'm pretty sure it's because he deleted bits from his ~/.local folder. The CI is complaining that a "pip" executable cannot be found in ~/.local/bin/pip. @sbryngelson I put my .local in /storage/coda1/p-sbryngelson3/0/shared/local-for-spencer so you can copy what's missing.

I actually never deleted anything in my .local directory, just suggested you try it as a debug measure.

@sbryngelson
Copy link
Member

sbryngelson commented Aug 22, 2024

You can see that this works by looking here: https://github.com/MFlowCode/MFC/actions/runs/10498547818/job/29083688602

@henryleberre also opened this empty PR to test things #586

@okBrian
Copy link
Contributor Author

okBrian commented Aug 22, 2024

Hey is GT Phoenix fully migrated or should I keep waiting?

@sbryngelson
Copy link
Member

It's not working yet, though the Docker should be?

@henryleberre
Copy link
Member

@okBrian It should be working now ™. You should just need to rebase on upstream.

@sbryngelson
Copy link
Member

@okBrian you can sync with master now, it should work

@sbryngelson
Copy link
Member

Phoenix is down so none of the Phoenix CI will pass for now... will reboot those when they come back online

@sbryngelson
Copy link
Member

@okBrian Phoenix CI is back online, but now you will need to rebase as @henryleberre made a change for Frontier CI as well as the packer golden files.

@henryleberre
Copy link
Member

Also, @okBrian you need to regenerate the golden files for your new cases. Alternatively, I can run a script for you that strips the <x> <y> <z> entries for you. It's easiest if you regenerate.

@sbryngelson
Copy link
Member

@okBrian phoenix runners are back up, though this branch needs to be synced with master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Add examples/ into CI
3 participants