Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort if registering a restart too late in init()? #1214

Closed
mnlevy1981 opened this issue Sep 23, 2020 · 2 comments · Fixed by #1441
Closed

Abort if registering a restart too late in init()? #1214

mnlevy1981 opened this issue Sep 23, 2020 · 2 comments · Fixed by #1441
Assignees

Comments

@mnlevy1981
Copy link
Contributor

I'm working on the MARBL driver based off of the NCAR fork, and I ran into something unexpected with register_restart_field(). Basically, we use an iterative solver to compute PH, with an initial guess of PH from the previous time step - to ensure bit-for-bit comparisons when restarting, we need these initial guesses in the restart file. I naively called register_restart_field() from a subroutine called by tracer_flow_control_init() and the fields were dutifully being written to the restart file but not being read in upon restarting. I moved the calls to a subroutine called from call_tracer_register() and everything worked as expected.

So it appears to me that the restart file is read in somewhere between the call to call_tracer_register() and the call to tracer_flow_control_init() -- if that's the case, it would be really nice to get an explicit error message when trying to register a new restart field after the restarts have already been processed. I didn't see anything in the log files to indicate I had done anything wrong, so it was dumb luck that I stumbled on the solution on my own.

@marshallward
Copy link
Collaborator

marshallward commented Sep 25, 2020

Hi Michael,

I discussed with others here and we agree that there should in principle be no reason to call register_restart_field after restore_state has been called. So it may be possible to introduce a "lock" on restart registration and raising an error.

Having said that, there are two minor and one major issue with doing this right now.

The first minor issue is that we do actually call register_restart_field after restore_state as part of the indirect method for writing the initial state. More on this later.

The second minor issue is that restore_state would not be called on an initialization, so we would also need to find a way to introduce this lock without actually calling restore_state.

I don't think either of these instances would be difficult to resolve.

But there is a larger issue, namely that we are preparing for a major overhaul of the FMS I/O, which includes a new handling of restart files (#1165). Any work that we here would likely cause a conflict with this PR. One of the sections heavily impacted by this PR is the initial state output.

There is also the possibility that this ought to be handled at the FMS level, which we can also look into.

Due to the ongoing FMS update, I'm going to recommend that we hold off on this and leave this issue open until the FMS PR has been merged. Does that work for you?

@mnlevy1981
Copy link
Contributor Author

Due to the ongoing FMS update, I'm going to recommend that we hold off on this and leave this issue open until the FMS PR has been merged. Does that work for you?

That sounds reasonable to me -- I don't think this is an urgent issue, but did want to open a ticket in case others run into similar problems with restart files seemingly not being read.

@Hallberg-NOAA Hallberg-NOAA self-assigned this Apr 26, 2021
Hallberg-NOAA added a commit to Hallberg-NOAA/MOM6 that referenced this issue Jul 8, 2021
  Added code to lock the restart registry once all registration should have
occurred or if the restart has been read, along with a new public interface,
restart_registry_lock, to allow this lock to be set or unset.  All calls to
register restart fields now check the state of this lock and issue a fatal error
if the registry is locked.  This PR addresses MOM6 issue mom-ocean#1214.

  In the process of adding this restart lock, the new error messages revealed
that some of the restart registration calls related to some types of open
boundary conditions were not happening early enough.  To avoid this, a new
interface, register_DOME_OBC, was added to the DOME_initialization module and is
being called from call_OBC_register, and a number of the OBC-related calls
during the initialization were collected in the same (appropriate) place.  Some
OBC error messages were also corrected.  All answers are bitwise identical, but
there are two new public interfaces and the order of some OBC-related entries in
the MOM_parameter_doc calls changed.
@Hallberg-NOAA Hallberg-NOAA linked a pull request Jul 9, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants