Refactor supported input path handling. #653
Conversation
I've left a bunch of (hopefully) clarifying comments for why I made the choices I made in the refactor, often citing the best-practice rule I applied. Most commonly, the changes here relate to style, reducing duplication, and reducing coupling to other system components.
Several items are meant only for staging the next phase of this work, but should still work as expected here.
Could any of the reviewers please review this PR?
This PR conflicts with two older PRs: #580 and #650. PR #650 is under current review, and PR #580 has been reviewed, but is waiting on comments/changes. @gsketefian, any chance you might have time to get these two PRs squared away next week?
@JeffBeck-NOAA Yes. I just replied to @christinaholtNOAA's comments on PR #650. Next, I will have to compare this PR and #580 to see how they can be reconciled.
PR #580 hasn't been touched for at least 3 months. That one should be closed and considered obsolete given its abandoned state. I understand that PR #650 may conflict with this implementation, but if it gets merged first, it will be up to me to reconcile those differences. A review of this PR could potentially negate the need for the other two. It also severely holds up the work that needs to come next to remove machine dependency from scripts -- work that is needed for us to be able to run the workflow on generic cloud platforms for automated testing.
No, it shouldn't be closed, nor is it obsolete or abandoned. @gsketefian has let other regional_workflow admins know that he has been unable to finalize the PR due to other more pressing priorities, but that he intends to review comments and complete changes soon.
PRs are reviewed in the order they are opened. Older PRs take precedence when there is a conflict with a newer PR and code contributors must merge in updates into their forks/PRs once these older PRs have been reviewed and merged. I have been in contact with @gsketefian about PR #650, which has been reviewed and had changes made, so it is close to being merged. Code contributors with conflicting capabilities need to work together to resolve opposing solutions in parallel PRs.
My understanding was that PR #650 was opened to address this specific issue, generalizing variable templating for use across all systems, including cloud platforms. Again, if this PR introduces an alternative but conflicting method, that needs to be resolved directly with @gsketefian first. Then, one or two PRs can be reviewed which do not contain opposing solutions.
@JeffBeck-NOAA - The processes you described for reviewing of PRs, and the way they are being managed, are unacceptable and untenable. #580 is absolutely abandoned and obsolete by any sane measure of software development practices. We cannot continue development in this fashion; it isn't sustainable, it is putting very real barriers in the way of progress, and it is causing material harm to the overall effort.

There is absolutely zero reason why PRs should always be considered only in the order in which they are received, particularly when overcome by events and new development that supersedes earlier work -- and even more so when the PR author cannot address concerns for 3+ months. We can't afford to continue development in this manner, as it is costing us months of lost productivity. Old work should be shelved and focus needs to shift to current efforts. If the old work really is important and has value, it can be resubmitted in the context of the current state of the repo once the author is able to commit to seeing their work all the way through the process.
I've reached out to ask about the status of PR #580, have heard no status updates, and have not seen any movement on the code. I agree that we need to work together, but we cannot be held up for months on developments that are needed now. A different solution was proposed, and the work that was done before has to either be merged so that I can work with that code, or it has to be abandoned in favor of the newer solution.

I was not arguing that #650 should or should not get merged first. I need there to be a review of my code. The PR that is ready first should get merged first, and the PR that gets merged second is the one that has to deal with conflicts. It's not exactly appropriate that nobody looks at my PR until another one that could potentially be in conflict is completely merged. It puts me, by default, in a position to wait for months, given the typical time it takes to merge. PR #650 was opened without any coordination after I mentioned in PR #617 that templates are needed and that I was working on templates.
@christopherwharrop-noaa, I should clarify my previous comments, as I wasn't completely clear. Generally, older PRs should be cleared from the backlog first. However, newer PRs can and should be reviewed immediately, so long as they don't conflict with older PRs, and that is the problem with this set of PRs. I stated that the author of PR #580 has told the other code managers he will be getting back to his PR very soon, so it is not abandoned or obsolete. He has specifically asked for it not to be closed and wishes for it to be merged. I agree that we can't wait forever, though, and we have expressed this to the code contributor.

We also can't hold up new development with older PRs, and I agree that we shouldn't avoid reviewing newer PRs while older PRs have yet to be merged. That is not what I was suggesting. If a new PR from a new developer is opened that conflicts directly with an older PR, that developer should work with the other to resolve the conflict. That is what we're dealing with here, and I'm trying to make sure that both developers are able to contribute to the code base effectively.

Something should be said for using forks of authoritative repositories for urgent, developmental needs that have immediate deadlines. That is what we have done with the SRW App related to UFS-R2O and WPO projects, and it's worked quite well. Once those deliverables have been met, new features and functionality can be contributed back to the authoritative repositories at regular intervals, which is what we've done. But again, regardless of repo, I agree that we need to increase the speed with which we review PRs and get things merged.

@christinaholtNOAA, between PRs #580, #617, and #650, it sounds like you and @gsketefian need to meet to discuss how to proceed. It may be best to close one or more PRs and open a new one that resolves the differences between all three.
@JeffBeck-NOAA In regards to your comments on forks, this is not appropriate for the work I am proposing here. The eventual goal is to remove machine dependence from the authoritative repository so that we may be able to run automated tests of the develop branch on the cloud. In an effort to NOT open a gargantuan PR that would take months to review and modify, I am opening a series of PRs that take care of small chunks along the way. This is the first of several, all of which belong in develop.

We've had many discussions on the use of forks to hold code that is needed immediately by a group of developers because of our need for quick turnaround, and potentially to hold prototype work that would break develop. That is not what this work is. As a code manager, suggesting that developers do not contribute code immediately to develop is inappropriate. Putting new developments on the develop branch in small, reviewable chunks is the gold standard for how all of this works, and should be upheld by repository managers.

Holding new developments in forks for extended periods does not work well. Code diverges, and a huge amount of technical debt must eventually be repaid as large changes are made to develop that touch hundreds of files. It is of vital importance that code that meets the needs of the develop branch should be included as quickly as possible.
@JeffBeck-NOAA - What you are saying doesn't comport with reality. #580 hasn't been touched in 4 months, and three months have passed since the author last apologized for not responding to requests for changes. In practical terms, we have already waited "forever", and yes, it is abandoned. The author of that PR has already failed to execute their responsibilities as a code contributor and failed to commit to seeing the PR through to its conclusion in a timely manner. At this point, it is unreasonable to trust that the work will be completed any time soon. New work carried out in good faith should not be held hostage by ancient work that is incomplete and totally inactive, regardless of what the author of the old work wants.

The process you suggested of resolving conflicts between two outstanding PRs is also fundamentally misguided and in direct conflict with standard development processes. That is not how an efficient or effective development process works. Professional software developers do not do that. It is absolutely NOT the responsibility of a PR author to make sure their PR is not in conflict with other outstanding PRs. It is the responsibility of the PR author to ensure their work doesn't conflict with the target branch to which their PR is proposed to be merged. If someone else's PR gets there first, then THAT is when the conflicts are dealt with, by the author of the remaining PR. The process you are suggesting is unusable and will absolutely not result in effective contributions. A person who hasn't touched their PR in 4 months is not contributing effectively.

I also take issue with your comments about authoritative forks, and I do not agree that they have worked out well, having been involved with cloud work for RRFS for the past year. The lack of synchronization between them, and the way they have been mismanaged by holding development for too long before merges, has caused significant harm to projects seeking to use features that exist in multiple places. The forks being simultaneously ahead of and behind each other has been the bane of other projects for over a year now. The model you are describing is actually costing productivity rather than conserving it. I don't understand the death grip on a 4-month-old PR that the author is clearly not committed to. It's time to move on.
In that case, I agree, small PRs into the authoritative develop is the best way to approach those contributions.
I am absolutely NOT suggesting that developers do not contribute code immediately to develop. I am suggesting that they do development in their forks for an immediate deliverable, then as soon as possible, they open a PR for a new feature in the authoritative PR where code review isn't tied to an urgent deadline. I now understand that this PR is different, since it involves work specific to the authoritative develop branch.
I agree. Code shouldn't be held in forks for a long period of time, but for urgent deliverables (that are not specifically related to the authoritative develop branch), forks work well, SO LONG AS developers merge what they've worked on into the authoritative repos at regular intervals to avoid code divergence. That is an absolute necessity.
@christopherwharrop-noaa, I am attempting to get two code contributors to work together to resolve conflicts between their PRs. If I had not had recent, active discussion with the author of the older PR, I would have agreed with you to close the older PR and move forward with the newer one. I completely agree that the older PR has been in queue for way too long and the situation needs to be resolved immediately.
I agree that a new paradigm needs to be developed for how to manage authoritative forks to ensure that features are regularly committed back to the authoritative repos and that updates in the authoritative repos are merged into the forks. We need to continue the discussion related to institutional forks to make sure we can resolve these issues for projects going forward.
@christinaholtNOAA, thank you for scheduling a meeting for Monday to discuss the PR situation. After talking with @gsketefian, I believe we're ready to move forward with closing #580 and combining this one, #650, and anything from #580 that is worth keeping. @gsketefian, please feel free to clarify here. We don't want to hold up new development, particularly when it is crucial for project deadlines, so I'm hoping we can resolve everything soon.
Revealing myself as the mysterious author of PR #580, I will now make my point of view known:
To avoid repeating the heated discussion above, we (obviously) need an official (i.e. written up in a document) repo management strategy -- a set of rules to point to in case disagreements arise. I think this is a deliverable for the DTC project, but I don't believe such a document exists yet. This would be something we can discuss in the meeting on Monday.

As part of such a strategy, we need an automated PR testing capability that runs a user-specified subset of the WE2E tests (or, once in a while, all of them) on a user-specified set of platforms and creates a report that includes the hashes of the subrepos being tested, the list of tests that were run, the platforms on which they were run, and of course the success/failure status of each test/platform combo. Without this, we are finding it difficult to update the subrepo hashes (in Externals.cfg). Maybe the DDRF work that @christinaholtNOAA et al are doing does a lot of this.

Including others in the DTC team so they are aware of this discussion: @jwolff-ncar @JeffBeck-NOAA @mkavulich @willmayfield
ush/get_extrn_mdl_file_dir_info.sh
Outdated
  ;;
"NAM")
  fns=( "" )
  fns_in_arcv=( "nam.t${hh}z.bgrdsfi${hh}.tm00" )
I haven't tested out NAM in a long time, so I don't know if the original file is even correct, but to keep the same behavior, should tm00 be changed to tm${hh}?
My understanding of the tm00 stamp is that it's meant to indicate "t-minus 00 hours" from the analysis time for DA purposes. I have checked a sample file from NAM on HPSS, and the bgrdsf files available there follow a convention consistent with that understanding.
HPSS File: /NCEPPROD/hpssprod/runhistory/rh2021/202112/20211218/com_nam_prod_nam.2021121812.bgrid.tar
NAM files inside match this naming convention:
nam.t12z.bgrdsf00.tm0[0-6]
nam.t12z.bgrdsf${fhr}.tm00
Note that the i that appears in both the original and new template is an error.
I also found an additional error in both the old template and the new one: the forecast hour in the file name was set to ${hh}, so it would have pulled the wrong file for each forecast hour not equal to the cycle hour.
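Putting those observations together, a minimal sketch of the corrected template, assuming the HPSS convention cited above (bgrdsf with no trailing i, the forecast hour in the middle, and a fixed tm00 suffix; the fcst_hh variable name is illustrative, not taken from the script):

```shell
# Sketch only: corrected NAM archive-file name, per the HPSS listing
# nam.t12z.bgrdsf${fhr}.tm00 quoted above. Values are illustrative.
hh=12        # cycle hour
fcst_hh=06   # forecast hour (hypothetical variable name)
fn="nam.t${hh}z.bgrdsf${fcst_hh}.tm00"
echo "${fn}"   # nam.t12z.bgrdsf06.tm00
```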
I think users would generally want tm00 for initialization purposes, but I could see where it might be useful to allow for varying tm values, although that would require coding up more logic.
The purpose of this PR is to refactor a section of code. I'd prefer not to add functionality, especially for a feature whose need is speculative in nature.
Yeah, not for this PR, just thinking ahead for future contributions to the App.
A few minor comments and requested changes
  ;& # Fall through. RAP and HRRR follow same naming rules.
"HRRR")
  fns_in_arcv=( "${yy}${ddd}${hh}${mn}${fcst_hh}${fcst_mn}" )
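For reference, a quick sketch of how the template above expands. The values are made up, and the interpretation of yy as a 2-digit year and ddd as the Julian day of year is an assumption based on the variable names:

```shell
# Illustrative expansion of the RAP/HRRR archive-file name template.
yy=21        # 2-digit year (assumed meaning)
ddd=352      # Julian day of year (assumed meaning)
hh=12        # cycle hour
mn=00        # cycle minute
fcst_hh=06   # forecast hour
fcst_mn=00   # forecast minute
fn="${yy}${ddd}${hh}${mn}${fcst_hh}${fcst_mn}"
echo "${fn}"   # 2135212000600
```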
Does this mean that the App cannot run with current RAP/HRRR data? If that's the case, we should probably open an issue.
@mkavulich Thanks for the review. I'm fairly confident that the current App can't pull data from known data streams for more recent cases using RAP/HRRR ICS/LBCS, because it was using the RAP/HRRR experimental runs that were running at GSL before the last operational implementation. Once those systems were upgraded, the experimental versions were turned off; now, the operational streams are the only ones that exist. I did not make that change in this PR, but I have updated the data streams to be consistent with the available data streams in the machine-specific configuration PR I have waiting in the wings.
Yes, the HRRRX and RAPX data were being pulled in the App since they were, at the time, the latest versions of both, prior to the final implementations. There's an issue open to add paths to the operational versions, which we have wanted to include as an option to download from HPSS. Note that the older experimental RAP and HRRR data are still available on HPSS, so the pull scripts were still working. I think we want to go to pulling only operational versions from HPSS from now on, though.
@christinaholtNOAA This PR looks good to me. I suggest running a few more tests that will exercise some of the modified portions of the code. Since we have tests for getting external files from HPSS and that code was changed quite a bit, I suggest running the following WE2E tests:
I think it is sufficient if the
@mkavulich Are you happy with the changes I made, and the responses to your questions? I can rerun a larger test suite today, including those that include pulling from HPSS.
@christinaholtNOAA Thanks for testing. I will approve. We can merge when @mkavulich gives the ok.
When passed an empty string, the date command was incrementing the date by 1 hour.
The previous logic cut the script off too soon and didn't let it try to check HPSS.
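A minimal, hypothetical sketch of guarding against the first class of bug (the function name, variable names, and calling convention here are illustrative, not taken from the PR): validate the cycle string before handing it to the date command, rather than letting an empty string fall through to a surprising default date.

```shell
#!/bin/sh
# Hypothetical guard (illustrative only): refuse to process an empty
# cycle string instead of passing "" on to date arithmetic.
shift_cdate() {
  cdate="$1"   # expected form: yyyymmddhh, e.g. 2021121812
  if [ -z "${cdate}" ]; then
    echo "ERROR: empty cdate passed to shift_cdate" >&2
    return 1
  fi
  # ...date arithmetic on ${cdate} would go here...
  echo "${cdate}"
}

shift_cdate "" || echo "guard triggered"
```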
I ran a big test suite again. The following tests passed:
I also included these to test pulling from HPSS:
I am ready for a merge when @mkavulich clears it.
DESCRIPTION OF CHANGES:
This PR is one of two in a series that addresses refactoring platform-dependence. The topic of this PR is to refactor the logic for supporting platforms with specific real-time data feeds. Here is a list of features and modifications that were made:
No changes were made to the logic for input managed by USE_USER_STAGED_EXTRN_FILES or COMINgfs for NCO mode, although it could make sense to re-assess the NCO-mode handling at a later date.
I plan to go through and "review" the code myself to lead reviewers through this one, since it is a big change. It may be helpful to view it using GitHub's ignore-whitespace feature.
TESTS CONDUCTED:
A single test case using the WE2E test on Hera:
A test case for the same forecast configuration using known paths on Hera to exercise the new code.
I checked that arrays were consistent, that the script exits in a sane manner when files are not available on disk or HPSS, and that I haven't broken anything with the way files are handled through the "user specified" mechanism necessary for the test framework.
ISSUE (optional):
This work is an incremental change in support of Issue #618
CONTRIBUTORS (optional):
@christopherwharrop-noaa @venitahagerty @robgonzalezpita