Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency between plots generated in METviewer via R vs Python for MODE #395

Open
23 tasks
bikegeek opened this issue Jan 24, 2024 · 10 comments
Open
23 tasks
Assignees
Labels
alert: NEED ACCOUNT KEY Need to assign an account key to this issue component: plot mode data priority: high High Priority requestor: Community General Community requestor: NCAR National Center for Atmospheric Research requestor: NOAA/other NOAA Laboratory, not otherwise specified type: bug Fix something that is not working

Comments

@bikegeek
Copy link
Collaborator

bikegeek commented Jan 24, 2024

Replace italics below with details for this issue.

Describe the Problem

Discovered by Michelle while generating MODE plots in METviewer, and verified by John HG see details below.

Michelle (and METplus-Analysis team),

I'm confident that you've found a bug somewhere in the METplus Analysis suite.

It can be demonstrated on this instance of METviewer:
http://dtcenter.ucar.edu/met/metviewer/servlet

The attached files show running the exact same XML configuration, one using the Python logic and the other using R. They both produce essentially the same "R data" file. Although for some reason the "fcst_valid" column contains the current timestamp when the plot was run... and since the plots were generated at different times, those values don't match.

However there's a huge difference in the resulting png plots. The R version sums the data up across multiple initialization timestamps while the Python version plots only one of the values.

For example, for the red line "nostoch - fcst" at the 36 hour lead time, I see 12 lines of data (see below).

The R plot appears to plot their sum of 161 while the Python plot appears to plot a value of 14, which happens to be the 2nd to last value in the list below.

So there appears to be a big discrepancy in the legacy R plotting logic and the new Python plotting logic for counts of MODE objects. There's also a question as to why the current timestamp is written to the fcst_valid column... that's not necessarily wrong... just want to double-check that it's the expected behavior.

grep nostoch plot_icperts_mode_apcp_obj_test-fixed-init-Python.data | grep "360000" | grep FSA

nostoch 2022-04-30 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 15

nostoch 2022-05-02 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 16

nostoch 2022-05-03 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 16

nostoch 2022-05-04 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 9

nostoch 2022-05-05 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 14

nostoch 2022-05-06 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 15

nostoch 2022-05-07 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 15

nostoch 2022-05-08 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 13

nostoch 2022-05-09 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 6

nostoch 2022-05-10 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 7

nostoch 2022-05-11 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 14

nostoch 2022-05-12 00:00:00 2024-01-24 17:16:43 360000 APCP_01_ENS_NEP_ge2.54_NBRHD729 CNTSUM_FSA 21

plot_icperts_mode_apcp_obj_test-fixed-init-Python

plot_icperts_mode_apcp_obj_test-fixed-init-R.data.txt
plot_icperts_mode_apcp_obj_test-fixed-init-R

plot_icperts_mode_apcp_obj_test-fixed-init-Python.data.txt

plot_icperts_mode_apcp_obj_test-fixed-init-Python.xml.txt

plot_icperts_mode_apcp_obj_test-fixed-init-R.xml.txt

Expected Behavior

MODE Plots generated in METviewer using BOTH legacy R script and Python should have identical results (consistent times plotted, etc.)

Environment

Describe your runtime environment:
1. Machine: (e.g. HPC name, Linux Workstation, Mac Laptop)
2. OS: (e.g. RedHat Linux, MacOS)
3. Software version number(s)

To Reproduce

Describe the steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
Post relevant sample data following these instructions:
https://dtcenter.org/community-code/model-evaluation-tools-met/met-help-desk#ftp

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Define the source of funding and account keys here or state NONE.

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Organization level Project for support of the current coordinated release
  • Select Repository level Project for development toward the next official release or add alert: NEED CYCLE ASSIGNMENT label
  • Select Milestone as the next bugfix version

Define Related Issue(s)

Consider the impact to the other METplus components.

Bugfix Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of main_<Version>.
    Branch name: bugfix_<Issue Number>_main_<Version>_<Description>
  • Fix the bug and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Add any new Python packages to the METplus Components Python Requirements table.
  • Push local changes to GitHub.
  • Submit a pull request to merge into main_<Version>.
    Pull request: bugfix <Issue Number> main_<Version> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Development issues
    Select: Organization level software support Project for the current coordinated release
    Select: Milestone as the next bugfix version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Complete the steps above to fix the bug on the develop branch.
    Branch name: bugfix_<Issue Number>_develop_<Description>
    Pull request: bugfix <Issue Number> develop <Description>
    Select: Reviewer(s) and Development issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Close this issue.
@bikegeek bikegeek added type: bug Fix something that is not working alert: NEED ACCOUNT KEY Need to assign an account key to this issue alert: NEED MORE DEFINITION Not yet actionable, additional definition required alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle priority: high High Priority requestor: NCAR National Center for Atmospheric Research component: plot mode data requestor: Community General Community requestor: NOAA/other NOAA Laboratory, not otherwise specified required: FOR OFFICIAL RELEASE Required to be completed in the official release for the assigned milestone required: FOR DEVELOPMENT RELEASE Required to be completed in the development release for the assigned project labels Jan 24, 2024
@bikegeek
Copy link
Collaborator Author

bikegeek commented Jan 24, 2024

While investigating this issue, determine if this is observed with other linetypes. Include other linetypes besides MODE when testing.

7/22/2024. This appears to be an issue with the agg_stat_bootstrap calculations for MODE.

@TatianaBurek
Copy link
Collaborator

More from Tina:

I am graphing some MODE output in METviewer and coming across something that I don't understand, specifically with the CNTSUM attribute stat.  When I add a fixed value of fcst_init, and select all times, I get much smaller CNTSUM values than when the fcst_init is not included

I expected these to look identical since they should both use all available fcst_init times.  So, I am trying to understand why they are different.  Is CNTSUM computed differently when there are fcst_init fixed values set?  Or is there some other reason that may cause this difference?

@CPKalb
Copy link
Contributor

CPKalb commented Feb 27, 2024

Is there a time frame for when this issue may be fixed?

@bikegeek
Copy link
Collaborator Author

bikegeek commented Feb 27, 2024 via email

@CPKalb
Copy link
Contributor

CPKalb commented Feb 27, 2024

I am not certain. Let me check and I'll get back to you.

@bikegeek
Copy link
Collaborator Author

bikegeek commented Feb 27, 2024 via email

@CPKalb
Copy link
Contributor

CPKalb commented Mar 5, 2024

This is intermediate in urgency so I don't think you need to squeeze something in. We have ways to get around it for now.

@jprestop jprestop removed alert: NEED MORE DEFINITION Not yet actionable, additional definition required alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle required: FOR DEVELOPMENT RELEASE Required to be completed in the development release for the assigned project labels May 15, 2024
@bikegeek
Copy link
Collaborator Author

bikegeek commented Jul 20, 2024

I don't think the issue is in METplotpy, it only plots the points provided by METviewer. I suspect there is something going on with how METviewer is performing aggregation. If that is the case, then METcalcpy is probably the issue.

Running the Python version of the xml file in METviewer, I see in the log file that agg_stat_eqz.py is invoked. Running this from the command line and comparing the output with the Rscript version might provide some insight.

@bikegeek
Copy link
Collaborator Author

bikegeek commented Jul 23, 2024

making the fix in METcalcpy issue #360 to the equations for OBJCSI and OBJACSI appears to have fixed this issue. The current run date/time is saved under the fcst_valid column. The CNTSUM column in the output data from the Python version has values that are consistent with the values in the plot_icperts_mode_apcp_obj_test-fixed-init-R.data.txt (the data file provided above in the "Describe the problem" section.

New plot (generated via METviewer with python code) which is consistent with the plot generated via R in the "Describe the problem" :

plot_icperts_mode_apcp_obj_test-fixed-init-Python

Data from running METviewer on 'dakota' with the fix to METcalcpy issue #360:
plot_icperts_mode_apcp_obj_test-fixed-init-Python_after_feature360_fix.data.txt

@bikegeek bikegeek transferred this issue from dtcenter/METplotpy Jul 24, 2024
@bikegeek bikegeek added this to the METcalcpy-3.0.0 milestone Jul 24, 2024
@bikegeek bikegeek linked a pull request Jul 26, 2024 that will close this issue
15 tasks
@bikegeek bikegeek removed a link to a pull request Jul 26, 2024
15 tasks
@bikegeek bikegeek removed the required: FOR OFFICIAL RELEASE Required to be completed in the official release for the assigned milestone label Aug 14, 2024
@bikegeek
Copy link
Collaborator Author

Found more issues with other MODE plotting using this configuration:
plot_20240814_191630.xml.txt

The plot is generated with R but no plot generated using Python.

Additional configs from Michelle:
NOTE changes to the fixed value of fcst_init should result in differences in the plot but there are no changes

plot_icperts_mode_apcp_obj_test-fixed-init-R.xml.txt
plot_icperts_mode_apcp_obj_test-fixed-init-Python.xml.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
alert: NEED ACCOUNT KEY Need to assign an account key to this issue component: plot mode data priority: high High Priority requestor: Community General Community requestor: NCAR National Center for Atmospheric Research requestor: NOAA/other NOAA Laboratory, not otherwise specified type: bug Fix something that is not working
Projects
Status: 🟢 Ready
Development

No branches or pull requests

5 participants