Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add numeric_only=True to relevant pandas operations #259

Merged
merged 6 commits into from
Jul 15, 2024

Conversation

blychs
Copy link
Collaborator

@blychs blychs commented Jun 21, 2024

This pull request adds the keyword numeric_only=True to relevant pandas objects,
i.e. pandas.core.groupby.DataFrameGroupBy and pandas.core.window.rolling.Rolling .
The relevant operations are mean(), min(), max(), median() and quantile().
This makes it compatible with pandas 2.2.2, and should be a nice start for future proofing MM, without breaking backwards compatibility (i., e., pandas 1, for monet).

If we want to keep it future proof, we should also update the references to freq='H' to freq='h', but I am not sure about compatibility with older versions of pandas for this, so I have kept it the way it was for now.

This has been tested for the surface plots with wrf-chem and AirNow.

Let me know if additional tests are required.

	Objects: DataFrameGroupBy -> mean(),
			             min(),
                                     max(),
                                     median(),
                                     quantile(),
                 Roller -> same
	If this is correct, it should replicate
	old pandas behaviour (i. e., v0.XX)
Copy link
Collaborator

@zmoon zmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blychs I think you could go ahead with the change to using 'h' for hourly freq instead of 'H'. I tested using freq='h' in date range with pandas v1.0.5 (June 2020) and it worked fine.

Some places I identified:

  • melodies_monet/_cli.py date ranges
  • date range in docs/examples/idealized.ipynb
  • ts_avg_window in docs/examples/ and examples/yaml YAML files
  • resamples in driver and melodies_monet/util/tools.py
  • comments "pandas resample rule (e.g., 'H', 'D')" and "Pandas resampling rule (e.g., 'H', 'D')", which appear multiple places

The reformat* notebooks we are probably going to remove in the future so I wouldn't bother modifying them.

blychs and others added 3 commits June 24, 2024 16:39
	"H" is deprecated
	"h" also works with older versions
	This conforms also with numpy
   TODO:
	Modify examples, documentation and
	batch scripts
	"H" is deptracted
	"h" also works with older versions
	This also conforms with numpy
	TODO:
	     Modify ipynb files to conform
	     with this.
             Just running them should do the
             trick.
@blychs
Copy link
Collaborator Author

blychs commented Jun 25, 2024

@zmoon, I just prepared a new version, but now the newer scorecard plots (which I did not see there before) are failing in the updated version. I will update this PR once I've corrected the issue. It should not be merged until then.
Best
Pablo

    Older versions did this automatically, but
    with newer versions, it returns a Datarray
    with 1 value.
    This affects plots/surfplot.py scorecard
    plot.
@blychs
Copy link
Collaborator Author

blychs commented Jun 26, 2024

The corrections are done. airnow_wrfchem.ipynb has the same result as the current code. The mistake was probably due to newer package versions, and is explained in the commit message. Let me know if any other issue appears.

Having said that, all the notebooks in docs/examples run without issues, except for camchem.ipynb. When I am running that, there is a conflict in the renaming of lat and lon. If I am not mistaken, though, the problem seems to be in the monet_accessor.py, since the dataset seems to have latitude and longitude created before renaming. I get the same error if running the original melodies_monet develop branch, without my modifications.

@blychs
Copy link
Collaborator Author

blychs commented Jul 3, 2024

So, @rschwant or @zmoon , the issue seems to have nothing to do with this PR. I added it as Issue (#261) , together with a complete description and a few ideas of how to solve it. These can be at different levels, but I guess that that discussion should be continued there.

I have not found any other problems with this PR, and I think it is ready to undergo the tests that you deem necessary before (hopefully, if nothing else goes wrong) merging.

Edit: corrected typos

@rschwant rschwant self-requested a review July 15, 2024 15:41
Copy link
Collaborator

@rschwant rschwant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had 2 quick questions and then I can approve this later today. We can discuss quickly in the meeting today.

melodies_monet/plots/surfplots.py Show resolved Hide resolved
melodies_monet/plots/surfplots.py Show resolved Hide resolved
Copy link
Collaborator

@rschwant rschwant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zach confirmed my questions. Looks great thanks for updating this so we can move to Pandas 2!

@rschwant
Copy link
Collaborator

@zmoon, Do you know why these tests are failing though? Is it using ts_avg_window: '3h' in the idealized example? What error have you all been seeing on this?

@zmoon
Copy link
Collaborator

zmoon commented Jul 15, 2024

@rschwant the issue is an incompat between pandas v1 and matplotlib 3.9, which was released on conda-forge a week ago. pandas v2 deals with the matplotlib change but I guess it wasn't backported to v1.

To fix this for now, we could add matplotlib<3.9 to the environment yml file.

@rschwant
Copy link
Collaborator

Yes, that sounds good we can recommend an earlier version of matplotlib now too.

@blychs
Copy link
Collaborator Author

blychs commented Jul 15, 2024

Hi @rschwant , we saw the same with @dwfncar a couple of days ago.
See issue #262 for some comments on that
Pablo

@zmoon
Copy link
Collaborator

zmoon commented Jul 15, 2024

Thanks @blychs , reading your issue more closely would've saved me a few minutes. I made the change here in 352a088.

Copy link
Collaborator

@rschwant rschwant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great thanks for making that update Zach!

@rschwant rschwant merged commit f2465d8 into NOAA-CSL:develop Jul 15, 2024
4 checks passed
@rschwant
Copy link
Collaborator

rschwant commented Jul 16, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants