Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix to_json when converting Period column #32665

Closed

Conversation

colonesej
Copy link

@colonesej colonesej commented Mar 12, 2020

AttributeError when trying to access frequency string freqstr directly on Series of Period type.

Modified to get it through dtype of Series, which is known to be Period.

import pandas as pd
test = pd.Series(pd.period_range('1/1/2011', freq='B', periods=3))
test.to_json(None, orient='table')

>>> '{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"values","type":"datetime","freq":"B"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"values":{"day":3,"dayofyear":3,"daysinmonth":31,"freqstr":"B","is_leap_year":false,"month":1,"ordinal":10697,"qyear":2011,"start_time":"2011-01-03T00:00:00.000Z","week":1,"weekofyear":1}},{"index":1,"values":{"day":4,"dayofyear":4,"daysinmonth":31,"freqstr":"B","is_leap_year":false,"month":1,"ordinal":10698,"qyear":2011,"start_time":"2011-01-04T00:00:00.000Z","week":1,"weekofyear":1}},{"index":2,"values":{"day":5,"dayofyear":5,"daysinmonth":31,"freqstr":"B","is_leap_year":false,"month":1,"ordinal":10699,"qyear":2011,"start_time":"2011-01-05T00:00:00.000Z","week":1,"weekofyear":1}}]}'

Fix GH31917. AttributeError when trying to access frequency string directly on Series
@WillAyd
Copy link
Member

WillAyd commented Mar 12, 2020

Can you add a test for this? @jbrockmendel

@WillAyd WillAyd added IO JSON read_json, to_json, json_normalize Period Period data type labels Mar 12, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a test replicating the OP & a whatsnew note (bug fixes in 1.1 / IO)

@WillAyd
Copy link
Member

WillAyd commented Mar 16, 2020

Looks like a merge conflict in the whatsnew - can you fix and add a test case in pandas/tests/io/json/test_pandas.py?

@colonesej
Copy link
Author

OK, I'll do it, but first I need to figure it out how to make a test. Have to read more carefully the guide.

@WillAyd
Copy link
Member

WillAyd commented Mar 27, 2020

@colonesej you should just be able to place a test in pandas/tests/io/json/test_pandas.py for this (and can look at other examples there too)

Can you fix up merge conflict here as well?

@pep8speaks
Copy link

pep8speaks commented Apr 15, 2020

Hello @colonesej! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-18 16:09:56 UTC

@colonesej
Copy link
Author

I made this roundtrip test completely based on the one for a time series with a DatetimeIndex. Some points:

  • I could not find any option to make read_json treat the data and generate a PeriodIndex when orient equals to "column" or "index". Conversion does not carry frequency information. I'm not sure what exactly should be the behaviour so just converted manually to Period and inferred freq is OK.
  • When orient = "split", the test fails as read_json cannot convert back the dict created by to_json. It works with a DatetimeIndex, which conversion results in float formatted dates. With Period it converts to separated keys for different aspects of Period object. I could not find where these were determined nor what should be preferred.

To illustrate, using series.to_json(orient='split') when index is:

DatetimeIndex:

'{"name":null,"index":[946857600000,946944000000,947030400000,947116800000,947203200000,947462400000,947548800000,947635200000,947721600000,947808000000,948067200000,948153600000,948240000000,948326400000,948412800000,948672000000,948758400000,948844800000,948931200000,949017600000,949276800000,949363200000,949449600000,949536000000,949622400000,949881600000,949968000000,950054400000,950140800000,950227200000],"data":[-0.7273489893,0.43763153,0.0064352361,-1.0478282669,0.6605651751,0.2847295244,1.6006677017,-1.2731074841,0.3004426454,0.274394972,0.0402248331,1.5284606656,0.5741635466,-0.3126890855,1.4765392606,0.3663030854,0.3132193099,-0.0730319057,-0.3956868944,1.4626210011,0.2129331654,-0.6671879793,-1.6477379639,0.7626651704,-1.7852083837,0.0707591559,1.3089143648,-1.0282706187,-0.7853299219,-0.2119391625]}'

PeriodIndex:

'{"name":null,"index":[{"day":3,"dayofyear":3,"daysinmonth":31,"freqstr":"B","is_leap_year":true,"month":1,"ordinal":7827,"qyear":2000,"start_time":946857600000,"week":1,"weekofyear":1},

@@ -473,6 +473,15 @@ def _create_series(index):
}


@pytest.fixture
def period_series():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd did we settle on a naming scheme for these?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable; we already have a datetime_series so this follows that pattern

def test_series_roundtrip_periodseries(self, orient, period_series):
# GH32665: Fix to_json when converting Period column/series
data = period_series.to_json(orient=orient)
result = pd.read_json(data, typ="series", orient=orient)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue affects reading to a DataFrame as well right? If so can you test that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I can do that.

@colonesej
Copy link
Author

colonesej commented Apr 17, 2020

The point with orient="split" seems to be connected to how to_json converts columns/series with values of PeriodDtype. It constructs that dictionary that cannot be converted back by read_json. Not directly.

I don't know what should be the behaviour here:

  • change read_json to specifically treat when orient="split" expecting a dict of attributes, or
  • change to_json to convert Period values to float (datetime behaviour) or use __repr__. The former
    is what other options of to_json seems to use.

I made this other test with a frame column with PeriodDtype values and eventually test this by the time a solution is proposed.

@WillAyd
Copy link
Member

WillAyd commented Jun 5, 2020

@colonesej can you merge master and try to get green?

@colonesej
Copy link
Author

Yes, but you mean, make the roundtrip tests I've made pass?

I could have some help about how to_json works with orient = split.

Or could make a different test, testing based on some expected output template. Will not solve the roundtrip issue but will make it green. What is preferable?

@WillAyd
Copy link
Member

WillAyd commented Jun 7, 2020

If there’s one orient that doesn’t work you can pytest.skip it for now - check some of the other tests in the module for examples

@WillAyd
Copy link
Member

WillAyd commented Jul 29, 2020

@colonesej is this PR still active?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_json of Series with period dtype results in AttributeError
5 participants