Can't use named aggregation with resample. #30092

Closed
alexjacobson95 opened this issue Dec 5, 2019 · 12 comments

@alexjacobson95

alexjacobson95 commented Dec 5, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
from functools import partial

df = pd.DataFrame(
    np.random.randn(1000, 3), 
    index=pd.date_range('1/1/2012', freq='S', periods=1000), 
    columns=['A', 'B', 'C']
)
dfg = df.resample('3T').agg(
    {'A': [
        partial(np.quantile, q=.9999), 
        partial(np.quantile, q=.90),
    ]}
)

Problem description

Resample, unlike groupby, does not support named aggregation. As a result, applying the same function twice to one column (here, two partials of np.quantile) raises:

pandas.core.base.SpecificationError: Function names must be unique, found multiple named quantile.

It would be very useful to make the resample agg interface similar to the groupby agg interface. An example of the named aggregation is here: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#named-aggregation
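
For reference, here is a minimal sketch of the groupby named-aggregation syntax described in the linked guide (available since pandas 0.25). The toy frame and the output column names `a_min` and `a_max` are illustrative assumptions, not taken from the report.

```python
import pandas as pd

# Toy data; the key and column names are made up for illustration.
df = pd.DataFrame({"key": ["x", "x", "y", "y"], "A": [1.0, 2.0, 3.0, 5.0]})

# Named aggregation: each keyword is an output column name, and each value
# is an (input column, aggregation) pair.
out = df.groupby("key").agg(
    a_min=("A", "min"),
    a_max=("A", "max"),
)
print(out)
```

The request in this issue is for `resample(...).agg(...)` to accept the same keyword form, so that each output column gets an explicit, unique name.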

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 18.1
setuptools : 40.6.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

@TomAugspurger
Contributor

xref #28380

@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Dec 6, 2019
@charlesdong1991
Member

take

@MarcoGorelli
Member

Tried reproducing this on master - didn't get an error, but now the two aggregations are the same:

import pandas as pd
import numpy as np
from functools import partial

df = pd.DataFrame(
    np.random.randn(1000, 3), 
    index=pd.date_range('1/1/2012', freq='S', periods=1000), 
    columns=['A', 'B', 'C']
)
dfg = df.resample('3T').agg(
    {'A': [
        partial(np.quantile, q=.9999), 
        partial(np.quantile, q=.1111),
    ]}
)

returns

                            A          
                     quantile  quantile
2012-01-01 00:00:00 -1.035997 -1.035997
2012-01-01 00:03:00 -1.326052 -1.326052
2012-01-01 00:06:00 -1.267661 -1.267661
2012-01-01 00:09:00 -1.321961 -1.321961
2012-01-01 00:12:00 -1.103027 -1.103027
2012-01-01 00:15:00 -0.987614 -0.987614
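
The duplicated `quantile` header above is consistent with how `functools.partial` objects are named. The sketch below uses only the standard library and NumPy; `label_for` is a hypothetical helper that imitates the fallback pandas appears to use, not the actual pandas code.

```python
from functools import partial

import numpy as np

q_hi = partial(np.quantile, q=0.9999)
q_lo = partial(np.quantile, q=0.1111)


def label_for(func):
    """Hypothetical helper: use __name__ if present, else the wrapped function's name."""
    if hasattr(func, "__name__"):
        return func.__name__
    if isinstance(func, partial):
        return label_for(func.func)
    return repr(func)


# Both partials wrap np.quantile, so they end up with the same label.
labels = [label_for(q_hi), label_for(q_lo)]
print(labels)
```

Since both callables resolve to the same label, anything keyed by that label cannot tell them apart, which matches the two identical columns shown above.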

@charlesdong1991
Member

charlesdong1991 commented Jan 9, 2020

Thanks @MarcoGorelli. I think you need to give each function its own name: the two partials can't be distinguished at the moment, so the last one is used for both. The code below seems to work on master:

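# np.quantile (q in [0, 1]) is used here to match the original example.
quant9 = partial(np.quantile, q=.9999)
quant1 = partial(np.quantile, q=.1111)
# partial objects have no __name__ by default, so set one explicitly.
quant9.__name__ = "quant9"
quant1.__name__ = "quant1"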

dfg = df.resample('3T').agg(
    {'A': [quant9, quant1,]}
)

Assigning a different name to each partial should work.
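
The renaming workaround can be wrapped in a small helper so it is self-contained. This is a sketch only: `named_partial` is a hypothetical name, the 0.90 and 0.10 quantiles are arbitrary, and the frequency strings `"s"` and `"3min"` stand in for the `'S'` and `'3T'` aliases used in the thread.

```python
from functools import partial

import numpy as np
import pandas as pd


def named_partial(func, name, **kwargs):
    """Hypothetical helper: partial(func, **kwargs) with an explicit __name__."""
    wrapped = partial(func, **kwargs)
    wrapped.__name__ = name  # partial objects accept new attributes
    return wrapped


rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((1000, 3)),
    index=pd.date_range("2012-01-01", freq="s", periods=1000),
    columns=["A", "B", "C"],
)

q90 = named_partial(np.quantile, "q90", q=0.90)
q10 = named_partial(np.quantile, "q10", q=0.10)

# Each function now has a distinct name, so the columns stay separate.
dfg = df.resample("3min").agg({"A": [q90, q10]})
print(dfg.head())
```

The explicit names become the second level of the resulting column index, which makes the output easier to read than two columns both labelled `quantile`.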

and also i haven't had time to really work on this recently, not sure how the result will go.

@MarcoGorelli
Member

i haven't had time to really work on this recently

Is this something you plan on coming back to? If so, that's fine, I'll leave it to you :) If not, I might have a solution in mind which doesn't involve renaming the functions.

@charlesdong1991
Member

charlesdong1991 commented Jan 9, 2020

awesome! pls go for it! @MarcoGorelli
i unassigned myself

@charlesdong1991 charlesdong1991 removed their assignment Jan 9, 2020
@charlesdong1991
Member

charlesdong1991 commented Jan 9, 2020

btw @MarcoGorelli, I think you might need to open a new issue for this: it doesn't seem specific to resample but applies to groupby.agg in general, since the current implementation does not distinguish between the two functions in the example you provided.

@charlesdong1991 added and then removed the Needs Tests label Jan 9, 2020
@otaviocv

Hello!

I posted a question on StackOverflow exactly about this issue. Could you, please, take a look? https://stackoverflow.com/questions/60788893/pandas-named-aggregation-not-working-with-resample-agg

Does anyone need help with this?

@jreback jreback modified the milestones: Contributions Welcome, 1.1 Apr 10, 2020
@MarcoGorelli
Member

Does anyone need help with this?

@otaviocv I have an open PR (#30858) to fix this issue, but still need to address some things. Feel free to take over if you have a solution in mind, else I hope to be able to fix it by the 1.1 release (current target is August 2020)

@MarcoGorelli
Member

@otaviocv just got round to checking your SO post, and it seems different to this one (and indeed not fixed by the PR I have open). If you could open a new issue with it, that'd be great!

@MarcoGorelli
Member

Have opened the issue reported by @otaviocv in #34064, as it's different to this one

@charlesdong1991 @jreback does this issue still need to be open? The original example no longer errors, and I took the overwriting part forward in #30880

@charlesdong1991
Member

yeah, agree with it @MarcoGorelli ! could close this one already then to avoid duplication.
