DataFrame.asof #2941

Closed
changhiskhan opened this issue Feb 27, 2013 · 11 comments
Labels
Datetime Datetime data dtype Enhancement Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@changhiskhan
Contributor

Add DataFrame.asof, like Series.asof. It should also take skipna={None, 'any', 'all'}.

@chrisaycock
Contributor

Any thoughts on this? Right now DataFrame.asof() can be emulated via

df.apply(lambda x: x.asof(some_series))

But this is wasteful because it searches each column redundantly.
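
For illustration, a minimal sketch of that emulation on a small frame. The column names, timestamps, and the `where` index are made up for this example; `where` plays the role of `some_series` above.

```python
import pandas as pd

# Hypothetical sample data: two columns sharing one sorted DatetimeIndex.
idx = pd.to_datetime(["2013-02-27 09:00", "2013-02-27 09:05", "2013-02-27 09:10"])
df = pd.DataFrame({"bid": [1.0, 2.0, 3.0], "ask": [1.5, 2.5, 3.5]}, index=idx)

# Timestamps at which we want the last known row (stands in for some_series).
where = pd.to_datetime(["2013-02-27 09:03", "2013-02-27 09:10"])

# Series.asof runs once per column, so the same index search is repeated
# for every column; this is the redundancy described above.
result = df.apply(lambda col: col.asof(where))
print(result)
```

Each call searches the same index, which is why a frame-level implementation could in principle do the lookup once.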

@jreback
Contributor

jreback commented Sep 4, 2014

this is pretty straightforward to do for a frame (it's very similar to the series logic); in fact I would move it to core/generic.py (with a small adjustment). PR?

@chrisaycock
Contributor

@jreback I can take a stab at it this weekend if you like.

@jreback
Contributor

jreback commented Sep 4, 2014

that would be awesome!

@chrisaycock
Contributor

@jreback The biggest issue I see is that take_1d() is for arrays rather than DataFrames. The Series.asof() invokes take_1d() to handle indices from asof_locs() that might be -1 (indicating that np.searchsorted() failed to find an appropriate location). Is there an equivalent to take_1d() for DataFrames? I tried calling it in df.apply(), but got an error about shape.

df = pd.DataFrame({'letter':['a', 'b', 'c'], 'number':[1, 2, 3]})
df.apply(lambda x: pd.core.common.take_1d(x.values, [-1, 1]))

Here is a quick-and-dirty hack that produces what I'm looking for:

In [20]: pd.DataFrame(dict(zip(df.columns, map(lambda x: pd.core.common.take_1d(df[x].values, [-1, 1]), df.columns))))
Out[20]: 
  letter  number
0    NaN     NaN
1      b       2

@jreback
Contributor

jreback commented Sep 4, 2014

you can just use DataFrame.take

@chrisaycock
Contributor

@jreback That just treats the -1 the same way regular Python does, wrapping around to the end of the array:

In [4]: df.take([-1, 1])    # doesn't matter what I set 'convert' to
Out[4]: 
  letter  number
2      c       3
1      b       2

I'm looking for a way to return a nan when the index is -1, just like in take_1d(). That's how the Series.asof() works to resolve instances when an element in the where parameter occurs before anything in the Series.
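
A minimal sketch of the behaviour being asked for, using only public pandas and NumPy calls rather than the internal take_1d(). The masking approach and the variable names (`locs`, `safe`, `keep`) are assumptions made for this example, not how Series.asof() is implemented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"letter": ["a", "b", "c"], "number": [1, 2, 3]})

# Positions as asof_locs() might return them; -1 means "no earlier row".
locs = np.array([-1, 1])

# Replace -1 with a valid dummy position so the positional take cannot wrap around.
safe = np.where(locs == -1, 0, locs)
taken = df.iloc[safe].reset_index(drop=True)

# Blank out the rows that came from a -1 position. DataFrame.where keeps
# values where the condition is True and fills NaN elsewhere.
keep = np.repeat((locs != -1)[:, None], df.shape[1], axis=1)
taken = taken.where(keep)
print(taken)
```

The integer column is upcast to float so that it can hold NaN.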

@jreback
Contributor

jreback commented Sep 4, 2014

you need to look at the internal functions for this, e.g. take_2d

or you can do a 1-d take with a flattened array, then reshape

@bwillers
Contributor

I've been toying with this a little and it's not clear that it's possible to avoid doing the asof_locs call for each column, since they might have missings in different positions. So you probably don't end up gaining much over what the df.apply(lambda s: s.asof(t)) approach gives you.

However, if you don't want missings to be ffilled, then you only need to do it once. Perhaps it makes sense to have a flag argument which indicates whether the user wants missings to be ffilled during the asof operation?

@bwillers
Contributor

@shoyer As discussed in #10266 this functionality is already available via df.reindex(where, method='ffill'), potentially with a dropna/fillna depending on exactly what you want out at the end of the day. So we can probably close this issue?
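
A minimal sketch of the reindex approach, with and without forward-filling missing values first. The sample frame and the `where` timestamps are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has a missing value on 2015-06-03.
idx = pd.to_datetime(["2015-06-01", "2015-06-03", "2015-06-05"])
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, 30.0]}, index=idx)

# One lookup before the first row, one between rows, one after the last row.
where = pd.to_datetime(["2015-05-31", "2015-06-04", "2015-06-06"])

# Picks the last row at or before each timestamp, keeping stored NaNs as they are.
via_reindex = df.reindex(where, method="ffill")

# Forward-filling first skips missing values column by column, which is
# closer to applying Series.asof to each column.
filled_first = df.ffill().reindex(where, method="ffill")

print(via_reindex)
print(filled_first)
```

The two results differ only at 2015-06-04, where column "a" is NaN in the first frame and 1.0 in the second. Timestamps before the first row give NaN in both.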

@shoyer
Member

shoyer commented Jun 13, 2015

Yep, let's close it.

@shoyer shoyer closed this as completed Jun 13, 2015
jreback added a commit that referenced this issue Jun 17, 2016
closes #1870
xref #2941

http://nbviewer.jupyter.org/gist/jreback/5f089d308750c89b2a7d7446b790c056
is a notebook of example usage and timings

Author: Jeff Reback <jeff@reback.net>

Closes #13358 from jreback/asof and squashes the following commits:

4592fa2 [Jeff Reback] TST: reorg tests/series/test_timeseries -> test_asof