Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: numpy histogram bin edges in cut (GH 14627) #23567

Conversation

blalterman
Copy link

@blalterman blalterman commented Nov 8, 2018

Passig a string to `pd.cut` bins kwarg dispatches bin calculation
to `np.histogram_bin_edges`.

    Passig a string to `pd.cut` bins kwarg dispatches bin calculation
    to `np.histogram_bin_edges`.
@pep8speaks
Copy link

pep8speaks commented Nov 8, 2018

Hello @bla1089! Thanks for updating the PR.

Line 49:80: E501 line too long (85 > 79 characters)
Line 50:37: W291 trailing whitespace
Line 51:1: W293 blank line contains whitespace
Line 94:73: W291 trailing whitespace
Line 95:77: W291 trailing whitespace
Line 195:1: W293 blank line contains whitespace

Line 57:1: W293 blank line contains whitespace
Line 58:1: W293 blank line contains whitespace
Line 59:9: E303 too many blank lines (2)
Line 62:13: E125 continuation line with same indent as next logical line
Line 62:13: E128 continuation line under-indented for visual indent

Comment last updated on January 05, 2019 at 17:33 Hours UTC

@gfyoung gfyoung added Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations labels Nov 11, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew entry in other enhancements

pandas/core/reshape/tile.py Outdated Show resolved Hide resolved
pandas/core/reshape/tile.py Show resolved Hide resolved
pandas/tests/reshape/test_tile.py Show resolved Hide resolved
@blalterman
Copy link
Author

Is there anything left to resolve?

@gfyoung
Copy link
Member

gfyoung commented Nov 15, 2018

Is there anything left to resolve?

@bla1089 : We recently had a huge doc-formatting change. If you could resolve the merge conflict, that would be fantastic.

@blalterman
Copy link
Author

blalterman commented Nov 15, 2018 via email

Passing a string to the `pd.cut` bins kwarg
dispatches bin calculation
to `np.histogram_bin_edges`.

Closes pandas-devgh-14627.
@gfyoung gfyoung force-pushed the cut-bins-from-numpy-histogram-bin-edges branch from d7b2e3e to 79b7145 Compare November 15, 2018 05:25
@gfyoung
Copy link
Member

gfyoung commented Nov 15, 2018

Not even sure how to do that. The site says they're too complex to solve
online.

That's because it's attempting to do a merge, which is much more difficult when the file that you modified actually got deleted on master. You need to do something a little more than that.

I just rebased for you right now. In essence, what we needed to do is undo your changes to v0.24.0.txt and re-apply them to v0.24.0.rst (the new file). Thus, I squashed your commits into one to facilitate the "undo", rebased on master, and then added your whatsnew entry to v0.24.0.rst.

@blalterman
Copy link
Author

blalterman commented Nov 15, 2018 via email

@blalterman
Copy link
Author

Can someone help me interpret the Travis CI and Azure failures?

@gfyoung
Copy link
Member

gfyoung commented Nov 19, 2018

Can someone help me interpret the Travis CI and Azure failures?

@bla1089 : When was histogram_bin_edges introduced to numpy ? It seems that it's breaking on some of the numpy versions, which means this implementation may not be 100% backwards-compatible.

@blalterman
Copy link
Author

blalterman commented Nov 19, 2018 via email

@gfyoung
Copy link
Member

gfyoung commented Nov 19, 2018

@bla1089 : Hmm...if you could double check when this was introduced, that would be great. It looks like this feature only applies to certain numpy versions.

@blalterman
Copy link
Author

blalterman commented Nov 19, 2018 via email

@gfyoung
Copy link
Member

gfyoung commented Nov 19, 2018

@bla1089 : That seems reasonable for now. Give that a shot!

    1)  Should bring compatibility down from 1.15 to 1.11.
@blalterman
Copy link
Author

blalterman commented Nov 27, 2018 via email

@gfyoung
Copy link
Member

gfyoung commented Nov 27, 2018

can you help me fugue out the status here?

@bla1089 : Sorry, I'm not sure I fully understand you here. Are you referring to debugging the tests?

@blalterman
Copy link
Author

blalterman commented Nov 27, 2018 via email

@gfyoung
Copy link
Member

gfyoung commented Nov 27, 2018

Based on the Travis failures, I see a couple of things:

It looks like your version checking of numpy isn't quite working, based on this build. In addition, something doesn't look quite right with your docstring formatting (see this build).

Not 100% sure off the top of my head how to fix the doc issues though...

cc @datapythonista

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of comments about the docstring, it'd be good to render it to see how it's rendered

The criteria to bin by.

* int : Defines the number of equal-width bins in the range of `x`. The
range of `x` is extended by .1% on each side to include the minimum
and maximum values of `x`.
* str : Bin calculaton dispatched to `np.histogram_bin_edges`. See that
documentation for details. (versionadded:: 0.24.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sphinx won't detect the version added, should be in its own line starting with ..


Passng a string for `bins` dispatches the bin calculation to numpy's
`histogram_bin_edges`. (Starting in version 0.24.)
>>> pd.cut(array([0.1, 0.1, 0.2, 0.5, 0.5, 0.9, 1.0]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave a blank line before this line

`histogram_bin_edges`. (Starting in version 0.24.)
>>> pd.cut(array([0.1, 0.1, 0.2, 0.5, 0.5, 0.9, 1.0]),
... bins="auto")
... # doctest: +ELLIPSIS`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should go in the previous line, after the code

@blalterman
Copy link
Author

What's needed to complete the merge?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are some merge conflicts that need to be addressed.

@@ -233,6 +233,7 @@ Other Enhancements
- :class:`DatetimeIndex` gained :attr:`DatetimeIndex.timetz` attribute. Returns local time with timezone information. (:issue:`21358`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`)
- :func: `~cut` `bins` kwarg now accepts a string, which is dispatched to `numpy.histogram_bin_edges`. (:issue:`14627`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No space after :func:.

You can link to the numpy method with :func:`numpy.historgram_bin_edges`.

@@ -44,6 +46,10 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
* sequence of scalars : Defines the bin edges allowing for non-uniform
width. No extension of the range of `x` is done.
* IntervalIndex : Defines the exact bins to be used.
* str : Bin calculaton dispatched to `np.histogram_bin_edges`. See that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, link to the numpy function here with :func:numpy.histogram_bin_edges.

except AttributeError:
cnt, bins = np.histogram(x, bins)

mn, mx = bins[0], bins[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this equivalent to doing pd.cut(np.histogram_bin_edges(array, bins))? Why do we do the additional processing / adjustment starting here?

# Test that a `bin` string not present in `np.histogram_bin_edges`
# throws a ValueError.
with pytest.raises(ValueError,
match="'*' is not a valid estimator for `bins`",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will potentially fail if NumPy adjusts their error message. Given that it's not really testing anything in pandas, I think we're OK removing the match. It's probably fine to ensure that a ValueError is raised.

@blalterman
Copy link
Author

blalterman commented Jan 5, 2019 via email

@jreback
Copy link
Contributor

jreback commented Jan 5, 2019

@bla1089 this needs to merge master then can have a look

@blalterman
Copy link
Author

blalterman commented Jan 5, 2019 via email

@jreback
Copy link
Contributor

jreback commented Jan 5, 2019

@bla1089 you need to merge master

@blalterman
Copy link
Author

blalterman commented Jan 5, 2019 via email

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you merge master and address conflicts?

@@ -233,6 +233,7 @@ Other Enhancements
- :class:`DatetimeIndex` gained :attr:`DatetimeIndex.timetz` attribute. Returns local time with timezone information. (:issue:`21358`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`)
- :func:`~cut` `bins` kwarg now accepts a string, which is dispatched to `numpy.histogram_bin_edges`. (:issue:`14627`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to 0.25 at this point

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure how to get my fork to update here. I recently changed my GH id and I'm unsure if that's getting in the way. Any ideas?

@jreback
Copy link
Contributor

jreback commented Mar 3, 2019

merge master

@blalterman
Copy link
Author

blalterman commented Mar 3, 2019 via email

@jreback
Copy link
Contributor

jreback commented Mar 3, 2019

you need to merge upstream
git merge upstream/master

@blalterman
Copy link
Author

blalterman commented Mar 3, 2019 via email

@gfyoung
Copy link
Member

gfyoung commented Mar 3, 2019

@blalterman : Two questions:

  • When you run git remote -v, what does upstream point to?
  • Did you fetch from upstream first?

@blalterman
Copy link
Author

blalterman commented Mar 3, 2019 via email

@TomAugspurger
Copy link
Contributor

Now do a git merge upstream/master (and you'll need to fixup the conflicts.

@blalterman
Copy link
Author

blalterman commented Mar 4, 2019 via email

@WillAyd
Copy link
Member

WillAyd commented Mar 19, 2019

@blalterman are you able to update here? If you changed your GH credentials might be easiest to push a new PR

@WillAyd
Copy link
Member

WillAyd commented Apr 10, 2019

Closing as stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

numpy 1.11.0 nonuniform binning
7 participants