BUG: Handle floating point boundaries in qcut #59409

Merged: 5 commits, Aug 9, 2024

Conversation

rob-sil
Contributor

@rob-sil rob-sil commented Aug 4, 2024

pandas' quantile code uses NumPy's percentile, so it has to convert each quantile (between 0 and 1) to a percentile (between 0 and 100). However, np.percentile is itself implemented on top of NumPy's quantile code, so it converts the percentile back into a quantile. Together, these two conversions multiply the input by 100 and then divide it by 100. Because 100 is not a power of 2, neither step is guaranteed to be exact, and the round trip can introduce a slight floating-point error. This PR changes pandas' quantile code to call NumPy's quantile code directly, which was not available when the original pandas quantile code was written.
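
To illustrate the round trip described above, here is a minimal sketch in plain Python. It models the two conversions as `q * 100` followed by `/ 100`; that is a simplification of what happens inside pandas and NumPy, and the helper names `round_trip` and `drifting_quantiles` are hypothetical. The script only reports which quantiles `k / n` fail to survive the round trip; it makes no claim about any particular value.

```python
def round_trip(q: float) -> float:
    """Quantile -> percentile -> quantile, modelled as multiply then divide by 100."""
    return (q * 100) / 100


def drifting_quantiles(n_max: int = 50) -> list[tuple[int, int]]:
    """Return the (k, n) pairs for which k / n changes after the round trip."""
    return [
        (k, n)
        for n in range(1, n_max + 1)
        for k in range(n + 1)
        if round_trip(k / n) != k / n
    ]


if __name__ == "__main__":
    changed = drifting_quantiles()
    print(f"{len(changed)} quantiles changed; first few: {changed[:5]}")
```

Quantiles such as 0.5 or 0.25 always survive, because both the product and the quotient are exactly representable; any drift is limited to values where one of the two steps has to round.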

This PR also changes how qcut picks quantiles when asked to split the data into a fixed number of quantiles. Some quantiles, such as 5/7, cannot be represented exactly as floats and have to be rounded to a nearby representable number. If that rounding goes down, the computed bin edge can fall just below the value that should define the upper bound of the 5/7 quantile, and that value is then incorrectly assigned to the 6/7 quantile. This PR rounds the quantiles up, rather than to the nearest float, when picking a floating-point representation.
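
The idea of rounding up instead of to the nearest float can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this PR: `quantile_rounded_up` is a hypothetical helper, it compares against the exact rational value using `fractions.Fraction`, and it relies on `math.nextafter` (Python 3.9+).

```python
import math
from fractions import Fraction


def quantile_rounded_up(k: int, n: int) -> float:
    """Return the smallest float that is >= the exact fraction k / n."""
    exact = Fraction(k, n)
    q = k / n  # ordinary division rounds to the nearest float
    if Fraction(q) < exact:
        # Nearest-rounding went down, so step to the next float above.
        q = math.nextafter(q, math.inf)
    return q


if __name__ == "__main__":
    for k in range(8):
        nearest = k / 7
        up = quantile_rounded_up(k, 7)
        note = "bumped up" if up != nearest else "unchanged"
        print(f"{k}/7: nearest={nearest!r} rounded_up={up!r} ({note})")
```

With the quantile never below the exact fraction, an edge computed from it should not be pulled below the intended boundary by the representation of the quantile itself.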

@mroeschke mroeschke added the cut cut, qcut label Aug 5, 2024
@mroeschke mroeschke added this to the 3.0 milestone Aug 9, 2024
@mroeschke mroeschke merged commit c831ccd into pandas-dev:main Aug 9, 2024
45 checks passed
@mroeschke
Member

Thanks @rob-sil

Labels
cut cut, qcut
Development

Successfully merging this pull request may close these issues.

BUG: Qcut interval not selecting the correct inclusive and exclusive limits
2 participants