Index.intersection changed behavior to sort by default in pandas 0.24 #24959

shoyer · 2019-01-27T01:19:05Z

Prior to #24521 (i.e., for pandas 0.23 and earlier), pandas.Index.intersection would not sort values:

# pandas 0.23.3
In [21]: pd.Index(['c', 'b', 'a']).intersection(['b', 'a'])
Out[21]: Index(['b', 'a'], dtype='object')

Now it does:

In [28]: pd.Index(['c', 'b', 'a']).intersection(['b', 'a'])
Out[28]: Index(['a', 'b'], dtype='object')
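Here is a minimal pure-Python sketch of the two behaviours. `intersection` is a hypothetical helper that works on plain lists; it is not the pandas implementation. `sort=False` keeps the order of the calling index, as documented for 0.23.x. `sort=True` returns the common labels sorted, matching the 0.24.0 default.

```python
def intersection(left, right, sort=False):
    """Hypothetical list-based model of Index.intersection (not pandas code).

    sort=False keeps the order of the calling index `left`;
    sort=True returns the common labels in sorted order.
    """
    right_labels = set(right)
    seen = set()
    common = []
    for label in left:
        # Keep each common label once, in the order it appears in `left`.
        if label in right_labels and label not in seen:
            seen.add(label)
            common.append(label)
    return sorted(common) if sort else common


print(intersection(['c', 'b', 'a'], ['b', 'a']))             # ['b', 'a']
print(intersection(['c', 'b', 'a'], ['b', 'a'], sort=True))  # ['a', 'b']
```

In pandas versions where `Index.intersection` accepts the `sort` keyword, passing `sort=False` explicitly requests the order-preserving result, whatever the default is.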

This turned up in a failure in xarray's test suite (not a real bug): pydata/xarray#2717

It's fine to change this for consistency, but a deprecation cycle would probably make sense so users aren't surprised by the behavior of their code changing, e.g., we could default to sort=None and issue a FutureWarning for now.
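The deprecation cycle suggested here could look roughly like the sketch below. `intersection_transitional` is a hypothetical list-based helper, not pandas code, and the warning text is invented for illustration. The default `sort=None` keeps the legacy (unsorted) result and emits a `FutureWarning` only when the announced future default would return something different. An explicit `sort=False` or `sort=True` stays silent.

```python
import warnings


def intersection_transitional(left, right, sort=None):
    """Hypothetical transition-period model of Index.intersection.

    sort=None  -> legacy result (caller's order) plus a FutureWarning
                  if the future default (sorted) would differ.
    sort=False -> caller's order, no warning.
    sort=True  -> sorted result, no warning.
    """
    right_labels = set(right)
    common = [label for label in left if label in right_labels]
    if sort is None:
        try:
            would_change = common != sorted(common)
        except TypeError:
            # Unorderable labels could not be sorted anyway: nothing to announce.
            would_change = False
        if would_change:
            warnings.warn(
                "The default of 'sort' will change to True in a future "
                "version. Pass sort=False to keep the current behaviour "
                "and silence this warning.",
                FutureWarning,
                stacklevel=2,
            )
        return common
    return sorted(common) if sort else common
```

The sketch warns only when the output would actually change. That is a design assumption meant to avoid bothering callers whose results are unaffected.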

jreback · 2019-01-27T01:21:29Z

union and difference do sort by default; I suppose intersection was just not clear. In any event, the option is now there to control this.

shoyer · 2019-01-27T01:41:02Z

The documented behavior of Index.intersection was not to sort. To quote the pandas 0.23.x docs:
"This returns a new Index with elements common to the index and other, preserving the order of the calling index."
http://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.Index.intersection.html

This language has been there since at least pandas 0.20.x (July 2017):
http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.Index.intersection.html

jorisvandenbossche · 2019-01-27T07:15:32Z

We also failed to actually mention this change in the whatsnew or docstring.

I agree that ideally we should have done it with a FutureWarning. Still, adding one for 0.24.1 would of course change the behaviour again.

TomAugspurger · 2019-01-27T21:55:09Z

I think doing a quick 0.24.1 (within a few days) switching this is the best option. Will put up a PR for discussion.

TomAugspurger · 2019-01-27T21:56:18Z

But just to be clear, not sorting is better right? Preserving the original order should be the default?

jorisvandenbossche · 2019-01-28T17:02:34Z

Trying to get an overview of things.

In 0.23.4, we had the following behaviour:

None of the methods had a sort keyword

Index.intersection preserved the original order in left and right. E.g.:

In [84]: pd.Index(['a', 'e', 'c']).intersection(['e', 'd', 'c']) 
Out[84]: Index(['e', 'c'], dtype='object')

Index.union typically sorted, but not in all cases:

In [86]: pd.Index(['a', 'e', 'c']).union(['a', 'b'])
Out[86]: Index(['a', 'b', 'c', 'e'], dtype='object')

A case where it does not sort: equal index objects:

In [88]: pd.Index(['a', 'e', 'c']).union(pd.Index(['a', 'e', 'c']))
Out[88]: Index(['a', 'e', 'c'], dtype='object')
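The "usually sort" default can be summarised with a small model. `legacy_union` is a hypothetical list-based sketch, not the pandas implementation. It reproduces the two cases shown above. It also assumes that labels which cannot be compared are left unsorted, as the docstring proposed later in this thread describes.

```python
def legacy_union(left, right):
    """Hypothetical model of the pandas 0.23.4 Index.union default."""
    left, right = list(left), list(right)
    if left == right:
        # Equal inputs are returned as-is: no sorting happens.
        return left
    # Labels of `left`, followed by labels of `right` not already present.
    seen = set(left)
    combined = list(left)
    for label in right:
        if label not in seen:
            seen.add(label)
            combined.append(label)
    try:
        return sorted(combined)
    except TypeError:
        # Assumption: labels that cannot be compared are left unsorted.
        return combined


print(legacy_union(['a', 'e', 'c'], ['a', 'b']))       # ['a', 'b', 'c', 'e']
print(legacy_union(['a', 'e', 'c'], ['a', 'e', 'c']))  # ['a', 'e', 'c']
```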

In pandas 0.24.0, we added a sort keyword to most of the set-operation methods. First to Index.difference (#22811), later also to union and intersection (#24521).

To match the current behaviour, a default of sort=True was used for union and difference. @reidy-p is that the correct reason? Or were there other arguments?

But, for Index.intersection, a default of sort=True actually changed the behaviour, hence this issue.

So we had inconsistent behaviour between intersection and union/difference, and also within one method (sometimes it is not sorted, see example above).
Long term, I think we would ideally get a consistent behaviour. The question is then what the default behaviour should be.

I think there are several reasons for this to be False. E.g., it is more consistent with what we did for concat (which will no longer sort by default in the future).

shoyer · 2019-01-28T17:48:53Z

Long term, I agree that:

Index operations should all consistently sort or not sort by default
Not sorting by default is probably preferable

Also: changing this behavior immediately in 0.24.0 without a deprecation cycle was not ideal. I think we should revert this change, by setting the default for intersection back to sort=False. Then we can change the default to sort=None (for "legacy behavior" vs "explicitly unsorted") and start a deprecation cycle.

I'm also worried about the new default of sort=True for union. This will almost certainly entail a breaking change for users when we fix union() to actually always sort (e.g., in the case of equal objects). Instead, I think we should make the default sort=None (for "legacy behavior") and raise errors if sort=True is passed but we can't or don't actually sort.

TomAugspurger · 2019-01-28T17:55:55Z

Then we can change the default to sort=None (for "legacy behavior" vs "explicitly unsorted") and start a deprecation cycle.

What would the benefit of a deprecation cycle be? IIUC the desired long-term behavior for Index.intersection is sort=False. If we release a 0.24.1 with sort=False, then we shouldn't need a deprecation cycle, since every version of pandas except 0.24.0 does the right thing.

I didn't really appreciate the subtleties of Index.union(sort=True) until now... I agree that passing sort=True should always result in a sorted index or an error, but we also don't want to change behavior. Your suggestion of sort=None meaning "legacy" behavior (with all the complexities of maybe not sorting) is good.

errors if sort=True is passed but we can't or don't actually sort.

I think that long-term, once the default for Index.union is sort=False, we can even sort identical objects (rather than raising an error). So we would only raise when the user explicitly requests sort=True and the objects aren't comparable.

jreback · 2019-01-28T17:58:15Z

@shoyer Note that Index.union() previously and currently defaults to sort=True

shoyer · 2019-01-28T17:59:18Z

What would the benefit of a deprecation cycle be? IIUC the desired long-term behavior for Index.intersection is sort=False. If we release a 0.24.1 with sort=False, then we shouldn't need a deprecation cycle, since every version of pandas except 0.24.0 does the right thing.

Right, never mind. We won't need a deprecation cycle if we don't change this :)

shoyer · 2019-01-28T18:01:30Z

Note that Index.union() previously and currently defaults to sort=True

Not quite :). The previous default of Index.union() is "usually sort". Now we're on the verge of establishing sort=True as also indicating "usually sort".

jreback · 2019-01-28T18:18:11Z

Not quite :). The previous default of Index.union() is "usually sort". Now we're on the verge of establishing sort=True as also indicating "usually sort".

Not sure what you mean. It previously sorted and still does by default. That is the same. Sure you have the option now NOT to sort.

jorisvandenbossche · 2019-01-28T18:23:39Z

Not sure what you mean. It previously sorted and still does by default.

@jreback For example, it does not sort (by default, if sort=True) when indices are equal. See my example above (#24959 (comment))

jreback · 2019-01-28T18:35:56Z

@jorisvandenbossche that is a case w/o distinction. We don't do anything when things are equal in lots of cases, e.g. .reindex(). So this would not change.

TomAugspurger · 2019-01-28T18:51:23Z

But then we're in the situation @shoyer described in
#24959 (comment), where sort=True means "usually sort". That's bad right?

jorisvandenbossche · 2019-01-28T19:09:04Z

Indeed, keeping this "sort unless objects are equal" as the default with sort=True, basically means that you are not guaranteed that the result is sorted, even when passing sort=True, which IMO is somewhat unfortunate.

TomAugspurger · 2019-01-28T19:17:54Z

So, if we agree that sort=True's current behavior of only sometimes sorting is bad, then what do we do?

Proposal: For 0.24.1, change the default to None, which is the old behavior.

sort : bool or None, default None
    Whether to sort the result of the union.
    By default, labels are sorted when

      * `self` is not identical to `other`
      * The items in `self` and `other` can all be compared.

    Specify ``sort=True`` to always attempt to sort the values,
    even if `self` and `other` are identical, or the values cannot
    be compared. If the values cannot be compared, and ``sort=True``,
    a ``TypeError`` is raised.

    Specify ``sort=False`` to not attempt to sort the values.
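Here is a minimal sketch of the three-way semantics in this proposed docstring, written as a hypothetical list-based `union` helper rather than pandas code. `sort=None` reproduces the legacy rules. `sort=True` always sorts, even identical inputs, and lets the `TypeError` from unorderable labels propagate. `sort=False` returns the labels of `left` followed by the new labels of `right`, each in their original order. That last ordering is an assumption, because the docstring only says that no sorting is attempted.

```python
def union(left, right, sort=None):
    """Hypothetical model of the proposed Index.union(sort=...) semantics.

    sort=None  -> sort unless the inputs are identical or the labels
                  cannot be compared (legacy behaviour).
    sort=True  -> always sort; raise TypeError if labels cannot be compared.
    sort=False -> never sort.
    """
    left, right = list(left), list(right)
    # Labels of `left`, followed by labels of `right` not already present.
    combined = list(left)
    seen = set(combined)
    for label in right:
        if label not in seen:
            seen.add(label)
            combined.append(label)

    if sort is False:
        return combined
    if sort is None:
        if left == right:
            return combined  # identical inputs: legacy code skipped sorting
        try:
            return sorted(combined)
        except TypeError:
            return combined  # unorderable labels: legacy code left them unsorted
    if sort is True:
        return sorted(combined)  # may raise TypeError, as the docstring requires
    raise ValueError(f"sort must be None, True or False, got {sort!r}")
```

In this sketch, `sort=True` means "always sort". The thread notes that the 0.24.0 default does not give that guarantee.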

Later (0.25.0), we deprecate sort=None and say that in the future the default will be sort=False.

My only hesitation with that is it no longer allows the old behavior of sorting iff the values are not identical and comparable. But is that old behavior useful in practice?

TomAugspurger · 2019-01-28T23:35:55Z

@jreback thoughts on

1. Changing the default of Index.intersection to sort=False, to be compatible with 0.23.4 and earlier (REGR: Preserve order by default in Index.intersection #24967)
2. Changing the default of Index.union to sort=None (#24959 (comment))

2 isn't time-sensitive, since sort=True is backwards compatible with 0.23.4 and earlier. It could be done as part of 0.25.

jorisvandenbossche · 2019-01-29T13:59:49Z

+1 on Tom's proposal.

I think there is agreement that we want sort=False eventually as the long-term behaviour? (@jreback is that correct? I assume so based on #24967 (comment))

If we agree on that, I think the most logical thing (most easy to deal with for code depending on it) is reverting the intersection behaviour back to sort=False for 0.24.1, as Tom proposes.

And then actually deprecating sort=True/None for the others is indeed less time sensitive / probably better for 0.25.0.

jreback · 2019-01-29T16:15:44Z

If you want to revert for 0.24.1 ok I guess.

TomAugspurger · 2019-01-29T16:17:21Z

Thanks. #24967 implements that, and Joris is +1 already. Any other comments there or good to merge?

shoyer · 2019-01-29T17:02:25Z

And then actually deprecating sort=True/None for the others is indeed less time sensitive / probably better for 0.25.0.

To be honest, I'm not sure it was a good idea to add this argument at all in its current broken state. I would consider removing support for sort=True entirely until we can sort consistently. Otherwise we will be establishing "sort sometimes" as expected behavior.

TomAugspurger · 2019-01-29T17:39:28Z

FWIW, I don't think implementing sort=True to mean "always sort" will take too long, if we prefer to do that for 0.24.1. Then we can change the default to sort=None (legacy behavior) and can deprecate that in favor of sort=False at our leisure.

TomAugspurger · 2019-01-29T18:04:56Z

Opened #25007 with a POC for changing the default of Index.union(sort=None), roughly implementing
#24959 (comment).

Won't have more time to work on that until tonight.

I'm +0.5 for including that in a 0.24.1. If we want to ensure that sort=True means always sort, we'll need a breaking change from 0.24.0. I think it's best to get that done with before people start relying on sort=True (as the default) heavily.

Closes #24959

jorisvandenbossche · 2019-01-29T21:47:15Z

Sorry, forget that last comment (old text that github cached on reload).

Re-opening, as we still have the discussion / PR on the sort default for the other methods.

reidy-p · 2019-01-29T21:55:55Z

I'm a bit late coming back to this but want to say thanks to everyone for a very thoughtful discussion and for following up so quickly.

Just to add some more context of where I was coming from. I was originally working on #22811 and found that to get groupby.nth to maintain column order I needed to be able to control the sort behaviour of Index.difference so I added a sort parameter to this method. After this was merged in I realised that the other set operations for Index did not have any sort parameters which is unfortunately a bit inconsistent. So I opened #24471 to propose adding sort parameters to the other set operations for Index. This has been discussed in plenty of other issues that I reference in that issue but has never been implemented.

My plan was to implement the sort parameter for Index and other index types but I felt that it would be far too big to put in one PR so I planned to make several PRs and have a checklist in #24471 to make sure all the issues and corner-cases (such as equal indices) were covered. I agree with @shoyer that it's unfortunate that the parameter is pretty broken at the moment because I didn't get a chance to do as much work on it as I wanted before the release of 0.24.0 and I apologise for any inconvenience. I'd be happy to keep working on the sort parameter to make it work properly once we have any immediate issues with the legacy behaviour fixed.

TomAugspurger · 2019-01-30T14:12:18Z

No worries @reidy-p, totally understood about getting busy. Index.union sorting by default was a major headache, just need to clean up some of the edge cases.

TomAugspurger · 2019-01-30T14:12:39Z

Adding this back to 0.24.1 milestone FYI.


shoyer mentioned this issue Jan 27, 2019

ENH: Add sort parameter to set operations for some Indexes and adjust… #24521

Merged

gfyoung added the Indexing, API Design, and Deprecate labels Jan 27, 2019

gfyoung added this to the 0.24.1 milestone Jan 27, 2019

shoyer mentioned this issue Jan 27, 2019

Should xarray.align sort indexes in alignment? pydata/xarray#2719

Closed

TomAugspurger mentioned this issue Jan 27, 2019

RLS: 0.24.x #24949

Closed

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 27, 2019

REGR: Preserve order by default in Index.difference

d3214ae

Closes pandas-dev#24959

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 27, 2019

REGR: Preserve order by default in Index.difference

17d0f92

Closes pandas-dev#24959

TomAugspurger mentioned this issue Jan 27, 2019

REGR: Preserve order by default in Index.intersection #24967

Merged

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 29, 2019

[WIP]: API: Change default for Index.union sort

aac172c

Closes pandas-dev#24959

TomAugspurger mentioned this issue Jan 29, 2019

API: Change default for Index.union sort #25007

Closed

jorisvandenbossche closed this as completed in #24967 Jan 29, 2019

jorisvandenbossche pushed a commit that referenced this issue Jan 29, 2019

REGR: Preserve order by default in Index.difference (#24967)

ece58cb

Closes #24959

jorisvandenbossche reopened this Jan 29, 2019

jreback removed this from the 0.24.1 milestone Jan 30, 2019

TomAugspurger added this to the 0.24.1 milestone Jan 30, 2019

jorisvandenbossche mentioned this issue Feb 1, 2019

API: change Index set ops sort=True -> sort=None #25063

Merged

TomAugspurger closed this as completed in #25063 Feb 1, 2019

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

REGR: Preserve order by default in Index.difference (pandas-dev#24967)

e0c4b54

Closes pandas-dev#24959

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

REGR: Preserve order by default in Index.difference (pandas-dev#24967)

468cc20

Closes pandas-dev#24959

reidy-p mentioned this issue Apr 3, 2019

DEPR: Deprecate sort=None for union and implement sort=True #25980

Closed

AlexKirko mentioned this issue Jul 8, 2020

BUG: fix union_indexes not supporting sort=False for Index subclasses #35098

Merged

wence- mentioned this issue Nov 21, 2023

BUG: Discrepency between documentation and output for outer merge on index when left and right indices match and are unique #55992

Closed

shoyer commented Jan 27, 2019

jreback commented Jan 27, 2019

shoyer commented Jan 27, 2019

jorisvandenbossche commented Jan 27, 2019

TomAugspurger commented Jan 27, 2019

TomAugspurger commented Jan 27, 2019

jorisvandenbossche commented Jan 28, 2019

shoyer commented Jan 28, 2019

TomAugspurger commented Jan 28, 2019

jreback commented Jan 28, 2019

shoyer commented Jan 28, 2019

shoyer commented Jan 28, 2019

jreback commented Jan 28, 2019

jorisvandenbossche commented Jan 28, 2019

jreback commented Jan 28, 2019

TomAugspurger commented Jan 28, 2019

jorisvandenbossche commented Jan 28, 2019

TomAugspurger commented Jan 28, 2019

TomAugspurger commented Jan 28, 2019

jorisvandenbossche commented Jan 29, 2019

jreback commented Jan 29, 2019

TomAugspurger commented Jan 29, 2019

shoyer commented Jan 29, 2019

TomAugspurger commented Jan 29, 2019

TomAugspurger commented Jan 29, 2019

jorisvandenbossche commented Jan 29, 2019

reidy-p commented Jan 29, 2019

TomAugspurger commented Jan 30, 2019

TomAugspurger commented Jan 30, 2019