BUG: right merge not preserve row order (#27453) #27762

Closed
Changes from 21 commits
28 changes: 28 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -274,6 +274,7 @@ New repr for :class:`~pandas.arrays.IntervalArray`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- :class:`pandas.arrays.IntervalArray` adopts a new ``__repr__`` in accordance with other array classes (:issue:`25022`)
- :class:`pandas.core.arrays.IntervalArray` adopts a new ``__repr__`` in accordance with other array classes (:issue:`25022`)
Contributor

looks like a rebasing dupe


*pandas 0.25.x*

@@ -292,6 +293,32 @@ New repr for :class:`~pandas.arrays.IntervalArray`

pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])

- :meth:`DataFrame.merge` now preserves right frame's row order when executing a right merge (:issue:`27453`)
Contributor

make this a sub-section (need a line under the title), and put a shorter title, move what you have to the first sentence.


.. ipython:: python

    left_df = pd.DataFrame({"colors": ["blue", "red"]}, index=pd.Index([0, 1]))
Contributor

show these as well, call them just left and right

    right_df = pd.DataFrame({"hats": ["small", "big"]}, index=pd.Index([1, 0]))

*pandas 0.25.x*

.. ipython:: python
Contributor

this needs to be a code-block

    left_df.merge(right_df, left_index=True, right_index=True, how="right")
      colors   hats
    0   blue    big
    1    red  small


*pandas 1.0.0*

.. ipython:: python

    left_df.merge(right_df, left_index=True, right_index=True, how="right")
Contributor

let this execute (don't put the results in)

      colors   hats
    1    red  small
    0   blue    big



All :class:`SeriesGroupBy` aggregation methods now respect the ``observed`` keyword
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -949,6 +976,7 @@ Reshaping
- Bug in :func:`melt` where supplying mixed strings and numeric values for ``id_vars`` or ``value_vars`` would incorrectly raise a ``ValueError`` (:issue:`29718`)
- Dtypes are now preserved when transposing a ``DataFrame`` where each column is the same extension dtype (:issue:`30091`)
- Bug in :func:`merge_asof` merging on a tz-aware ``left_index`` and ``right_on`` a tz-aware column (:issue:`29864`)
- :meth:`DataFrame.merge` now preserves right frame's row order when executing a right merge (:issue:`27453`)
Contributor

don't put this here, you already have a sub-section

-

Sparse
20 changes: 14 additions & 6 deletions pandas/core/reshape/merge.py
@@ -566,10 +566,10 @@ def __init__(
         indicator: bool = False,
         validate=None,
     ):
-        _left = _validate_operand(left)
-        _right = _validate_operand(right)
-        self.left = self.orig_left = _left
-        self.right = self.orig_right = _right
+        left = validate_operand(left)
+        right = validate_operand(right)
+        self.left = self.orig_left = left
+        self.right = self.orig_right = right
         self.how = how
         self.axis = axis

@@ -1301,6 +1301,9 @@ def _get_join_indexers(
         right_keys
     ), "left_key and right_keys must be the same length"

+    # bind `sort` arg. of _factorize_keys
+    fkeys = partial(_factorize_keys, sort=sort)
+
     # get left & right join labels and num. of levels at each location
     mapped = (
         _factorize_keys(left_keys[n], right_keys[n], sort=sort)
@@ -1315,15 +1318,20 @@ def _get_join_indexers(
     # factorize keys to a dense i8 space
     # `count` is the num. of unique keys
     # set(lkey) | set(rkey) == range(count)
-    lkey, rkey, count = _factorize_keys(lkey, rkey, sort=sort)

+    # flip left and right keys if performing a right merge
+    # to preserve right merge row order (GH 27453)
+    if how == "right":
+        factorized_rkey, factorized_lkey, count = fkeys(rkey, lkey)
+    else:
+        factorized_lkey, factorized_rkey, count = fkeys(lkey, rkey)
     # preserve left frame order if how == 'left' and sort == False
     kwargs = copy.copy(kwargs)
     if how == "left":
         kwargs["sort"] = sort
     join_func = _join_functions[how]

-    return join_func(lkey, rkey, count, **kwargs)
+    return join_func(factorized_lkey, factorized_rkey, count, **kwargs)


 def _restore_dropped_levels_multijoin(
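
To illustrate why flipping the argument order matters, here is a minimal pure-Python sketch. It assumes a factorizer that, with `sort=False`, hands out integer codes in order of first appearance, scanning its first argument first. `factorize_keys` below is a hypothetical toy stand-in, not pandas' private `_factorize_keys`; the key values are borrowed from the test added in this PR.

```python
from functools import partial


def factorize_keys(first, second, sort=False):
    """Toy stand-in for a key factorizer (NOT pandas' _factorize_keys).

    Maps the values of both sequences into one dense integer space and
    returns (first_codes, second_codes, count).
    """
    values = list(first) + list(second)
    if sort:
        uniques = sorted(set(values))
    else:
        # dict.fromkeys keeps first-appearance order, scanning `first` first
        uniques = list(dict.fromkeys(values))
    code_of = {value: code for code, value in enumerate(uniques)}
    return (
        [code_of[v] for v in first],
        [code_of[v] for v in second],
        len(uniques),
    )


# bind the `sort` argument once, as the diff does with `fkeys`
fkeys = partial(factorize_keys, sort=False)

lkey = ["Jenn", "Beth", "Carl"]
rkey = ["Abe", "Beth", "Carl"]

# left keys first: codes follow the left frame's order of appearance
l_codes, r_codes, count = fkeys(lkey, rkey)

# right keys first (the how == "right" branch): codes follow the right frame
flipped_r_codes, flipped_l_codes, flipped_count = fkeys(rkey, lkey)

print(l_codes, r_codes, count)
print(flipped_r_codes, flipped_l_codes, flipped_count)
```

In the unflipped call the right keys receive codes `[3, 1, 2]`, so any step that orders rows by code would move "Abe" last. In the flipped call the right keys receive `[0, 1, 2]`, which matches the right frame's own row order.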
48 changes: 40 additions & 8 deletions pandas/tests/reshape/merge/test_merge.py
@@ -1289,17 +1289,17 @@ def test_merge_on_index_with_more_values(self, how, index, expected_index):
         # GH 24212
         # pd.merge gets [0, 1, 2, -1, -1, -1] as left_indexer, ensure that
         # -1 is interpreted as a missing value instead of the last element
-        df1 = pd.DataFrame({"a": [1, 2, 3], "key": [0, 2, 2]}, index=index)
-        df2 = pd.DataFrame({"b": [1, 2, 3, 4, 5]})
+        df1 = pd.DataFrame({"a": [0, 1, 2], "key": [0, 1, 2]}, index=index)
+        df2 = pd.DataFrame({"b": [0, 1, 2, 3, 4, 5]})
         result = df1.merge(df2, left_on="key", right_index=True, how=how)
         expected = pd.DataFrame(
             [
-                [1.0, 0, 1],
-                [2.0, 2, 3],
-                [3.0, 2, 3],
-                [np.nan, 1, 2],
-                [np.nan, 3, 4],
-                [np.nan, 4, 5],
+                [0, 0, 0],
+                [1, 1, 1],
+                [2, 2, 2],
+                [np.nan, 3, 3],
+                [np.nan, 4, 4],
+                [np.nan, 5, 5],
             ],
             columns=["a", "key", "b"],
         )
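
The GH 24212 comment in this test refers to `-1` being a "no match" sentinel in a join indexer. A small pure-Python sketch of the pitfall, assuming the indexer quoted in the comment; `take_with_missing` is a hypothetical helper written for illustration, not a pandas function:

```python
import math


def take_with_missing(values, indexer, fill_value=math.nan):
    """Pick values by position, treating -1 as 'no match'."""
    return [fill_value if i == -1 else values[i] for i in indexer]


a = [0, 1, 2]
left_indexer = [0, 1, 2, -1, -1, -1]

# plain indexing: -1 silently wraps around to the last element
naive = [a[i] for i in left_indexer]

# sentinel-aware take: -1 becomes a missing value instead
taken = take_with_missing(a, left_indexer)

print(naive)
print(taken)
```

Plain Python indexing returns the last element for every `-1`, which is the misreading the test guards against; the sentinel-aware version yields NaN in those positions.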
@@ -2153,3 +2153,35 @@ def test_merge_multiindex_columns():
     expected["id"] = ""

     tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("how", ["left", "right"])
+def test_merge_preserves_row_order(how):
+    # GH 27453
+    population = [
+        ("Jenn", "Jamaica", 3),
+        ("Beth", "Bulgaria", 7),
+        ("Carl", "Canada", 30),
+    ]
+    columns = ["name", "country", "population"]
+    population_df = DataFrame(population, columns=columns)
+
+    people = [("Abe", "America"), ("Beth", "Bulgaria"), ("Carl", "Canada")]
+    columns = ["name", "country"]
+    people_df = DataFrame(people, columns=columns)
+
+    expected_data = [
+        ("Abe", "America", np.nan),
+        ("Beth", "Bulgaria", 7),
+        ("Carl", "Canada", 30),
+    ]
+    expected_cols = ["name", "country", "population"]
+    expected = DataFrame(expected_data, columns=expected_cols)
+
+    if how == "right":
+        left_df, right_df = population_df, people_df
+    elif how == "left":
+        left_df, right_df = people_df, population_df
+
+    result = left_df.merge(right_df, on=("name", "country"), how=how)
+    assert_frame_equal(expected, result)
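
For anyone wanting to check the `how="right"` case outside the test suite, a standalone sketch using the same data might look like the following. It assumes pandas 1.0.0 or later is installed, and it passes `on` as a list rather than the tuple used in the test, which is a choice made here for the sketch.

```python
import pandas as pd

population_df = pd.DataFrame(
    [("Jenn", "Jamaica", 3), ("Beth", "Bulgaria", 7), ("Carl", "Canada", 30)],
    columns=["name", "country", "population"],
)
people_df = pd.DataFrame(
    [("Abe", "America"), ("Beth", "Bulgaria"), ("Carl", "Canada")],
    columns=["name", "country"],
)

# right merge: people_df is the right frame, so its row order should be kept
result = population_df.merge(people_df, on=["name", "country"], how="right")

# compare the key columns against the right frame's own order
keys_in_order = result[["name", "country"]].values.tolist()
right_order = people_df[["name", "country"]].values.tolist()
print(keys_in_order == right_order)
```

With the fix in place the comparison should print `True`; on 0.25.x, per the changelog entry in this PR, the rows would not be expected to follow the right frame's order.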