
BUG: merge_ordered fails with list-like left_by or right_by #38089

Merged
merged 9 commits into from
Nov 29, 2020


@GYHHAHA (Contributor) commented Nov 26, 2020

@GYHHAHA GYHHAHA changed the title from "merge_ordered fails with list-like left_by" to "BUG: merge_ordered fails with list-like left_by or right_by" Nov 26, 2020
# GH 35269
from pandas import DataFrame, merge_ordered

left = DataFrame({"G": ["g", "g"], "H": ["h", "h"], "T": [1, 3]})
right = DataFrame({"T": [2], "E": [1]})
result = merge_ordered(left, right, on=["T"], left_by=["G", "H"])
Contributor (review comment):

Can you test the same with on='T'?
Also, can you test with right_by?
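A quick way to cover both suggestions is to compare the variants against each other instead of against hand-written expected frames. This is a sketch using the public `pd.merge_ordered` API and assumes a pandas version that already includes this fix: spelling the key as `on="T"` or `on=["T"]` should give identical frames, and passing the grouped frame as `right` with `right_by` should give the same rows with the columns in another order.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"G": ["g", "g"], "H": ["h", "h"], "T": [1, 3]})
right = pd.DataFrame({"T": [2], "E": [1]})

# Join key given as a one-element list and as a plain column name.
with_list = pd.merge_ordered(left, right, on=["T"], left_by=["G", "H"])
with_name = pd.merge_ordered(left, right, on="T", left_by=["G", "H"])
assert_frame_equal(with_list, with_name)

# Mirror case: the grouped frame is passed as `right` and grouped via right_by.
mirrored = pd.merge_ordered(right, left, on="T", right_by=["G", "H"])
# Same rows but a different column order, so align columns before comparing.
assert_frame_equal(mirrored[with_list.columns], with_list)

print(with_list)
```

Comparing variants this way keeps the check independent of exact column order and dtypes, which only need to be pinned down once.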

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 26, 2020
@pep8speaks commented Nov 27, 2020

Hello @GYHHAHA! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-29 09:11:53 UTC

@GYHHAHA (Contributor, Author) commented Nov 27, 2020

There are also two other problems that should be fixed.

>>> l = pd.DataFrame([['g', 'h', 1], ['g', 'h', 3]], columns=list('GHT'))
>>> l
   G  H  T
0  g  h  1
1  g  h  3
>>> r = pd.DataFrame([[2, 1]], columns=list('TE'))
>>> r
   T  E
0  2  1

First, the following result is an unexpected left join (the T=2 row from r is missing); it is caused by an incorrect groupby.

>>> pd.merge_ordered(l, r, on='T', left_by=['G'])
   G  H  T    E
0  g  h  1  nan
1  g  h  3  nan
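For comparison, the rows one would presumably expect here can be rebuilt by hand: group `l` by the `left_by` columns, outer-merge each group with `r` on `T`, sort by `T`, and write the group key back into every row. This is a minimal sketch of the intended semantics, not pandas' internal code path; the helper `grouped_ordered_merge` is hypothetical.

```python
import pandas as pd


def grouped_ordered_merge(left, right, on, left_by):
    """Hypothetical helper illustrating merge_ordered(..., left_by=...) semantics."""
    pieces = []
    for key, group in left.groupby(left_by, sort=False):
        # Depending on the pandas version, the key for a one-element list of
        # columns is a scalar or a 1-tuple; normalise it to a tuple.
        key = key if isinstance(key, tuple) else (key,)
        merged = pd.merge(group, right, on=on, how="outer").sort_values(on)
        # Rows that exist only in `right` have NaN group columns; fill them.
        for col, value in zip(left_by, key):
            merged[col] = value
        pieces.append(merged)
    return pd.concat(pieces, ignore_index=True)


l = pd.DataFrame([["g", "h", 1], ["g", "h", 3]], columns=list("GHT"))
r = pd.DataFrame([[2, 1]], columns=list("TE"))
print(grouped_ordered_merge(l, r, on="T", left_by=["G"]))
```

Only the `left_by` column `G` is filled on the T=2 row; `H` stays missing there because it is not a grouping column and no fill method is applied.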

Second, an unexpected result appears when left_by contains a label that is not a column of l, such as 'h' below (this is also caused by the try ... groupby part).

>>> pd.merge_ordered(l, r, on='T', left_by=['G', 'h'])
   G    H  T    E  h
0  G    h  1  nan  G
1  G  nan  2    1  G
2  h  nan  2    1  h
3  h    h  3  nan  h
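Both results appear to come from how `DataFrame.groupby` interprets a list of keys whose length equals the number of rows: if not every entry is a column (or index level) name, the list itself is used as one array of per-row group labels instead of raising `KeyError`. The sketch below checks that assumption directly on `l` and `r`; it only exercises `groupby`, not `merge_ordered`, and the link to the `try ... groupby` fallback is an inference from the comment above.

```python
import pandas as pd

l = pd.DataFrame([["g", "h", 1], ["g", "h", 3]], columns=list("GHT"))
r = pd.DataFrame([[2, 1]], columns=list("TE"))

# First case: r has one row and ["G"] has one entry, but "G" is not a column
# of r. The list is used as per-row labels, so no KeyError is raised and an
# (assumed) `except KeyError` fallback around right.groupby(by) never runs.
right_groups = r.groupby(["G"])
print(right_groups.ngroups)

# Second case: l has two rows and ["G", "h"] has two entries; "h" is not a
# column, so each row gets its own label ("G" for row 0, "h" for row 1).
left_groups = l.groupby(["G", "h"])
print(left_groups.size())

# When the list length differs from the row count, the unknown name is
# rejected as expected.
try:
    l.groupby(["h"])
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)
```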

Since these are separate bugs from the list-like issue fixed here, I will open another PR for both of them after this one is merged.

@GYHHAHA GYHHAHA requested a review from jreback November 27, 2020 05:21
# GH 35269
from pandas import DataFrame, merge_ordered

left = DataFrame({"G": ["g", "g"], "H": ["h", "h"], "T": [1, 3]})
right = DataFrame({"T": [2], "E": [1]})
result1 = merge_ordered(left, right, on=["T"], left_by=["G", "H"])
Contributor (review comment):

Can you parameterize this test with the cases?
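One way to parameterize the cases raised in review, sketched as a plain table of inputs and expected frames so it runs without pytest; in a pytest suite each tuple would become one `@pytest.mark.parametrize` case. The expected frames assume the fixed behaviour (an ordered outer merge with the group keys written back to every row), and the names `CASES`, `expected_grouped` and `check_list_type_by` are illustrative only.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"G": ["g", "g"], "H": ["h", "h"], "T": [1, 3]})
right = pd.DataFrame({"T": [2], "E": [1]})
expected_grouped = pd.DataFrame(
    {"G": ["g"] * 3, "H": ["h"] * 3, "T": [1, 2, 3], "E": [np.nan, 1.0, np.nan]}
)

# (lhs, rhs, on, left_by, right_by, expected)
CASES = [
    (left, right, ["T"], ["G", "H"], None, expected_grouped),
    (left, right, "T", ["G", "H"], None, expected_grouped),
    # Mirror case: the grouped frame is on the right, so its columns come last.
    (right, left, ["T"], None, ["G", "H"], expected_grouped[["T", "E", "G", "H"]]),
]


def check_list_type_by(lhs, rhs, on, left_by, right_by, expected):
    result = pd.merge_ordered(lhs, rhs, on=on, left_by=left_by, right_by=right_by)
    # check_dtype=False keeps the comparison independent of how string
    # columns are typed in a given pandas version.
    assert_frame_equal(result, expected, check_dtype=False)


for case in CASES:
    check_list_type_by(*case)
print(f"{len(CASES)} cases passed")
```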

@GYHHAHA (Contributor, Author) commented Nov 29, 2020

The failures come from test_arithmetic; are they unrelated? @jreback

@jreback jreback added this to the 1.2 milestone Nov 29, 2020
@jreback jreback merged commit 7070aae into pandas-dev:master Nov 29, 2020
@jreback (Contributor) commented Nov 29, 2020

Thanks @GYHHAHA, very nice!

@GYHHAHA GYHHAHA deleted the fix-merge_ordered branch November 30, 2020 00:41
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Development

Successfully merging this pull request may close these issues.

BUG: merge_ordered fails when left_by is set to more than one column
3 participants