sort = FALSE doesn't generate equivalent results with data.frame.merge #2574

Closed
DavorJ opened this issue Jan 18, 2018 · 6 comments · Fixed by #4427

@DavorJ

DavorJ commented Jan 18, 2018

Here is sample/test code:

# base R: both inputs are data.frames
A <- data.frame(a = 1:4)
AB <- data.frame(a = 2:4, b = 1)
merge(A, AB, all.x = TRUE, sort = FALSE)
#>   a  b
#> 1 2  1
#> 2 3  1
#> 3 4  1
#> 4 1 NA

# data.table: same data, same call, different row order
A <- data.table::data.table(a = 1:4)
AB <- data.table::data.table(a = 2:4, b = 1)
merge(A, AB, all.x = TRUE, sort = FALSE)
#>    a  b
#> 1: 1 NA
#> 2: 2  1
#> 3: 3  1
#> 4: 4  1

This can have profound effects. merge() dispatches on the class of A, and data.frame and data.table are interchangeable in many scenarios, so changing the class of A can cause breakage in unexpected places.

I don't know what the correct behavior should be, but I do think the results should be the same.

@HughParsonage
Member

base::merge says for sort=FALSE the rows are in 'an unspecified order'. So I think you'll just have to check the ordering if you need sort=FALSE.
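
One way to avoid depending on the unspecified order is to record the row positions of x before merging and restore them afterwards. The following is only a minimal sketch in base R: `merge_keep_x_order` and the temporary column `.row_id` are hypothetical names introduced for illustration, and the sketch assumes neither input already has a column called `.row_id`. It is written against data.frame inputs; whether it behaves identically for data.table inputs is not verified here.

```r
# Hypothetical helper (not part of base R or data.table):
# merge x and y, then put the result back into the row order of x.
merge_keep_x_order <- function(x, y, ...) {
  # Remember each row's original position in x.
  # Assumption: neither x nor y already has a column named ".row_id".
  x$.row_id <- seq_len(nrow(x))

  out <- merge(x, y, ..., sort = FALSE)

  # Reorder by the saved position; rows that come only from y
  # (e.g. with all.y = TRUE) have NA here and end up last.
  out <- out[order(out$.row_id), ]
  out$.row_id <- NULL
  rownames(out) <- NULL
  out
}

A  <- data.frame(a = 1:4)
AB <- data.frame(a = 2:4, b = 1)
res <- merge_keep_x_order(A, AB, all.x = TRUE)
print(res)
```

With the example data from the report, the unmatched row `a = 1` stays in first position rather than being appended at the end.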

@MichaelChirico
Member

MichaelChirico commented Jan 18, 2018 via email

@DavorJ
Author

DavorJ commented Jan 18, 2018

@MichaelChirico, natural, yes, but when I read the help for all.x:

logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have 'NA's in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.

I read the 'added' as meaning 'added at the end', which is base::merge behavior.

It might be nitpicking, but it did cause me some head-scratching.

@DavorJ DavorJ changed the title sort = FALSE does't generate equivalent results with data.frame.sort sort = FALSE does't generate equivalent results with data.frame.merge Jan 18, 2018
@arunsrinivasan
Member

data.table's merge seems to give different row orders for merge(A, AB, all.x=TRUE, sort=FALSE) and merge(AB, A, all.y=TRUE, sort=FALSE). It seems this should be fixed on the data.table side; I'd agree that base R is the less surprising of the two here.

@jangorecki
Member

jangorecki commented Apr 5, 2020

Looks to be related to #2594

@jangorecki
Member

jangorecki commented Apr 22, 2020

@arunsrinivasan don't you think the current behavior is better? all.x=TRUE means that we return all rows from the LHS data, and at the same time we retain the order of the LHS data. IMO this is good, and this is how Y[X] behaves. If base::merge documents that sort=FALSE gives an unspecified order, then there is nothing to be consistent with there.
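
For readers unfamiliar with the Y[X] form mentioned above, here is a minimal sketch using the tables from the original report. It assumes the data.table package is installed; `res` is just an illustrative variable name. The join returns one row for each row of A, in A's order, with NA in `b` where AB has no match.

```r
library(data.table)

A  <- data.table(a = 1:4)
AB <- data.table(a = 2:4, b = 1)

# Y[X] join: look up each row of A in AB on column "a".
# Unmatched rows of A are kept (default nomatch = NA) and the
# result follows the row order of A.
res <- AB[A, on = "a"]
print(res)
```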

@jangorecki jangorecki self-assigned this May 4, 2020
@mattdowle mattdowle added this to the 1.14.1 milestone Aug 2, 2021
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023