Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

by = .EACHI key fix 4603 #4917

Merged
merged 12 commits into from
Apr 13, 2021
Merged

by = .EACHI key fix 4603 #4917

merged 12 commits into from
Apr 13, 2021

Conversation

ColeMiller1
Copy link
Contributor

@ColeMiller1 ColeMiller1 commented Feb 22, 2021

Closes #4603.
Closes #4911.

During by = .EACHI, the sorted attribute could be incorrectly assigned. Because the on = .(lhs = rhs), the order of the sorted attributes may not match the order of the by columns.

A second issue is that during normal by = operations, each column is keyed. However, since not all columns in Y of X[Y] are not guaranteed to be keyed, we cannot assume that all the columns are keyed, per existing behavior:

setattr(ans,"sorted",names(ans)[seq_along(grpcols)])

@codecov
Copy link

codecov bot commented Feb 22, 2021

Codecov Report

Merging #4917 (95e4c95) into master (ec1259a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 95e4c95 differs from pull request most recent head 81e13ec. Consider uploading reports for the commit 81e13ec to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4917   +/-   ##
=======================================
  Coverage   99.42%   99.42%           
=======================================
  Files          73       73           
  Lines       14429    14430    +1     
=======================================
+ Hits        14346    14347    +1     
  Misses         83       83           
Impacted Files Coverage Δ
R/data.table.R 99.94% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec1259a...81e13ec. Read the comment docs.

@ColeMiller1 ColeMiller1 removed the WIP label Feb 23, 2021
@ColeMiller1 ColeMiller1 changed the title Non equi key fix 4603 by = .EACHI key fix 4603 Feb 23, 2021
@ColeMiller1 ColeMiller1 added this to the 1.14.1 milestone Feb 23, 2021
@@ -1385,7 +1385,8 @@ replace_dot_alias = function(e) {
byval = i
bynames = if (missing(on)) head(key(x),length(leftcols)) else names(on)
allbyvars = NULL
bysameorder = haskey(i) || (is.sorted(f__) && ((roll == FALSE) || length(f__) == 1L)) # Fix for #1010
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the length(f__)==1L too. That was there, iiuc, from before #3443 when is.sorted() returned false for a single NA. Since is.sorted() is true for all length 1 input, the || length(f__)==1L is superfluous now.
Also moved the roll=FALSE first before && is.sorted(f__) to save the call to is.sorted() when roll==TRUE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chmatch not match too. i can fix if you're AFK now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good spot. done.

Copy link
Member

@mattdowle mattdowle Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, I thought we had a isLeadingSubsetOfKey() helper function, or similar name, as we do that op quite a bit. But when I did a few greps I couldn't find anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was exactly thinking that we should just have a helper for this by now...

R/data.table.R Outdated
@@ -1385,7 +1385,8 @@ replace_dot_alias = function(e) {
byval = i
bynames = if (missing(on)) head(key(x),length(leftcols)) else names(on)
allbyvars = NULL
bysameorder = haskey(i) || (is.sorted(f__) && ((roll == FALSE) || length(f__) == 1L)) # Fix for #1010
bysameorder = (haskey(i) && identical(leftcols, head(match(key(i),names(i)), length(leftcols)))) || # leftcols leading subset of key(i); see #4917
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason for identical() over all( . == . ) here? identical is a heavier test including things like attributes right?

also at a glance I would do head() before match()

Copy link
Member

@mattdowle mattdowle Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. == is an alloc returning logical vector which all() then passes over. It's the alloc that's nice to avoid; usually fast but could trigger a gc. identical is a binary compare so, when there aren't any attributes to check, it's faster than all(.==.), iirc some tests from years back posted somewhere where identical was fastest. But could be wrong and would be surprised if there's much in it in this small case. Then there's call overhead of .Internal vs .Primitive vs S3-method of the various options (identical vs all vs all.equal).

Yep head before match good idea. Will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

caveat ... == for a length 1 vector is a special case that doesn't allocate, just for the reason that R has global internal constants for TRUE, FALSE, and NA_LOGICAL. So the result of x==1L when x is length 1L will be one of those global pointers without an alloc. As always, IIUC.

@mattdowle mattdowle merged commit 27c3bd8 into master Apr 13, 2021
@mattdowle mattdowle deleted the non_equi_key_fix_4603 branch April 13, 2021 16:19
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants