gmean na.rm=TRUE is much slower than na.rm=FALSE #4849

Closed
jangorecki opened this issue Dec 15, 2020 · 1 comment · Fixed by #4851

Comments

@jangorecki
Member

jangorecki commented Dec 15, 2020

The timings below may look as if they came from a single session, but each one was run in a fresh session, and the page cache was dropped in between with sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'.

library(data.table) ## 1.13.5
setDTthreads(0L) ## 40
set.seed(108)
N = 1e9L
K = 1e2L
DT = list()
DT[["id3"]] = factor(sample(sprintf("id%010d",1:(N/K)), N, TRUE))
DT[["v3"]] =  round(runif(N,max=100),6)
setDT(DT)

system.time(naf <- DT[, .(v3=mean(v3)), by=id3, verbose=TRUE])
#Detected that j uses these columns: v3 
#Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
#5.615s elapsed (00:01:39 cpu) 
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.091s elapsed (0.074s cpu) 
#Getting back original order ... forder.c received a vector type 'integer' length 10000000
#1.037s elapsed (2.888s cpu) 
#lapply optimization is on, j unchanged as 'list(mean(v3))'
#GForce optimized j to 'list(gmean(v3))'
#Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.319
#gforce assign high and low took 4.399
#This gsum took (narm=FALSE) ... gather took ... 2.107s
#2.322s
#gforce eval took 2.339
#8.738s elapsed (00:02:39 cpu) 
#
#   user  system elapsed 
#261.852  67.723  15.498 

system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3, verbose=TRUE])
#Detected that j uses these columns: v3 
#Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
#5.799s elapsed (00:01:42 cpu) 
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.090s elapsed (0.074s cpu) 
#Getting back original order ... forder.c received a vector type 'integer' length 10000000
#2.608s elapsed (3.275s cpu) 
#lapply optimization is on, j unchanged as 'list(mean(v3, na.rm = TRUE))'
#GForce optimized j to 'list(gmean(v3, na.rm = TRUE))'
#Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.346
#gforce assign high and low took 4.978
#gforce eval took 33.515
#40.2s elapsed (00:02:24 cpu) 
#
#   user  system elapsed 
#250.858  68.804  48.679

This is actually mentioned in #3202.
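
For readers without a machine that can hold 1e9 rows, the comparison above can be sketched at a much smaller scale. This is only an illustration: N = 1e6L is an assumed size chosen to fit in ordinary RAM, the timings it prints will not resemble the ones reported here, and whether any gap shows up depends on the data.table version installed.

```r
library(data.table)

set.seed(108)
N = 1e6L   # assumption: scaled down from the 1e9 rows used in the report
K = 1e2L
DT = data.table(
  id3 = factor(sample(sprintf("id%010d", 1:(N/K)), N, TRUE)),
  v3  = round(runif(N, max=100), 6)
)

# Same two queries as in the report; add verbose=TRUE to either call
# to check that j was GForce-optimized to gmean().
t_naf = system.time(naf <- DT[, .(v3=mean(v3)), by=id3])
t_nat = system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3])

# v3 contains no NAs, so both results should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(naf, nat)))

# Elapsed seconds side by side.
print(c(na.rm_FALSE = t_naf[["elapsed"]], na.rm_TRUE = t_nat[["elapsed"]]))
```

Because the column has no missing values, any difference in elapsed time comes from the code path taken for na.rm=TRUE, not from the data.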

@jangorecki jangorecki added benchmark performance GForce issues relating to optimized grouping calculations (GForce) labels Dec 15, 2020
@jangorecki jangorecki assigned jangorecki and unassigned jangorecki Dec 15, 2020
@mattdowle mattdowle added this to the 1.13.7 milestone Dec 15, 2020
@jangorecki jangorecki self-assigned this Dec 16, 2020
@jangorecki
Member Author

jangorecki commented Dec 17, 2020

Timings with #4851 applied (elapsed seconds):
na.rm=TRUE: 48.7s down to 14.3s
na.rm=FALSE: 15.5s down to 14.7s

> system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3, verbose=TRUE])
Detected that j uses these columns: v3
Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
5.198s elapsed (00:01:35 cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.091s elapsed (0.075s cpu) 
Getting back original order ... forder.c received a vector type 'integer' length 10000000
0.479s elapsed (2.959s cpu) 
lapply optimization is on, j unchanged as 'list(mean(v3, na.rm = TRUE))'
GForce optimized j to 'list(gmean(v3, na.rm = TRUE))'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.321
gforce assign high and low took 4.868
This gmean took (narm=TRUE) ... gather took ... 2.068s
2.298s
gforce eval took 2.300
8.537s elapsed (00:02:43 cpu) 
   user  system elapsed 
262.668  63.634  14.322 

## drop caches in another session

> system.time(naf <- DT[, .(v3=mean(v3)), by=id3, verbose=TRUE])
Detected that j uses these columns: v3 
Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
6.565s elapsed (00:01:35 cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.093s elapsed (0.085s cpu) 
Getting back original order ... forder.c received a vector type 'integer' length 10000000
0.601s elapsed (5.789s cpu) 
lapply optimization is on, j unchanged as 'list(mean(v3))'
GForce optimized j to 'list(gmean(v3))'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.314
gforce assign high and low took 4.880
This gmean took (narm=FALSE) ... gather took ... 1.717s
1.931s
gforce eval took 1.931
7.467s elapsed (00:02:39 cpu) 
   user  system elapsed 
261.039  61.257  14.738 
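
The before/after numbers in this thread can be turned into speedup factors with a few lines of arithmetic. The values below are copied from the system.time() and "gforce eval took" lines above; the variable names are only illustrative. For na.rm=TRUE the ratio comes out at roughly 3.4x.

```r
# Elapsed seconds copied from the system.time() output in this thread.
before = c(na.rm_TRUE = 48.679, na.rm_FALSE = 15.498)  # data.table 1.13.5
after  = c(na.rm_TRUE = 14.322, na.rm_FALSE = 14.738)  # with #4851

# Speedup factor per setting.
speedup = before / after
print(round(speedup, 2))

# "gforce eval took" values for na.rm=TRUE, before and after.
eval_before = 33.515
eval_after  = 2.300

saved_total = before[["na.rm_TRUE"]] - after[["na.rm_TRUE"]]
saved_eval  = eval_before - eval_after
cat(sprintf("elapsed saved: %.3fs, of which gforce eval: %.3fs\n",
            saved_total, saved_eval))
```

Most of the elapsed time saved for na.rm=TRUE is accounted for by the gforce eval step; the remainder falls within the run-to-run variation visible in the other steps of the logs.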
