data.table re-write of slice() #115
Reprex:
library(aqp, warn.conflicts = FALSE)
#> This is aqp 1.27
data(sp4)
sp4$name <- factor(sp4$name)
depths(sp4) <- id ~ top + bottom
# works
a <- slab(sp4, ~ name)
head(a)
#> variable all.profiles A A1 A2 AB ABt Bt Bt1 Bt2 Bt3 top bottom
#> 1 name 1 0.9 0.1 0.0 0.0 0.0 0.0 0.0 0 0 0 1
#> 2 name 1 0.9 0.1 0.0 0.0 0.0 0.0 0.0 0 0 1 2
#> 3 name 1 0.8 0.0 0.1 0.0 0.0 0.0 0.1 0 0 2 3
#> 4 name 1 0.4 0.0 0.1 0.0 0.1 0.1 0.3 0 0 3 4
#> 5 name 1 0.3 0.0 0.1 0.0 0.1 0.1 0.4 0 0 4 5
#> 6 name 1 0.3 0.0 0.0 0.1 0.1 0.1 0.4 0 0 5 6
#> contributing_fraction
#> 1 1
#> 2 1
#> 3 1
#> 4 1
#> 5 1
#> 6 1
# does not work
data(sp4)
sp4$name <- factor(sp4$name)
sp4 <- data.table::as.data.table(sp4)
depths(sp4) <- id ~ top + bottom
b <- slab(sp4, ~ name)
#> Error in h[, vars]: incorrect number of dimensions
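The error is raised by a column subset, h[, vars]. data.table defines its own [ method, so x[, j] on a data.table does not follow base data.frame semantics, which is one plausible source of the dimension error. Under that assumption, a defensive pattern is to coerce to a plain data.frame before subsetting and to pass drop = FALSE so that a single requested column still comes back as a data.frame. The helper select_hz_columns() below is hypothetical and not part of aqp.

```r
# Hypothetical helper (not part of aqp): select horizon columns by name and
# always return a base data.frame, whatever the class of the input.
select_hz_columns <- function(h, vars) {
  # as.data.frame() drops subclasses such as "data.table" or "tbl_df",
  # so the `[` call below uses base data.frame semantics
  h <- as.data.frame(h)
  # drop = FALSE keeps a data.frame even when only one column is requested
  h[, vars, drop = FALSE]
}

# toy horizon data
h <- data.frame(id = c("a", "a", "b"), top = c(0, 10, 0), bottom = c(10, 25, 15))
x <- select_hz_columns(h, "top")
str(x)
```

If data.table is installed, select_hz_columns(data.table::as.data.table(h), "top") should return the same plain data.frame.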
Yup,
This last round of S4SS has demonstrated the many ways in which {dplyr} can interfere with {aqp}.
I think I would lean towards your option 1, based on recent experiences we all have had.
Me too. Hegemony for the tidyverse and all. Here is the evolving plan: finish
functions are not yet exported; see R/dice.R. Possibly subset profiles or horizons when depths are invalid, and keep track of what was removed in the metadata. @brownag
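The idea of subsetting either individual horizons or whole profiles when depths are invalid, while recording what was removed, can be sketched in base R. The validity rule used here (a missing depth, or top >= bottom), the helper names flag_invalid_hz() and subset_valid(), and the use of an attribute in place of the SoilProfileCollection metadata are all assumptions for illustration; this is not aqp's HzDepthLogicSubset().

```r
# Hypothetical rule: a horizon is invalid if either depth is missing or
# top >= bottom. TRUE | NA is TRUE, so the result never contains NA.
flag_invalid_hz <- function(h) {
  is.na(h$top) | is.na(h$bottom) | h$top >= h$bottom
}

# Hypothetical subset: byhz = TRUE drops only the invalid horizons,
# byhz = FALSE drops every profile that contains an invalid horizon.
subset_valid <- function(h, byhz = TRUE) {
  bad <- flag_invalid_hz(h)
  if (byhz) {
    keep <- !bad
  } else {
    bad.ids <- unique(h$id[bad])
    keep <- !(h$id %in% bad.ids)
  }
  out <- h[keep, , drop = FALSE]
  # record profiles that lost all of their horizons (stand-in for metadata)
  attr(out, "removed.ids") <- setdiff(unique(h$id), unique(out$id))
  out
}

# toy data: the 2nd horizon of profile "a" has a missing top depth
h <- data.frame(
  id = c("a", "a", "a", "b", "b"),
  top = c(0, NA, 20, 0, 12),
  bottom = c(10, 20, 35, 12, 30)
)
r1 <- subset_valid(h, byhz = TRUE)   # drops only the bad horizon of "a"
r2 <- subset_valid(h, byhz = FALSE)  # drops profile "a" entirely
attr(r2, "removed.ids")
```

Note that byhz = TRUE can leave a depth gap (here 10 to 20 in profile "a"), and if every horizon of a profile is invalid, that profile disappears from the horizon data while still being listed in the site data.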
Making progress, plenty of room to optimize:
10,000 profiles, no DT key or index:
10,000 profiles, DT index (no sorting):
10,000 profiles, DT key (sorting):
10,000 profiles, various arguments, latest improvements to
I'm going to implement this as an option in
This is in support of dice (#115) and backwards compatibility with slice. Conceptually, this function converts a ragged array to a matrix by padding with NA.
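The NA-padding idea can be illustrated in base R. Each profile contributes a vector of a different length (a ragged array), and padding every vector to the longest length lets them be stacked into a regular matrix with one row per profile. The helper pad_ragged() and the toy values are hypothetical; this is a sketch of the concept, not the function referred to above.

```r
# Hypothetical helper: pad a named list of vectors with NA to a common
# length, then stack them into a matrix (one row per list element).
pad_ragged <- function(x) {
  n <- max(lengths(x))
  padded <- lapply(x, function(v) {
    length(v) <- n  # extending a vector's length fills the new cells with NA
    v
  })
  # rbind() uses the list names as row names
  do.call(rbind, padded)
}

# toy values for three profiles with a different number of 1 cm slices
x <- list(p1 = c(10, 12, 15), p2 = c(8, 9), p3 = c(20, 22, 25, 30))
m <- pad_ragged(x)
m
```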
Crap: when asking for a slice, or a range of slices, that falls entirely within a horizon gap or a horizon with bogus depths, there is an error. This happens because the re-packed horizons are missing a profile ID that exists in the site data.
data(sp4)
depths(sp4) <- id ~ top + bottom
sp4$top[1:2] <- NA
## this throws an error
d <- dice(sp4, fm = 5 ~ ., byhz = TRUE)
## this works, but a profile is dropped
d <- dice(sp4, fm = 5 ~ ., byhz = FALSE)
## this breaks at mapply step
d <- dice(sp4, fm = 5 ~ ., strict = FALSE)
# old slice, NA returned
s <- slice(sp4, fm = 5 ~ .)
profile_id(d) <- sprintf("%s-dice", profile_id(d))
z <- combine(s, d)
par(mar = c(0, 0, 0, 0))
plotSPC(z[1:8, ], color = 'Ca')
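The failure mode described above, where re-packed horizons no longer cover every profile ID in the site data, can be detected with a set difference. The sketch below (hypothetical helper missing_profiles(), toy IDs and depths) also shows one possible way to restore the one-to-one site/horizon correspondence: append a placeholder horizon with NA depths for each missing profile, which resembles the NA result the old slice() returns in the example above. This is an assumption about a possible fix, not the approach taken in dice().

```r
# Hypothetical helper: profile IDs present in the site data but absent
# from the (re-packed) horizon data.
missing_profiles <- function(site.ids, hz.ids) {
  setdiff(site.ids, unique(hz.ids))
}

site.ids <- c("p1", "p2", "p3")
# toy re-packed horizons: profile "p2" has no records left
hz <- data.frame(id = c("p1", "p3", "p3"), top = c(5, 5, 6), bottom = c(6, 6, 7))

miss <- missing_profiles(site.ids, hz$id)
miss

# append one all-NA placeholder horizon per missing profile
if (length(miss) > 0) {
  filler <- data.frame(id = miss, top = NA_real_, bottom = NA_real_)
  hz <- rbind(hz, filler)
}
# restore site order; order() is stable, so horizon order within a profile is kept
hz <- hz[order(match(hz$id, site.ids)), , drop = FALSE]
hz
```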
With the addition of Also,
Encountering DT warnings when
fillHzGaps() is not yet optimized, so this option is disabled by default.
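For context on what gap-filling involves, here is a minimal base R sketch. Within each profile, wherever a horizon's bottom depth does not meet the next horizon's top depth, it inserts a record spanning the gap with NA properties. The function name fill_gaps_sketch() is hypothetical and this is not aqp's fillHzGaps(). It assumes columns named id, top and bottom, and it treats comparisons involving NA depths as "no gap", so invalid horizons should be removed first.

```r
# Hypothetical sketch (not aqp's fillHzGaps()): fill depth gaps between
# consecutive horizons of each profile with NA-property records.
fill_gaps_sketch <- function(h) {
  # sort by profile ID, then by top depth
  h <- h[order(h$id, h$top), , drop = FALSE]
  pieces <- lapply(split(h, h$id), function(p) {
    n <- nrow(p)
    if (n < 2) return(p)
    # TRUE where a horizon's bottom is shallower than the next horizon's top
    gap <- p$bottom[-n] < p$top[-1]
    gap[is.na(gap)] <- FALSE
    if (!any(gap)) return(p)
    # start from copies of the horizons sitting just above each gap
    filler <- p[which(gap), , drop = FALSE]
    filler$top <- p$bottom[-n][gap]
    filler$bottom <- p$top[-1][gap]
    # blank out every property other than the ID and depths
    other <- setdiff(names(p), c("id", "top", "bottom"))
    if (length(other) > 0) filler[other] <- NA
    out <- rbind(p, filler)
    out[order(out$top), , drop = FALSE]
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

# toy data: profile "a" has a gap from 10 to 20
h <- data.frame(
  id = c("a", "a", "b", "b"),
  top = c(0, 20, 0, 10),
  bottom = c(10, 35, 10, 30),
  clay = c(12, 18, 20, 25)
)
g <- fill_gaps_sketch(h)
g
```

The per-profile split() and lapply() keep the sketch readable, but they are exactly the kind of loop that would need to be vectorised for 10,000 profiles.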
No longer a problem:
data(sp4)
depths(sp4) <- id ~ top + bottom
# corrupt 2nd horizon of colusa
sp4$top[2] <- NA
# remove invalid horizons
x <- HzDepthLogicSubset(sp4, byhz = TRUE)
# fill hz gaps
x <- fillHzGaps(x)
# works after gap-filling
d <- dice(x, fm = 0:15 ~ ., byhz = TRUE)
# old slice, NA returned
s <- slice(sp4, fm = 0:15 ~ .)
profile_id(d) <- sprintf("%s-dice", profile_id(d))
z <- combine(s, d)
par(mar = c(0, 0, 0, 0))
plotSPC(z[1:8, ], color = 'Ca')
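The call dice(x, fm = 0:15 ~ .) above asks for slices whose top depths run from 0 to 15. Assuming the convention that slice i spans [i, i + 1), that is 16 slices, and the deepest slice bottom sits at 16, one unit below the largest requested index. The tiny sketch below (hypothetical helper slice_bounds()) only makes that arithmetic explicit; it is the reason a request for 0:z yields z + 1 slices and needs data down to depth z + 1.

```r
# Hypothetical helper: slice boundaries under the assumption that slice i
# spans the interval [i, i + 1).
slice_bounds <- function(z.index) {
  data.frame(top = z.index, bottom = z.index + 1)
}

b <- slice_bounds(0:15)
nrow(b)        # 16 slices for fm = 0:15 ~ .
max(b$bottom)  # 16, i.e. z + 1
```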
This is now addressed / tested by using
Latest benchmarks using 10,000 random profiles.
Update 2022-08-23, using DT vs.
A Plan
- dice
- dice
- slice
- slice (max + 1 convention)
- k-index (maybe)

TODO
- slice in all aqp functions (this will take time, and aqp 2.0 is not backwards compatible)
- fm = 0:z ~ . results in z+1 slices (bug related to filling to bottom of z-index, z + 1)
- fillHzGaps (default = FALSE), important for slab and profile_compare
- fillHzGaps, consider separate arguments to control filling / padding
- var1 + var2 + ...
- findHzGaps(), fillHzGaps() function to find / fill gaps in horizon depths #205, pending optimization
- data.table (see warnings...)
- := (see below)
- repairMissingHzDepths()
- @metadata / subset:
  - byhz = TRUE (should eventually do gap-filling with NA records, findHzGaps(), fillHzGaps() function to find / fill gaps in horizon depths #205)
  - byhz = FALSE
- integrate into (different PR / issue):
  - slab() (problems as of 2022-01-04), or integrate into slab-rewrite (better).

Long-Term
- setindex() vs. setkey() <-- this sorts the data (probably not worth the extra effort)

Updates
- slice, maybe introduce now and phase-in by aqp 2.0.
- slice in most cases. Will attempt to change all slice → dice in the SPC introduction and see what happens.
- A message is now printed when using slice, which will be deprecated in aqp 2.0.

See draft code for dice(). This is about 6x faster than the current implementation of slice, with a lower memory footprint as well.
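The draft dice() code is not reproduced in this thread. As an illustration of how 1 cm slicing can be done without a per-horizon loop, the sketch below expands each horizon with rep() and sequence(). It assumes integer depths with top < bottom, columns named id, top and bottom, and a hypothetical function name dice_sketch(). It is not the aqp implementation and says nothing about where the reported 6x speed-up comes from.

```r
# Hypothetical sketch: expand each horizon into one row per depth unit.
dice_sketch <- function(h) {
  # number of 1-unit slices each horizon contributes
  thick <- h$bottom - h$top
  # row index of the source horizon, repeated once per slice
  idx <- rep(seq_len(nrow(h)), times = thick)
  out <- h[idx, , drop = FALSE]
  # offset of each slice within its horizon: 0, 1, ..., thick - 1
  offset <- sequence(thick) - 1
  out$top <- h$top[idx] + offset
  out$bottom <- out$top + 1
  rownames(out) <- NULL
  out
}

# toy profile with two horizons, 0-10 and 10-25
h <- data.frame(id = c("a", "a"), top = c(0, 10), bottom = c(10, 25), clay = c(12, 18))
d <- dice_sketch(h)
nrow(d)  # 25 slices
```

Every step is a single vectorised call over all horizons, so the cost grows with the total number of slices rather than with the number of separate function calls.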