mutate_profile: dynamic column and expression upgrades #315

Draft
wants to merge 2 commits into
base: master


@brownag brownag commented May 18, 2024

mutate_profile() is a safe and efficient method for profile-level calculations: expressions involving SPC columns are evaluated to produce or modify site- or horizon-level columns. However, it can be difficult to program with, because each argument passed via ... is evaluated to become a new column. Sometimes the column names need to be computed from data, and sometimes we want to reuse one expression across several columns.

This PR is a proof of concept that adds mutate_profile_raw(), which can be used when building sets of dynamic mutate expressions. mutate_profile() also gains a col_names argument for dynamic naming of columns.

It also adds examples to the mutate_profile() docs.

Thanks to @natearoe for the discussion on this topic, which inspired these upgrades.

I still need to think about the ergonomics.

Another possible approach would be to bring the {rlang} dependency back in and use tidy data masking or tidy selection, allowing helpers like across() or starts_with() to identify input variables. Even if that were implemented, there would likely still be value in having an approach like mutate_profile_raw().

Example:

library(aqp)
#> This is aqp 2.0.3
data(jacobs2000)

set.seed(123)

# example with a dynamically named column
x <- mutate_profile(jacobs2000, bottom - top, 
                    col_names = paste0("thk", floor(runif(1, 0, 100))))
x$thk28
#>  [1] 18 25 36 51 23  3 57 18 28 38 38 23 68 15 10 39 20 28 53 10 20 33 26 51 35
#> [26] 20 18 28 33 48 26 48 18 28 30 28 15 49 15 26  7 13 30 41  8 12

# example with dynamic number of columns and names
master_desgn <- c("O", "A", "E", "B", "C", "R", "L", "M")
thk_names <- paste0("thk_", master_desgn)

x$thk <- x$bottom - x$top

## construct an arbitrary number of expressions using variable inputs
ops <- lapply(master_desgn, function(d) {
  substitute(sum(thk[grepl(VAR, name)], na.rm = TRUE), list(VAR = d))
})
names(ops) <- thk_names

# do mutation
y <- mutate_profile_raw(x, ops)

site(y)[c(idname(y), thk_names)]
#>     id thk_O thk_A thk_E thk_B thk_C thk_R thk_L thk_M
#> 1 92-1     0    18    25   113    60     0     0     0
#> 2 92-2     0    18    28    99    68     0     0     0
#> 3 92-3     0    25    49   111     0     0     0     0
#> 4 92-4     0    20     0     0   183     0     0     0
#> 5 92-5     0    28    81    26    48     0     0     0
#> 6 92-6     0    46    86    64     0     0     0     0
#> 7 92-7     0    15    97    28    12     0     0     0
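
For readers unfamiliar with substitute(), the expression-building step above can be sketched in base R, without aqp. This is only an illustration of the technique under stated assumptions: the toy data frame h, its values, and the split()/eval() loop are hypothetical stand-ins, and are not meant to describe how mutate_profile_raw() is implemented.

```r
# Toy horizon data (hypothetical values; column names mirror the example above)
h <- data.frame(
  id   = c("p1", "p1", "p1", "p2", "p2"),
  name = c("A", "Bt1", "C", "A", "E"),
  thk  = c(10, 30, 20, 15, 25)
)

master <- c("A", "B")

# substitute() swaps the symbol VAR for each designation,
# returning one unevaluated call per designation
ops <- lapply(master, function(d) {
  substitute(sum(thk[grepl(VAR, name)], na.rm = TRUE), list(VAR = d))
})
names(ops) <- paste0("thk_", master)

# Evaluate every expression once per profile, using that profile's
# rows as the evaluation environment (assumed stand-in for the real method)
res <- do.call(rbind, lapply(split(h, h$id), function(p) {
  data.frame(id = p$id[1], lapply(ops, eval, envir = p))
}))

res
```

Because the result columns take their names from the list of expressions, both the number of columns and their names can be computed from data before anything is evaluated.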

…ic mutate expressions

 - `mutate_profile()` gains `col_names` argument for dynamic naming of columns, and