Add `collapseHz()` #307

brownag · 2024-02-24T17:57:27Z

Here is another draft idea for a common operation people want to perform on SoilProfileCollections. This is a minimal implementation, but I think several other options could greatly enhance the ways it could be used. It could likely help resolve some issues in soilDB, such as ncss-tech/soilDB#120, ncss-tech/soilDB#122, and ncss-tech/soilDB#135.

The collapseHz() function makes it easy to aggregate adjacent horizons (collapsing them into a single horizon) using groups based on pattern matching. Numeric properties are aggregated via thickness-weighted average, and other properties are aggregated via dominant condition (i.e. the aggregate value comes from the thickest horizon in each group).
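To make the two aggregation rules concrete, here is a minimal base R sketch for a single group of adjacent horizons. The data values and column names (`top`, `bottom`, `clay`, `texcl`) are made up for illustration, and this is not the code in the PR, just the arithmetic described above.

```r
# Hypothetical horizon data for one group of adjacent horizons
hz <- data.frame(
  top    = c(0, 10, 25),
  bottom = c(10, 25, 60),
  clay   = c(18, 22, 30),
  texcl  = c("l", "l", "cl"),
  stringsAsFactors = FALSE
)

# horizon thickness is the weight: 10, 15, 35
thk <- hz$bottom - hz$top

# numeric property: thickness-weighted average
# (18*10 + 22*15 + 30*35) / 60 = 26
clay_wt <- weighted.mean(hz$clay, w = thk, na.rm = TRUE)

# categorical property: dominant condition, i.e. value of the thickest horizon
texcl_dom <- hz$texcl[which.max(thk)]

# the collapsed horizon spans the full depth range of the group
collapsed <- data.frame(
  top    = min(hz$top),
  bottom = max(hz$bottom),
  clay   = clay_wt,
  texcl  = texcl_dom,
  stringsAsFactors = FALSE
)
print(collapsed)
```

Note that `which.max()` returns the first maximum, so ties in thickness go to the shallowest horizon in this sketch; whether that is the desired tie-breaking rule is an open choice.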

TODO/additional ideas

- Custom matching function (not necessarily grepl() based, like thicknessOf() / Add thicknessOf() #306)
- Custom numeric/categorical aggregation functions
  - Ability to specify multiple summary statistics to be calculated per numeric column (e.g. range, or standard deviation, number of subhorizons, number of NA values, or similar)
  - Concatenation and/or thickness tabulation of categories rather than dominant as the only option
- Ability to exclude certain numeric properties from numeric aggregation (e.g. color value/chroma should not be averaged)
- Integrate with generalized horizon labels (via GHL())--in lieu of specified pattern and possibly change default for hzdesgn when GHL() is set?

Example behavior

# testing collapseHz
library(aqp)
#> This is aqp 2.0.3
data(loafercreek, package = "soilDB")

x <- loafercreek[c(5,6,7,10,11),]

# collapse on hzdesgnname
a <- collapseHz(x, c(`A` = "^A", `BA` = "BA|Bw", `Bt` = "[ABC]+t", `Cr` = "Cr|R"))
profile_id(a) <- paste0(profile_id(a), "_collapse")
c(a, x) |> 
  plot(color = "clay")

# collapse on texture class groups / custom hzdesgnname
b <- collapseHz(x, 
                pattern = c(`l` = "^(l|sl)$", `c` = "^(cl|c|sicl|scl)$", `si` = "^(si|sil)$"),
                hzdesgn = "texcl")
profile_id(b) <- paste0(profile_id(b), "_collapse")
c(b, x) |> 
  plot(color = "texcl", name = "texcl")

dylanbeaudette · 2024-02-24T23:51:41Z

This is a great idea, and perfect extension to the GHL foundation of ideas and methods. I'm surprised that we never included it before.

From the texture class example above, it isn't clear to me how a dominant condition can be applied, when there is a (I think) full mapping of new labels to REGEX-matched patterns. Does the dominant condition come into play when an unmatched horizon label is encountered?

Good call on excluding color, perhaps a reasonable place to use simulated mixtures. Or, it could be that someone is only interested in Munsell value, and in that case a wt. mean is probably fine.

brownag · 2024-02-25T00:18:46Z

> This is a great idea, and perfect extension to the GHL foundation of ideas and methods. I'm surprised that we never included it before.

Yeah, agreed, it's something I feel like we have talked a lot about but just never implemented a generic function for.

> From the texture class example above, it isn't clear to me how a dominant condition can be applied, when there is a (I think) full mapping of new labels to REGEX-matched patterns. Does the dominant condition come into play when an unmatched horizon label is encountered?

Dominant condition is based on the thickest horizon's value within each "adjacency" group. No aggregation is applied to a horizon with an unmatched label; it just passes through to the result. As currently implemented, this modifies the hzdesgn column in place; that column could be the generalized horizon label column or, as in these examples, the hzdesgnname().
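A rough sketch of how "adjacency" groups could be derived from pattern matches, in base R. The horizon names are hypothetical, and two details are my assumptions rather than confirmed behavior of the PR: the first matching pattern wins, and each unmatched horizon forms its own group so that it passes through untouched.

```r
# Hypothetical horizon designations for one profile, in depth order
hzname <- c("A1", "A2", "Bt1", "2Bt2", "C", "Bt3", "Cr")

# named patterns: names are the new labels, values are regular expressions
pattern <- c(A = "^A", Bt = "[ABC]+t", Cr = "Cr|R")

# assign a new label where a pattern matches (assumption: first match wins)
newlabel <- hzname
matched <- rep(FALSE, length(hzname))
for (i in seq_along(pattern)) {
  hit <- !matched & grepl(pattern[i], hzname)
  newlabel[hit] <- names(pattern)[i]
  matched <- matched | hit
}

# unmatched horizons get a unique key so they are never merged with neighbors
key <- ifelse(matched, newlabel, paste0("unmatched_", seq_along(hzname)))

# runs of identical adjacent keys define the adjacency groups
r <- rle(key)
grp <- rep(seq_along(r$lengths), times = r$lengths)

print(data.frame(hzname, newlabel, matched, grp))
```

Here "Bt3" ends up in its own group even though it shares a label with "Bt1" and "2Bt2", because the unmatched "C" horizon sits between them. Dominant condition would then be evaluated within each value of `grp`.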

I need to ponder a bit on what the right ergonomics are. It is probably better to have the function calculate a new horizon designation column and set it as the generalized horizon label, the horizon designation label, or both, rather than modifying existing data, which might be important for downstream context.

> Good call on excluding color, perhaps a reasonable place to use simulated mixtures. Or, it could be that someone is only interested in Munsell value, and in that case a wt. mean is probably fine.

Yeah, I think I want to keep the function relatively "dumb", but specific columns could be targeted with specific methods. One option is calling reduceSPC() on the input beforehand and then applying a custom aggregation function (TODO). Another is an argument that excludes a vector of column names from the transformation applied to the horizon data frame, or forces categorical aggregation for them.
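As a sketch of the second option, the helper below aggregates one adjacency group and lets the caller force dominant-condition handling for selected numeric columns. The function name `aggregate_group()`, the argument `as_categorical`, and the column names are all hypothetical; nothing here is part of aqp or of this PR as written.

```r
# Hypothetical helper: collapse one group of adjacent horizons to a single row.
# Columns listed in `as_categorical` use dominant condition even if numeric.
aggregate_group <- function(h, as_categorical = character(0)) {
  thk <- h$bottom - h$top
  dominant <- which.max(thk)            # index of the thickest horizon
  props <- setdiff(names(h), c("top", "bottom"))
  res <- lapply(props, function(nm) {
    v <- h[[nm]]
    if (is.numeric(v) && !(nm %in% as_categorical)) {
      weighted.mean(v, w = thk, na.rm = TRUE)  # thickness-weighted average
    } else {
      v[dominant]                              # dominant condition
    }
  })
  names(res) <- props
  data.frame(top = min(h$top), bottom = max(h$bottom), res,
             stringsAsFactors = FALSE)
}

# made-up group of two horizons; m_value stands in for Munsell value
h <- data.frame(top = c(0, 10), bottom = c(10, 40),
                clay = c(10, 20), m_value = c(3, 5))

averaged <- aggregate_group(h)                              # m_value averaged
excluded <- aggregate_group(h, as_categorical = "m_value")  # m_value from thickest
print(averaged)
print(excluded)
```

With thicknesses of 10 and 30, clay is 17.5 in both results, while m_value is 4.5 when averaged and 5 when taken from the thickest horizon.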

brownag · 2024-02-25T02:38:59Z

I must note that this concept is implemented in part as dissolve_hz() by @smroecker (aqp/R/segment.R, line 275 in fcad3e8):

    dissolve_hz <- function(object, by, id = "peiid", hztop = "hzdept", hzbot = "hzdepb", collapse = FALSE, order = FALSE) {

but that function does not operate on SoilProfileCollection objects.

I forgot about that function and reinvented part of the wheel here, I suppose. It is nice that dissolve_hz() can operate on several grouping variables, which is something that could be considered here as well, but it would behave differently (i.e. collapseHz() would always use the interaction of the specified grouping variables to determine unique adjacent horizon groups, rather than return separate aggregations for each factor). dissolve_hz(collapse=TRUE) appears to use the intersection of grouping variables specified in by.
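To illustrate what "interaction of the grouping variables" would mean for adjacency groups, here is a small base R sketch. The columns `genhz` and `texcl` and their values are invented, and multi-variable grouping is only an idea at this point, not something collapseHz() does in this PR.

```r
# Hypothetical horizons for one profile, in depth order
hz <- data.frame(
  genhz = c("A", "A", "Bt", "Bt", "Bt"),
  texcl = c("l", "l", "cl", "c", "c"),
  stringsAsFactors = FALSE
)

# helper: number the runs of identical adjacent values
run_id <- function(x) {
  r <- rle(as.character(x))
  rep(seq_along(r$lengths), times = r$lengths)
}

# groups from a single variable
grp_single <- run_id(hz$genhz)

# groups from the interaction of both variables: a new group starts
# whenever either variable changes between adjacent horizons
grp_both <- run_id(interaction(hz$genhz, hz$texcl, drop = TRUE))

print(cbind(hz, grp_single, grp_both))
```

The three Bt horizons form one group on `genhz` alone, but split into two groups once `texcl` is part of the key.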

collapseHz() is intended to be more generic (once the TODOs are complete), and it also aggregates properties across the horizon data.frame in one call, rather than just calculating depths.

dylanbeaudette · 2024-05-16T21:42:18Z

A note for later: one of the simplest cases is flattening on an intended horizon index, such as chkey or phiid, with a short-circuit for the 99% of cases where no flattening is required.
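A minimal sketch of that short-circuit on a plain data.frame, using `anyDuplicated()` to detect whether any work is needed. The function name `flatten_by_index()` is hypothetical, and "keep the first record per index value" is only a placeholder for whatever aggregation would actually be applied.

```r
# Hypothetical helper: flatten horizon records on an intended unique index
flatten_by_index <- function(h, idcol) {
  # short-circuit: the index is already unique, return the input unchanged
  if (anyDuplicated(h[[idcol]]) == 0) {
    return(h)
  }
  # placeholder rule (an assumption): keep the first record per index value
  h[!duplicated(h[[idcol]]), , drop = FALSE]
}

# made-up horizon records; chkey 2 is duplicated in `dup`
ok  <- data.frame(chkey = c(1, 2, 3), clay = c(10, 20, 30))
dup <- data.frame(chkey = c(1, 2, 2), clay = c(10, 20, 22))

res_ok  <- flatten_by_index(ok, "chkey")   # returned as-is
res_dup <- flatten_by_index(dup, "chkey")  # duplicate chkey removed
print(res_dup)
```

`anyDuplicated()` stops at the first duplicate it finds, so the check stays cheap for the common case where the index is already unique.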

brownag added 3 commits February 24, 2024 09:37

Add collapseHz()

ee45cf2

Add test

a5c22e3

fun

71de71c

brownag mentioned this pull request May 15, 2024

get_chorizon_from_SDA() returning duplicate horizons ncss-tech/soilDB#348

Closed
