NA should be interpreted as FALSE in .diagHzLongtoWide() #59

dylanbeaudette · 2018-02-23T18:00:27Z

It appears that .diagHzLongtoWide() assigns NA to all columns when a pedon has no diagnostic features. NA should probably be FALSE, since every other interpretation of diagnostic feature records treats NULL records as "feature absent".

Probably related to joining a limited set of peiid records to the full set in @site.

dylanbeaudette · 2018-02-23T18:19:38Z

Possible solution:

- left join the full set of peiid to extended_data$diagHzBoolean
- replace NA with FALSE
- join the result into @site
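
The three steps above could be sketched in base R roughly as follows. The data frames `site` and `diag_wide` and the diagnostic column names are made up for illustration; they stand in for the @site slot and extended_data$diagHzBoolean, and this is not the actual body of .diagHzLongtoWide().

```r
# Hypothetical stand-ins: 'site' plays the role of the @site slot,
# 'diag_wide' the role of extended_data$diagHzBoolean.
site <- data.frame(peiid = c(1, 2, 3))
diag_wide <- data.frame(
  peiid = c(1, 3),                      # pedon 2 has no diagnostic records
  argillic.horizon = c(TRUE, FALSE),
  mollic.epipedon = c(FALSE, TRUE)
)

# 1. left join the full set of peiid to the wide diagnostic table
all_ids <- data.frame(peiid = site$peiid)
filled <- merge(all_ids, diag_wide, by = "peiid", all.x = TRUE, sort = FALSE)

# 2. replace NA with FALSE, in the diagnostic columns only
diag_cols <- setdiff(names(diag_wide), "peiid")
for (nm in diag_cols) {
  filled[[nm]][is.na(filled[[nm]])] <- FALSE
}

# 3. join the result back into the site-level table
site_out <- merge(site, filled, by = "peiid", all.x = TRUE, sort = FALSE)
site_out <- site_out[order(site_out$peiid), ]
```

Restricting the replacement to the diagnostic columns keeps genuine NA values in other site-level attributes untouched.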

brownag · 2021-02-17T22:24:26Z

Grave digging a bit here while doing some fetchNASIS work for #149.

As Dylan pointed out, the NA values are introduced by joining a null right-hand side (no diagnostic records for a particular peiid) to the pedon level, where all peiid are present.

I think I would shy away from filling NA as FALSE as proposed above because of how spotty the diagnostics table can be depending on data vintage/origin/purpose.

If folks agree with my interpretations on the following points, I think this issue can be closed.

- For "validation" purposes it may be valuable to know which pedon diagnostics were NULL (converted to NA) versus pedons where some diagnostics were populated, just not that type (FALSE).
  - My work on diagnostic horizon heuristic methods has been centered around the significant need for filling gaps in a standardized way (to answer more "global" questions about e.g. taxon criteria) and also performing basic crosschecks on manually classified/entered data. Saying diagnostics aren't present because none are populated is not ideal.
- Any calculated values are an "improvement" over nothing, but the interpretation of those calculated values can and should vary.
  - No manually-entered values means you have to "trust" the computer's interpretation and have no direct way to cross-check estimated presence/absence/boundaries without inferring from taxonomic relationships/correlations. That might be fine, and it might not, again depending on the data. Ergo, retaining a difference between NULL input vs. definitive FALSE seems wise.
- Filtering around NA is covered by methods like subset,SoilProfileCollection-method
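
As a rough illustration of why the three states are worth keeping apart, here is a minimal sketch using base R subset() on a plain data frame (not the aqp SoilProfileCollection method mentioned above). The table `s` and its column name are hypothetical.

```r
# Hypothetical site-level table:
#   pedon 1: feature recorded (TRUE)
#   pedon 2: no diagnostic records at all (NA)
#   pedon 3: diagnostics populated, but not this type (FALSE)
s <- data.frame(
  peiid = c(1, 2, 3),
  argillic.horizon = c(TRUE, NA, FALSE)
)

# subset() on a data frame drops rows where the condition evaluates to NA,
# so pedon 2 lands in neither the "present" nor the "absent" group.
present <- subset(s, argillic.horizon)
absent  <- subset(s, !argillic.horizon)

# Pedons with no diagnostic input, i.e. candidates for gap filling
unknown <- s[is.na(s$argillic.horizon), ]
```

If NA had been filled with FALSE, pedon 2 would be indistinguishable from pedon 3.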

As an aside, @smroecker had proposed a refactor of this function in #158. That might be something to strongly consider for all "extended" data sources in the near future / following the merge of #149.

jskovlin · 2021-02-17T22:38:09Z

I would agree with your comments, Andrew. I think it is good to be able to make the distinction between NULL input vs. definitive FALSE. We will need to retain that kind of information for future gap filling diagnostics and workflows.

dylanbeaudette · 2021-02-18T18:06:02Z

Works for me. Another line of reasoning: all pedons should have at least 1 diagnostic horizon / feature record, so the complete absence of records suggests a data population error.
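
Following that reasoning, pedons whose diagnostic columns are all NA could be flagged for review. This is only a sketch on a made-up data frame; the column names are hypothetical.

```r
# Hypothetical site-level table with two diagnostic columns;
# pedon 2 has no diagnostic records, so every diagnostic column is NA.
s <- data.frame(
  peiid = c(1, 2, 3),
  argillic.horizon = c(TRUE, NA, FALSE),
  mollic.epipedon = c(FALSE, NA, TRUE)
)

diag_cols <- setdiff(names(s), "peiid")

# TRUE for pedons with zero non-NA diagnostic values
no_records <- rowSums(!is.na(s[, diag_cols, drop = FALSE])) == 0

# peiid values worth checking for a possible data population error
suspect_peiid <- s$peiid[no_records]
```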

dylanbeaudette added a commit that referenced this issue Feb 23, 2018

undoing edit, and making note of #59

cd762e6

brownag added the NASIS-local label (used for pull requests, issues, discussions etc. for soilDB local NASIS functions) Jan 16, 2021

dylanbeaudette closed this as completed Feb 18, 2021
