NA should be interpreted as FALSE in .diagHzLongtoWide() #59

Closed
dylanbeaudette opened this issue Feb 23, 2018 · 4 comments
Labels
NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions

Comments

@dylanbeaudette
Member

dylanbeaudette commented Feb 23, 2018

It appears that .diagHzLongtoWide() assigns NA to all columns when a pedon has no diagnostic features. NA should probably be FALSE, since we interpret NULL records as missing in all other interpretations of diagnostic feature records.

Probably related to joining a limited set of peiid records to the full set in @site.

@dylanbeaudette
Member Author

Possible solution:

  • left join full set of peiid to extended_data$diagHzBoolean
  • replace NA with FALSE
  • join into @site
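The three steps above can be sketched in base R. This is only an illustration of the proposal, not the code in .diagHzLongtoWide(): `site` and `diag_wide` are hypothetical stand-ins for the site-level table and for extended_data$diagHzBoolean, and the feature column names are made up.

```r
# Hypothetical stand-ins: `site` holds every peiid; `diag_wide` holds one row
# per peiid that has at least one diagnostic feature record.
site <- data.frame(peiid = c(1L, 2L, 3L))
diag_wide <- data.frame(
  peiid = c(1L, 3L),
  argillic.horizon = c(TRUE, FALSE),
  mollic.epipedon = c(FALSE, TRUE)
)

# Left join the full set of peiid to the wide boolean table.
# peiid 2 has no diagnostic records, so its feature columns come back as NA.
joined <- merge(site, diag_wide, by = "peiid", all.x = TRUE)

# Replace NA with FALSE, touching only the boolean feature columns.
filled <- joined
bool_cols <- setdiff(names(filled), "peiid")
filled[bool_cols] <- lapply(filled[bool_cols], function(x) {
  x[is.na(x)] <- FALSE
  x
})
```

Here `joined` shows where the NA values come from, and `filled` is what the proposal would place in @site.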

@brownag brownag added the NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions label Jan 16, 2021
@brownag
Member

brownag commented Feb 17, 2021

Grave digging a bit here while doing some fetchNASIS work for #149.

As Dylan pointed out, the NA values are introduced by joining a null right-hand side (no diagnostic records for a particular peiid) to the pedon level, where all peiid are present.

I think I would shy away from filling NA as FALSE as proposed above because of how spotty the diagnostics table can be depending on data vintage/origin/purpose.

If folks agree with my interpretations on the following points, I think this issue can be closed.

  1. For "validation" purposes it may be valuable to know which pedon diagnostics were NULL (converted to NA) versus pedons where some diagnostics were populated, just not that type (FALSE).

    • My work on diagnostic horizon heuristic methods has been centered around the significant need for filling gaps in a standardized way (to answer more "global" questions about e.g. taxon criteria) and also performing basic crosschecks on manually classified/entered data. Saying diagnostics aren't present because none are populated is not ideal.
  2. Any calculated values are an "improvement" over nothing, but the interpretation of those calculated values can and should vary.

    • No manually-entered values means you have to "trust" the computer's interpretation and have no direct way to cross check estimated presence/absence/boundaries without inferring from taxonomic relationships/correlations. That might be fine, and it might not, again depending on the data. Ergo, retaining a difference between NULL input vs. definitive FALSE seems wise.
  3. Filtering around NA is covered by methods like subset,SoilProfileCollection-method
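A rough sketch of what keeping the three states apart allows. It uses a plain data.frame and base R rather than a SoilProfileCollection, so it only approximates the kind of filtering the point above refers to; the table and the `argillic.horizon` column are hypothetical.

```r
# Hypothetical site-level table: NA means no diagnostic records were
# populated for that pedon, FALSE means records exist but not this feature.
site <- data.frame(
  peiid = c(1L, 2L, 3L),
  argillic.horizon = c(TRUE, NA, FALSE)
)

# Pedons with no diagnostic information at all (candidates for gap filling
# or for a data population check).
unpopulated <- site[is.na(site$argillic.horizon), ]

# Pedons explicitly recorded without the feature.
absent <- site[!is.na(site$argillic.horizon) & !site$argillic.horizon, ]

# Base subset() drops rows where the condition evaluates to NA,
# so only the definitive TRUE rows remain.
present <- subset(site, argillic.horizon)
```

If NA had been filled with FALSE up front, `unpopulated` and `absent` could no longer be told apart.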

As an aside, @smroecker had proposed a refactor of this function in #158. That might be something to strongly consider for all "extended" data sources in the near future, following the merge of #149.

@jskovlin
Member

I would agree with your comments, Andrew. I think it is good to be able to make the distinction between NULL input vs. definitive FALSE. We will need to retain that kind of information for future gap filling diagnostics and workflows.

@dylanbeaudette
Member Author

Works for me. Another line of reasoning: all pedons should have at least 1 diagnostic horizon / feature record; the complete absence of such records suggests a data population error.

3 participants