gnr_resolve - possibly nicer output #448

Closed
sckott opened this issue Aug 5, 2015 · 8 comments
@sckott
Contributor

sckott commented Aug 5, 2015

from a user:

I was trying to use gnr_resolve() to check a long list of plant species names for errors, and have it output a dataframe with one column containing the submitted names, and another column containing the best matches. I was having trouble subsetting the output from gnr_resolve to single best matches across all sources

take a peek

@Alectoria

Not sure whether this is what you want but I have been using follow-up formatting with dplyr.

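    # magrittr is loaded explicitly because use_series() is not exported by dplyr
    library(dplyr); library(magrittr); library(taxize)
    taxa <- c("Scorsonera villosa", "Abietinela", "Acer tatarica", "Stipa johannis", "Anthriscus nitidum", NA)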

Slim version

    taxa %>%
      gnr_resolve(data_source_ids = c(1, 5, 12, 165, 167), stripauthority = TRUE) %>%
      use_series(results) %>%
      select(submitted_name, matched_name2, score) %>%
      distinct()

          submitted_name      matched_name2 score
    1         Abietinela        Abietinella  0.50
    2         Abietinela         Abietineae  0.50
    3      Acer tatarica     Acer tataricum  0.75
    4 Anthriscus nitidum  Anthriscus nitida  0.75
    5                 NA                      NaN
    6 Scorsonera villosa Scorzonera villosa  0.75
    7     Stipa johannis      Stipa joannis  0.75

Compare with bulky version

    taxa %>%
      gnr_resolve(data_source_ids = c(1, 5, 12, 165, 167), stripauthority = TRUE)

    $results
           submitted_name                    data_source_title score      matched_name2
    6          Abietinela                    Catalogue of Life  0.50        Abietinella
    7          Abietinela                                  EOL  0.50        Abietinella
    8          Abietinela                                  EOL  0.50        Abietinella
    9          Abietinela                                  EOL  0.50        Abietinella
    10         Abietinela                                  EOL  0.50        Abietinella
    11         Abietinela                                  EOL  0.50        Abietinella
    12         Abietinela Tropicos - Missouri Botanical Garden  0.50        Abietinella
    13         Abietinela Tropicos - Missouri Botanical Garden  0.50         Abietineae
    14      Acer tatarica                    Catalogue of Life  0.75     Acer tataricum
    15      Acer tatarica                                  EOL  0.75     Acer tataricum
    16      Acer tatarica                                  EOL  0.75     Acer tataricum
    17      Acer tatarica                                  EOL  0.75     Acer tataricum
    18      Acer tatarica                                  EOL  0.75     Acer tataricum
    19      Acer tatarica Tropicos - Missouri Botanical Garden  0.75     Acer tataricum
    20      Acer tatarica  The International Plant Names Index  0.75     Acer tataricum
    26 Anthriscus nitidum                    Catalogue of Life  0.75  Anthriscus nitida
    27 Anthriscus nitidum                                  EOL  0.75  Anthriscus nitida
    28 Anthriscus nitidum                                  EOL  0.75  Anthriscus nitida
    29 Anthriscus nitidum                                  EOL  0.75  Anthriscus nitida
    30 Anthriscus nitidum Tropicos - Missouri Botanical Garden  0.75  Anthriscus nitida
    31 Anthriscus nitidum  The International Plant Names Index  0.75  Anthriscus nitida
    32                 NA                                        NaN                   
    1  Scorsonera villosa                    Catalogue of Life  0.75 Scorzonera villosa
    2  Scorsonera villosa                                  EOL  0.75 Scorzonera villosa
    3  Scorsonera villosa                                  EOL  0.75 Scorzonera villosa
    4  Scorsonera villosa  The International Plant Names Index  0.75 Scorzonera villosa
    5  Scorsonera villosa  The International Plant Names Index  0.75 Scorzonera villosa
    21     Stipa johannis                    Catalogue of Life  0.75      Stipa joannis
    22     Stipa johannis                                  EOL  0.75      Stipa joannis
    23     Stipa johannis                                  EOL  0.75      Stipa joannis
    24     Stipa johannis Tropicos - Missouri Botanical Garden  0.75      Stipa joannis
    25     Stipa johannis  The International Plant Names Index  0.75      Stipa joannis

    $preferred
    NULL
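For the original request (a single best match per submitted name), the slim output could be reduced one step further. The following is only a minimal sketch in base R, run on a small hand-made data frame that copies a few rows from the output above rather than calling the API. The tie-breaking rule (keep the first candidate in row order) is an assumption for illustration, not something gnr_resolve does itself.

```r
# Toy data copied from the slim output shown in this thread (no API call).
res <- data.frame(
  submitted_name = c("Abietinela", "Abietinela", "Acer tatarica", "Stipa johannis"),
  matched_name2  = c("Abietinella", "Abietineae", "Acer tataricum", "Stipa joannis"),
  score          = c(0.50, 0.50, 0.75, 0.75),
  stringsAsFactors = FALSE
)

# Sort by submitted name, then by descending score.
# Rows that tie on both keys keep their original relative order.
res <- res[order(res$submitted_name, -res$score), ]

# Keep only the first (highest-scoring) row for each submitted name.
best <- res[!duplicated(res$submitted_name), ]
rownames(best) <- NULL
best
```

Note that tied scores, such as the two Abietinela candidates, still need a human decision; the sketch simply keeps whichever candidate comes first.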

@eduardszoecs
Member

Returning only unique rows is definitely a good idea, but I would keep data_source_title!
Moreover, the documentation needs to be updated (a list of three is returned, not a data.frame).

@eduardszoecs
Member

#452 fixes the issue with uniques, when stripauthority = TRUE.

@sckott I have a suggestion: remove the other two list items.
If preferred_data_sources is given, return out_preferred as the result.
And append the not_known taxa to the result data.frame. What do you think?
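
To make the "append not_known" idea concrete, here is a rough sketch in base R. It is not the taxize implementation: the data frame, the unmatched name, and the choice to fill the other columns with NA are all assumptions made for illustration.

```r
# Hypothetical matched results, with columns named as in the output above.
results <- data.frame(
  submitted_name = c("Acer tatarica", "Stipa johannis"),
  matched_name2  = c("Acer tataricum", "Stipa joannis"),
  score          = c(0.75, 0.75),
  stringsAsFactors = FALSE
)

# Hypothetical name that the resolver could not match.
not_known <- c("Notarealgenus fakeus")

# Build rows for the unmatched names, with NA in the match columns.
unmatched <- data.frame(
  submitted_name = not_known,
  matched_name2  = NA_character_,
  score          = NA_real_,
  stringsAsFactors = FALSE
)

# One data.frame holding both matched and unmatched names.
combined <- rbind(results, unmatched)
combined
```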

@Alectoria

minor comment: when using stripauthority = T beware of issue #451

sckott added a commit that referenced this issue Aug 19, 2015
@sckott
Contributor Author

sckott commented Aug 19, 2015

Moreover, the documentation needs to be updated (a list of three is returned, not a data.frame).

fixed that

@sckott
Contributor Author

sckott commented Aug 19, 2015

I have a suggestion: remove the other two list items

@Edild hmmmm...I'll have a look at the results in different scenarios. I'm open to this, but want to make sure this is definitely better

one thought: when preferred data sources are requested, the API still returns results against all data sources. I imagine users may also want those, so returning only preferred data when the preferred param is used may not be ideal

not_known is just a character vector, so we could add it as an attribute on a data.frame or list. That wouldn't be very obvious to users, but we could make it clear in the documentation
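
A small sketch of the attribute approach, in base R. The data and the unmatched name are made up for illustration, and "not_known" as the attribute name is an assumption, not a confirmed part of the package.

```r
# Hypothetical result data.frame.
results <- data.frame(
  submitted_name = "Acer tatarica",
  matched_name2  = "Acer tataricum",
  stringsAsFactors = FALSE
)

# Attach the unmatched names as an attribute rather than a separate list item.
attr(results, "not_known") <- c("Notarealgenus fakeus")  # hypothetical name

# The object is still a plain data.frame; the attribute is read back explicitly.
is.data.frame(results)
attr(results, "not_known")
```

The attribute does not appear when the data.frame is printed, which is the "not very obvious" drawback mentioned above.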

@eduardszoecs
Member

But if users specify preferred_data_sources, then I think they also expect that only those preferred sources are returned.
I'm not that deep into gnr_*, so I'll let you decide...

@sckott
Contributor Author

sckott commented Aug 19, 2015

But if users specify preferred_data_sources, then I think they also expect that only those preferred sources are returned.

I agree that makes sense. Perhaps that's what we'll do

@sckott sckott closed this as completed in 0705081 Aug 24, 2015
@sckott sckott added this to the v0.6.3 milestone Aug 24, 2015