get_uid() returns incorrect UID for non-existent taxon #436
Comments
Hi @snubian, looking at this. There are a few things here. NCBI doesn't do fuzzy searching as far as I know, so they don't attempt to match Fringella to the very close Fringilla. In this case they did have a match for morel and returned those records.
One approach with misspelled names (which you may be aware of already) is to make sure you have correct spellings first, e.g., using
<r> gnr_resolve(names = "Fringella morel", data_source_ids = 4)
$results
[1] "no results found"
$preferred
NULL
Then search for Fringella alone, since there were no results above:
<r> gnr_resolve(names = "Fringella", data_source_ids = 4)
$results
  submitted_name matched_name data_source_title score
1      Fringella    Fringilla              NCBI   0.5
$preferred
NULL |
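The check-the-spelling-first idea in the comment above can be sketched as a small helper. This is only an illustration under stated assumptions: `exact_match_names()` is a hypothetical function, not part of taxize, and it assumes a data frame with the `submitted_name` and `matched_name` columns shown in the `$results` output above. The example input is hand-built, not real resolver output.

```r
# Hypothetical helper (not part of taxize): keep only the submitted names
# that the resolver matched exactly, so near-misses such as
# "Fringella" -> "Fringilla" can be reviewed before calling get_uid().
exact_match_names <- function(results) {
  # `results` is assumed to have character columns
  # `submitted_name` and `matched_name`.
  is_exact <- tolower(results$submitted_name) == tolower(results$matched_name)
  unique(results$submitted_name[is_exact])
}

# Hand-built example input: one misspelled name and one correct name
res <- data.frame(
  submitted_name = c("Fringella", "Fringilla"),
  matched_name = c("Fringilla", "Fringilla"),
  stringsAsFactors = FALSE
)

exact_match_names(res)
```

Names that drop out of this list are the ones worth correcting by hand before looking up UIDs.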
Thanks @sckott - very helpful advice. I wasn't expecting NCBI to return a fuzzy match, but I thought it would only return a UID for an exact match or a synonym. After posting the above I had a quick check to see what was happening under the hood. The XML returned by the Entrez search:
So it's matched on the partial term. I will definitely check names beforehand from now on, but are you aware of a way to force a search to match only on the full |
Tried adding a field to the term and
does not return the false-positive UID for While for a valid taxon:
it returns the correct UID. |
where did you find |
I don't see anything about searching on an entire term when more than one word, unfortunately, unless that's what |
…difier params, #436 changed filt() fxn to actually filter by given value instead of filter if present
@snubian I made some changes, see egs, added a few new params, and changed The filtering across all |
On the http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=taxonomy As you say,
Thanks Scott for making these changes so quickly. I installed the updates, but I'm not sure if I'm using the new parameters correctly. I can see from the examples the distinction between rank_query and rank_filter. But I still can't get it to return NA when given a bad taxon name; it still wants to return a partial match for anything else it can find. I tried using |
Also, I tried to use
returns no matches. |
Right, I'm aware that this still doesn't fix your problem, but I think these fixes should help in general. I hope we can in fact solve your problem; we're just not there yet. An eg for using the
<r> get_uid(sciname = "Fringilla", modifier = "Scientific Name")
Retrieving data for taxon 'Fringilla'
[1] "36254"
attr(,"class")
[1] "uid"
attr(,"match")
[1] "found"
attr(,"uri")
[1] "http://www.ncbi.nlm.nih.gov/taxonomy/36254" |
Thanks again Scott, these changes are a nice improvement for sure. I can work around my problem given your suggestions above. Your effort is really appreciated, don't get me wrong. My problem with
|
Thanks for further info. Getting back to this soon. |
Been looking over this today. I don't think there's anything else I can do besides making the documentation a bit more clear, letting users know their options of how to modify requests with modifiers and other arguments. And note that Entrez does funny things with fuzzy search, matching epithets alone to other unrelated taxa, etc. |
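To make the modifier option mentioned above a bit more concrete, here is a rough sketch of how a field modifier might be appended to an Entrez search term. `build_term()` is a hypothetical helper and not the taxize implementation. The `term[Field]` form is Entrez's field-tag syntax, and "Scientific Name" is the modifier value used earlier in this thread.

```r
# Hypothetical helper (not the taxize implementation): append an Entrez
# field tag to a search term, e.g. "Fringilla[Scientific Name]".
build_term <- function(sciname, modifier = NULL) {
  sciname <- trimws(sciname)  # drop stray leading/trailing whitespace
  if (is.null(modifier)) {
    return(sciname)           # no modifier: search the term as given
  }
  paste0(sciname, "[", modifier, "]")
}

build_term("Fringilla", modifier = "Scientific Name")
build_term("Fringella morel")
```

Whether a tagged term avoids partial matches for every multi-word name is the question this thread left open, so treat the result as a query to test rather than a guaranteed fix.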
Using taxize 0.6.0
Have hit a couple of cases where searching on a scientific name which should return NA, get_uid() returns the UID of an unrelated taxon. It seems that get_uid() is returning a match based on only part of the search term, such as matching the species epithet to an unrelated genus. E.g.:
The search term Fringella morel is a misspelling of a bird species name and does not actually exist, but the UID returned, 39407, is for taxon Morchella esculenta, a fungus which has GenBank common name morel (in case that is relevant).
If I alter the search term slightly I get the correct result:
I tried using the division parameter to narrow the search but without success.
Another example is:
which returns the UID for aphid genus Acuticauda. I tried using rank = "species" here but it had no effect.
Note that when manually doing the above searches in the NCBI Taxonomy Browser you get the standard Did you mean ... response with a list of suggested taxa.