-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
occCite: Querying and Managing Large Biodiversity Occurrence Datasets #407
Comments
Hello @hannahlowens, your Handling Editor will be @karthik. He will get in touch soon with the initial checks. |
Editor checks:
Editor comments👋 @hannahlowens. Here are some general checks from the goodpractice package. As you resolve these issues, you can run I also noticed there are no tests. Please add test coverage for all your functions. Let us know if you need help or pointers to get started.
Reviewers: |
Hi Karthik! |
Hello Hannah. By tests, I mean explicit tests for each functions in the tests folder. Examples are great and can be turned into tests. What you have are more along the lines of informal tests. For an introduction to writing tests, this chapter from the R packages book https://r-pkgs.org/tests.html might be useful. |
👋 @poldham! Would you be able to serve as a reviewer for this submission? |
👋 @peterdesmet Would you be able to as a second reviewer on this submission? |
Hi @karthik, my colleague @damianooldoni and I are happy share second reviewer duties if that is ok with you. Should we wait until tests are implemented in the pkg? |
Hi @karthik et al.! Tests are now implemented in the package. |
Hi @peterdesmet. Thank you! I'm happy to have the two of you as reviewers (no need to share duties). Is that ok with you @damianooldoni? and thanks @hannahlowens! Perfect timing. |
I saw @melvidoni's tweet today, I am an ecologist who works on occurrence data, and I am relatively comfortable with R packages, so I would be happy to review the package if you think I can be a suitable reviewer! |
Similar to @KevCaz
I also saw @melvidoni's tweet and I'll be happy to review the package as I'm also a ecologist working with large biodiversity databases. |
yes, @karthik , it's ok with me. Are there some deadlines for this task? I work bad without them 😄 |
Excellent @damianooldoni! |
Hi @peterdesmet. @damianooldoni |
Sorry @karthik for being late. I start now and I will finish it before the end of this week. |
Thank you very much! |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
I have never had working relationship with package authors.
DocumentationThe package includes all the following forms of documentation:
Functionality
Estimated hours spent reviewing: 10
Review CommentsWell done! The package name fits the scope very well, it's catchy and easy to remember. Here below some comments to improve this package. GitHub repo webpageIn https://github.com/hannahlowens/occCite I read in the Aout section:
Why do you speak about points? You mean occurrences in general which are not points, although I understand that we conceive them as points in the spacetime. Just replace occurrence points with occurrences. READMESome issues related to rOpenSci guidelines:
Minor remarks:
rgbif::occ_search(taxonKey = 1, limit = 200000)
Error: Max offset of 100001 exceeded: 99900 + 300 Vignettes
Warning messages:
1: In utils::citation(pkg) :
no date field in DESCRIPTION file of package ‘occCite’
2: In utils::citation(pkg) :
no date field in DESCRIPTION file of package ‘occCite’
3: In utils::citation(pkg) :
no date field in DESCRIPTION file of package ‘occCite’
4: In utils::citation(pkg) :
no date field in DESCRIPTION file of package ‘occCite’
5: In utils::citation(pkg) :
no date field in DESCRIPTION file of package ‘occCite’
6: In utils::citation(pkg) :
no date field in DESCRIPTION file of package ‘occCite’
7: In utils::citation(pkg) :
no date field in DESCRIPTION file of package ‘occCite’ I think you should get this fixed: occCite is about citation after all. To get an idea, if I try the same commando with > utils::citation("rgbif")
Chamberlain S, Barve V, Mcglinn D, Oldoni D, Desmet P, Geffert L, Ram K (2021). _rgbif: Interface
to the Global Biodiversity Information Facility API_. R package version 3.5.2, <URL:
https://CRAN.R-project.org/package=rgbif>.
Chamberlain S, Boettiger C (2017). “R Python, and Ruby clients for GBIF species occurrence data.”
_PeerJ PrePrints_. <URL: https://doi.org/10.7287/peerj.preprints.3304v1>.
To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'. This remark is related to the missing citation information in README mentioned above. Code stylePersonally I like tidyverse code style. For example, about comments style, I would add a space between the dash To automatize the restyling Please consider also to split strings longer than 80 characters. For example, L25 of warning("Input x is not of class 'occCiteData'. Input x must be result of a studyTaxonList() search.\n") FunctionsGBIFLoginManager()Basic but useful function! Nice. My only concern here is that this function silently(!) triggers a GBIF download of 109 occurrences every time it's called by a user, just to test if the credentials are correct. There are two problems doing this, both from user and GBIF side. As user I don't like to have my download page full of dummy downloads silently created every time I use this function. The triggering of downloads is also not mentioned in function documentation: I wonder what GBIF users will start to think when they notice such downloads: will they contact GBIF about it thinking about an error at GBIF side? 😱 In any case, I strongly suggest to remove the test. A strong reason to remove it is that the user still gets an error by > occ_download(user="user",
email = "email",
pwd = "pwd",
rgbif::pred("catalogNumber", 217880))
Error: 401 - Unauthorized I understand very well you want to apply defensive programming in your package! And that's great. However, as A typo: in function comment text at L66, replace getGBIFpoints()Same remark as for gitHub webpage above. I would replace I would also mention GBIF in the @description. A suggestion: #' @description Downloads GBIF occurrences and useful related information for
#' processing within other occCite functions I see that rgbif::pred("hasCoordinate", TRUE)
rgbif::pred("hasGeospatialIssue", FALSE) I suggest also to add a predicate to retrieve only presences. Not doing it can be very dangerous and a quite typical and naive error while processing GBIF data as the typical user is not concerned about occurrences related to absences. A more general remark/question: what if users want to add other filters or modify yours? If I am interested only in data from 2018 from Andorra, why should I get data from all the world from 1000 up to now? This is the classic dilemma while programming high level functions: the balance between being user friendly and flexibility. Personally I would allow users to make their own query by means of ellipsis In any case, even if you decide to not change your code, still you need to document your choice! The best way is to do it in Details section (@details) of the function documentation. Another remark streaming from the previous one: I see you limit your research on species as you search among species names only. At L41: key <- rgbif::name_suggest(q=cleanTaxon, rank='species')$data$key[1] I find this quite a limit: why don't you allow the user to specify the rank itself by adding an argument I would also suggest to add (at least) a Based on this, I would also change description of argument # taxon can be a species (default)
getGBIFpoints(taxon="Gadus morhua",
GBIFLogin = myGBIFLogin,
GBIFDownloadDirectory = NULL)
# taxon can be something else, e.g. a genus
getGBIFpoints(taxon="Gadus",
rank = "genus"
GBIFLogin = myGBIFLogin,
GBIFDownloadDirectory = NULL) Still speaking about documentation, I see you can pass a phylogeny object to Your package is actually quite taxonomic oriented: you MUST specify a taxon/phylogeny. But what if the user wants to download all data from a country or a polygon for example? This is something I do quite often. Again, take in mind the balance between flexibility and user friendiness. A typo: comment text at L34: replace prevGBIFdownload()This function is quite important to avoid downloading again and again same data: prevGBIFdownload(taxonKey = "No key", GBIFLogin) Why do you have to specify a default value to Let's say I had to trigger a download via rgbif instead of using your getGBIFpoints() function as I need to download all data of all species in a region instead of specifying a taxon/phylogeny. Well, it would be nice to allow me to still use the rest of the package functionalities, isn't? This can be another way to solve the dilemma between user friendly code and flexibility. How to do it? you can allow me to get my GBIF download by passing a GBIF download key or the GBIF DOI to `prevGBIFdownload(taxonKey, GBIFLogin, GBIFDownloadKey)` Not only, as the downloads are public, the use of GBIFLogin is not really needed if the GBIF download is provided! For example, you can download this download, with key 0252206-200613084148143 and DOI https://doi.org/10.15468/dl.bptja4 isn't? Again, this improves the function: up to now the user is limited to the most recent download of the same taxon/philogeny. But waht if the last download has been deleted or it is corrupted or is invalid for any other reason? getBIENpoints()Quite the same remarks as for getGBIFpoints() occCitation()I think you forgot to specify which function(s) you need to import from RPostgreSQL package. At L13 I see: #' @import RPostgreSQL tabulate.occResults()You wrote Why do you use occCiteMapAgain, is there any conflict with other packages for using All All other functionsI think a in-depth documentation check should be done throughout all other functions. TestsThe idea of unit-tests is to tests as much units of code as possible, and doing it fast. But tests should not rely or at least as less as possible on other packages or databases. Is there a reason to use I would just pass an existent GBIF download key where it's possible so that your code is more independent of other packages. Running So, be careful while using |
@karthik Should I expect another set of reviews, or should I go ahead and address these comments? If the latter, how is this generally accomplished? |
Hi @hannahlowens , sorry for the super long delay. I'm currently acting as Editor in Chief, and was just directed to this thread. I would say if you are still interested in completing this process to go ahead and respond to the single review. I'll take some time and try to familiarize myself with the package and act as a second reviewer. |
Hi Laura,
Ok, great! Thanks. I’ve addressed some of their comments, but have been a bit derailed due to some problems with CURL in Linux on CRAN. I will repsond more formally to the review once we’ve chased down the problem (hopefully).
Best,
Hannah
On Dec 14, 2021, at 3:00 AM, Laura DeCicco ***@***.******@***.***>> wrote:
Hi @hannahlowens<https://github.com/hannahlowens> , sorry for the super long delay. I'm currently acting as Editor in Chief, and was just directed to this thread.
I would say if you are still interested in completing this process to go ahead and respond to the single review. I'll take some time and try to familiarize myself with the package and act as a second reviewer.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#407 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ACXKAF6GM75FANQQCKPVJTTUQ2QLXANCNFSM4TEZZ4XA>.
|
Review ResponseHi @ldecicco-USGS, After a long time (sincere apologies there), I have worked through Damiano's wonderfully thorough and helpful review, and I have added him as a reviewer in the package citation. Below, I will go through his comments, point-by-point. Regarding "points"At several places in the review, the reviewer suggests replacing "point(s)" with "occurrence(s)", since it's a bit more intuitive and specific. I have done this throughout the documentation, although argument and function names remain the same. There are already downstream dependencies that use the existing names. I will work with the developers of those packages to transition from "points" to "occurrences" as a natural part of our ongoing development collaboration. README
Vignettes
Code Style
Functions
|
Looks great! @damianooldoni let me know if you think the response to the review is satisfactory. |
Thanks @hannahlowens for appreciating my comments. I am happy you find them useful. |
Thanks very much, @hannahlowens and @damianooldoni! I'm back on this review. I'll approve it shortly. |
@ropensci-review-bot approve occCite |
Approved! Thanks @hannahlowens for submitting and @peterdesmet for your reviews! 😁 To-dos:
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. She will get in touch about timing and can answer any questions. We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved. Last but not least, you can volunteer as a reviewer via filling a short form. |
👋 @hannahlowens Is any help needed with the repo transfer? |
Hi @maelle Thanks for checking in. I've been a little swamped, but the repo transfer is on my to-do list today. Listing it in Friday's news paper is a much-needed shove! I'll be in touch if I run into any snags. |
@ropensci-review-bot finalize transfer of ropensci/occCite |
I'm sorry human, I don't understand that. You can see what commands I support by typing:
|
@hannahlowens can you please try again with |
@ropensci-review-bot finalize transfer of occCite |
Transfer completed. The |
@maelle Thanks! Is it ok to leave my pkgdown workflow in place, or is it preferred that I switch to ROpenSci building and branding? If I can leave the website as is, I think I've successfully done everything else and will do a fresh release. |
@hannahlowens The rOpenSci docs infrastructure will build a pkgdown website in any case, so I'd say it's best to delete your workflow (unless you want the two sites, which a few packages have to e.g. have their institutional branding). |
@maelle I see, thanks! I've deleted my workflow. |
@maelle Sorry, one more question--I want to make sure everything is as clean as possible before tomorrow. I deleted my pkgdown workflow, but I see the pages-build-deployment workflow builds fine but fails to deploy. Why is this? Is this something I can fix? |
@hannahlowens it was because of the lower-case readme.md, now it's deployed: https://docs.ropensci.org/occCite/ 🎉 |
Thanks @hannahlowens and @maelle! |
Date accepted: 2022-03-11
Due date for @peterdesmet: 2020-12-21Submitting Author Name: Hannah L. Owens
Submitting Author Github Handle: @hannahlowens
Repository: https://github.com/hannahlowens/occCite
Version submitted: 0.4.0
Editor: @karthik
Reviewers: @peterdesmet
Reviewer 2: @damianooldoni
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
occCite retrieves species occurrence data from several aggregator databases, processes them into a single data object, creates primary data provider citations based on the results.
Who is the target audience and what are scientific applications of this package?
The target audience is primarily biogeographers and other researchers interested in the geographic distributions of biodiversity. occCite aims to close the gap in the citation cycle between primary data providers and final research products efficiently, allowing researchers to meet best-practice standards for biodiversity dataset documentation without sacrificing time and resources to the demands of providing increasing levels of detail on their datasets.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
Tools in R do exist that enable researchers to document sources during the data collection
process—rgbif (Chamberlain et al. 2020a) for GBIF and BIEN (Maitner 2020) for the Botanical
Information and Ecology Network (BIEN) are both examples of API-interface packages that
include valuable features for citing data harvested from their aggregator databases. However, these
packages serve a single aggregator database are designed for specific use cases
tailored to their databases, and uniting aggregator results into a single dataset brings its own set of
challenges. Multi-platform occurrence aggregators do exist—searches using spocc (Chamberlain 2019)
can return occurrence information from up to 6 aggregator databases—but the process of combining data from these aggregators in each query results in the loss of key metadata, particularly accession date, primary data source, and, in the case of GBIF, dataset DOIs. Finally, to our knowledge, there does not yet exist a set of R tools that will manage metadata from an occurrence search and translate it into formatted citations for primary data providers that include accession dates and DOIs.
Yes
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
JOSS Options
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.MEE Options
Code of conduct
The text was updated successfully, but these errors were encountered: