
occCite: Querying and Managing Large Biodiversity Occurrence Datasets #407

Closed
13 of 31 tasks
hannahlowens opened this issue Oct 30, 2020 · 39 comments

Comments

@hannahlowens

hannahlowens commented Oct 30, 2020

Date accepted: 2022-03-11
Submitting Author Name: Hannah L. Owens
Submitting Author Github Handle: @hannahlowens
Repository: https://github.com/hannahlowens/occCite
Version submitted: 0.4.0
Editor: @karthik
Reviewers: @peterdesmet

Due date for @peterdesmet: 2020-12-21

Reviewer 2: @damianooldoni
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: occCite
Type: Package
Title: Querying and Managing Large Biodiversity Occurrence Datasets
Version: 0.4.0
Authors@R: c(
  person(given = "Hannah L.", family = "Owens", ,"hannah.owens@gmail.com", role=c("aut","cre"),
    comment = c(ORCID = "0000-0003-0071-1745")),
  person(given = "Cory", family = "Merow",,,role="aut",
    comment = c(ORCID = "0000-0003-0561-053X")),
  person(given = "Brian", family = "Maitner", ,,,role="aut",
    comment = c(ORCID = "0000-0002-2118-9880")),
  person(given = "Jamie M.", family = "Kass", ,,,role="aut",
    comment = c(ORCID = "0000-0002-9432-895X")),
  person(given = "Vijay", family = "Barve",,"vijay.barve@gmail.com",role="aut",
    comment = c(ORCID = "0000-0002-4852-2567")),
  person(given = "Robert P.", family = "Guralnick",,,role="aut",
    comment = c(ORCID = "0000-0001-6682-1504"))
  )
Author: Hannah L. Owens [aut, cre] (<https://orcid.org/0000-0003-0071-1745>), Cory Merow [aut] (<https://orcid.org/0000-0003-0561-053X>), Brian Maitner [aut] (<https://orcid.org/0000-0002-2118-9880>), Jamie M. Kass [aut] (<https://orcid.org/0000-0002-9432-895X>), Vijay Barve [aut] (<https://orcid.org/0000-0002-4852-2567>), Robert P. Guralnick [aut] (<https://orcid.org/0000-0001-6682-1504>)
Maintainer: Hannah L. Owens <hannah.owens@gmail.com>
Description: Facilitates the gathering of biodiversity occurrence data 
  from disparate sources. Metadata is managed throughout the process to facilitate 
  reporting and enhanced ability to repeat analyses.
License: GPL (>= 2)
URL: https://github.com/hannahlowens/occCite
BugReports: https://github.com/hannahlowens/occCite/issues
Encoding: UTF-8
LazyData: true
Language: en-US
Depends: R (>= 3.5.0)
Suggests:
  rmarkdown,
  RColorBrewer,
  viridis,
  remotes,
  RefManageR
Imports:
    bib2df,
    BIEN,
    bit64,
    dplyr,
    ape,
    lubridate,
    methods,
    rgbif (>= 3.1),
    taxize,
    stringr,
    knitr,
    stats,
    leaflet,
    htmltools,
    ggplot2,
    rlang,
    magrittr,
    tidyr,
    RPostgreSQL,
    DBI,
    waffle
VignetteBuilder: knitr
RoxygenNote: 7.1.1

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):
    occCite retrieves species occurrence data from several aggregator databases, processes them into a single data object, and creates primary data provider citations based on the results.

  • Who is the target audience and what are scientific applications of this package?
    The target audience is primarily biogeographers and other researchers interested in the geographic distributions of biodiversity. occCite aims to close the gap in the citation cycle between primary data providers and final research products efficiently, allowing researchers to meet best-practice standards for biodiversity dataset documentation without sacrificing time and resources to the demands of providing increasing levels of detail on their datasets.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Tools in R do exist that enable researchers to document sources during the data collection
process—rgbif (Chamberlain et al. 2020a) for GBIF and BIEN (Maitner 2020) for the Botanical
Information and Ecology Network (BIEN) are both examples of API-interface packages that
include valuable features for citing data harvested from their aggregator databases. However, these
packages each serve a single aggregator database and are designed for specific use cases
tailored to their databases, and uniting aggregator results into a single dataset brings its own set of
challenges. Multi-platform occurrence aggregators do exist—searches using spocc (Chamberlain 2019)
can return occurrence information from up to 6 aggregator databases—but the process of combining data from these aggregators in each query results in the loss of key metadata, particularly accession date, primary data source, and, in the case of GBIF, dataset DOIs. Finally, to our knowledge, there does not yet exist a set of R tools that will manage metadata from an occurrence search and translate it into formatted citations for primary data providers that include accession dates and DOIs.

Yes

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you intend for this package to go on Bioconductor?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@melvidoni
Contributor

Hello @hannahlowens, your Handling Editor will be @karthik. He will get in touch soon with the initial checks.

@karthik
Member

karthik commented Nov 5, 2020

Editor checks:

  • Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package.
  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

👋 @hannahlowens. Here are some general checks from the goodpractice package. As you resolve these issues, you can run goodpractice::gp() yourself to make sure they are resolved. I'll start looking for reviewers now but if you have suggestions for people who can objectively review without a conflict of interest, I'm open to suggestions.

I also noticed there are no tests. Please add test coverage for all your functions. Let us know if you need help or pointers to get started.


It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 0% of code lines are covered by test cases.

    R/BIENtableCleanup.R:19:NA
    R/BIENtableCleanup.R:20:NA
    R/BIENtableCleanup.R:21:NA
    R/BIENtableCleanup.R:22:NA
    R/BIENtableCleanup.R:23:NA
    ... and 833 more lines

  ✖ use '<-' for assignment instead of '='. '<-' is the
    standard, and R users and developers are used to it and it is
    easier to read your code for them if you use '<-'.

    R/occCitation.R:89:11
    R/occCitation.R:90:13
    R/occCitation.R:91:11
    R/occCitation.R:92:15

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try to make your lines shorter than 80 characters

    inst/extdata/sampleScript.R:79:1
    R/data.R:3:1
    R/data.R:4:1
    R/data.R:11:1
    R/data.R:13:1
    ... and 149 more lines

  ✖ omit trailing semicolons from code lines. They are not
    needed and most R coding standards forbid them

    inst/extdata/sampleScript.R:1:17
    inst/extdata/sampleScript.R:21:47
    inst/extdata/sampleScript.R:27:16
    inst/extdata/sampleScript.R:37:59
    inst/extdata/sampleScript.R:44:59
    ... and 17 more lines

  ✖ avoid calling setwd(), it changes the global environment.
    If you need it, consider using on.exit() to restore the working
    directory.

    R/getGBIFpoints.R:34:11
    R/getGBIFpoints.R:71:3
    R/getGBIFpoints.R:88:3
    R/occQuery.R:80:11
    R/occQuery.R:165:7
    ... and 1 more lines

  ✖ avoid sapply(), it is not type safe. It might return a
    vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/occCite_plotting.R:28:20
    R/occCite_plotting.R:171:28

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
    result 1:0 if the expression on the right hand side is zero. Use
    seq_len() or seq_along() instead.

    R/gbifRetriever.R:36:13
    R/occCitation.R:125:17
    R/occCitation.R:131:17
    R/occCitation.R:137:17
    R/occCitation.R:144:17
    ... and 9 more lines

  ✖ not import packages as a whole, as this can cause name
    clashes between the imported packages. Instead, import only the
    specific functions you need.
  ✖ fix this R CMD check ERROR: Package suggested but not
    available: 'RefManageR' The suggested packages are required for a
    complete check. Checking can be attempted without them by setting
    the environment variable _R_CHECK_FORCE_SUGGESTS_ to a false value.
    See section 'The DESCRIPTION file' in the 'Writing R Extensions'
    manual.
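Two of the flags above are easy to reproduce in plain base R:

```r
# 1:length(x) misbehaves on empty input: it yields c(1, 0), so a loop
# would run twice instead of zero times; seq_along() handles it safely.
x <- character(0)
1:length(x)    # 1 0
seq_along(x)   # integer(0)

# sapply() silently changes its return type with the input shape;
# vapply() pins the output type instead.
sapply(list(1:2, 3:4), seq_along)  # simplifies to a matrix
sapply(list(1:2, 1:3), seq_along)  # same call shape, but returns a list
vapply(list(1:2, 3:4), seq_along, integer(2))  # always an integer matrix
```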

Reviewers:
Due date:

@hannahlowens
Author

Hi Karthik!
Thanks for the feedback. Before I start working through the goodpractice flags, I have what might be a naive question: what do you mean, the package doesn't have any tests? Every public function has an example. If an example can't be completed in less than 5 s or needs user input (like user login information for GBIF), per CRAN guidelines, I flagged it with donttest. Perhaps on a related note, I've been having problems with GitHub Actions ignoring my _R_CHECK_DONTTEST_EXAMPLES_: false argument.

@karthik
Member

karthik commented Nov 6, 2020

Hello Hannah. By tests, I mean explicit tests for each function in the tests folder. Examples are great and can be turned into tests; what you have are more along the lines of informal tests. For an introduction to writing tests, this chapter from the R Packages book might be useful: https://r-pkgs.org/tests.html
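For illustration, a minimal test file under tests/testthat/ looks like this (a generic sketch, not specific to occCite):

```r
# tests/testthat/test-basics.R -- a minimal testthat example
library(testthat)

test_that("string building works as expected", {
  expect_identical(paste0("occ", "Cite"), "occCite")
  expect_length(strsplit("a,b,c", ",")[[1]], 3)
})
```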

@karthik
Member

karthik commented Nov 21, 2020

👋 @poldham! Would you be able to serve as a reviewer for this submission?

@karthik
Member

karthik commented Nov 21, 2020

👋 @peterdesmet Would you be able to serve as a second reviewer on this submission?

@peterdesmet
Member

Hi @karthik, my colleague @damianooldoni and I are happy to share second-reviewer duties if that is ok with you. Should we wait until tests are implemented in the pkg?

@hannahlowens
Author

Hi @karthik et al.!

Tests are now implemented in the package.

@karthik
Member

karthik commented Nov 30, 2020

Hi @peterdesmet. Thank you! I'm happy to have the two of you as reviewers (no need to share duties). Is that ok with you @damianooldoni?

and thanks @hannahlowens! Perfect timing.

@KevCaz
Member

KevCaz commented Mar 3, 2021

I saw @melvidoni's tweet today, I am an ecologist who works on occurrence data, and I am relatively comfortable with R packages, so I would be happy to review the package if you think I can be a suitable reviewer!

@Rekyt

Rekyt commented Mar 4, 2021

Similar to @KevCaz:

> I saw @melvidoni's tweet today, I am an ecologist who works on occurrence data, and I am relatively comfortable with R packages, so I would be happy to review the package if you think I can be a suitable reviewer!

I also saw @melvidoni's tweet and I'll be happy to review the package as I'm also an ecologist working with large biodiversity databases.

@damianooldoni

yes, @karthik, it's ok with me. Are there some deadlines for this task? I work badly without them 😄

@karthik
Member

karthik commented Mar 4, 2021

Excellent @damianooldoni!
How about by March 19th?

@karthik
Member

karthik commented Mar 4, 2021

Hi @peterdesmet. @damianooldoni
Please follow the guidelines here: https://devguide.ropensci.org/reviewerguide.html and let me know if you run into any issues.

@damianooldoni

Sorry @karthik for being late. I'm starting now and will finish before the end of this week.

@karthik
Member

karthik commented Apr 14, 2021

Thank you very much!

@damianooldoni

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.

I have never had a working relationship with the package authors.

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples (that run successfully locally) for all exported functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Does not apply Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing: 10

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Well done! The package name fits the scope very well; it's catchy and easy to remember. Below are some comments to improve this package.

GitHub repo webpage

In https://github.com/hannahlowens/occCite I read in the About section:

Querying database aggregators and citing primary sources of resulting occurrence points

Why do you speak about points? You mean occurrences in general, which are not points, although I understand that we conceive of them as points in spacetime. Just replace occurrence points with occurrences.

README

Some issues related to rOpenSci guidelines:

  • README.md doesn't include a section about installing the package. Such a section is quite important, as providing installation instructions is one of the goals of any package's README. I suggest adding this information to the existing Setup section.
  • The rOpenSci guidelines state that the README should contain a "brief demonstration usage". The README.md is way too long and duplicates information contained in the vignette. The README should point to other documentation sources via links to vignettes.
  • Missing badges: test coverage, the rOpenSci peer-review badge, a repostatus.org badge
  • Missing citation information
  • No code of conduct or contribution guidelines

Minor remarks:

  • The Global Names List link in the README doesn't work.
  • The rgbif function occ_search() has a hard limit of 100,000 records, not 200,000 as stated in README.md. At least, this is what is written in the documentation (?occ_search) and how it behaves (rgbif version 3.5.2):
rgbif::occ_search(taxonKey = 1, limit = 200000)
Error: Max offset of 100001 exceeded: 99900 + 300

Vignettes

  • The vignette is very long. I suggest splitting it in two; the advanced features can easily be put in a second vignette called Advanced.
  • The vignette title is too long. Actual title: An overview of executing searches and generating citations using occCite. Suggested title: Execute occurrence searches and generate citations.
  • Code chunks in https://hannahlowens.github.io/occCite/articles/occCite_Vignette.html#loading-data-from-previous-gbif-searches return errors on the website. To be fixed.
  • While running all chunks of the Rmd vignette, I get the following warning 7 times:
Warning messages:
1: In utils::citation(pkg) :
  no date field in DESCRIPTION file of package 'occCite'
(and likewise for warnings 2 through 7)

I think you should get this fixed: occCite is about citation, after all. To give an idea, if I try the same command with the rgbif package, the following is returned:

> utils::citation("rgbif")

Chamberlain S, Barve V, Mcglinn D, Oldoni D, Desmet P, Geffert L, Ram K (2021). _rgbif: Interface
to the Global Biodiversity Information Facility API_. R package version 3.5.2, <URL:
https://CRAN.R-project.org/package=rgbif>.

Chamberlain S, Boettiger C (2017). R, Python, and Ruby clients for GBIF species occurrence data. _PeerJ PrePrints_. <URL: https://doi.org/10.7287/peerj.preprints.3304v1>.

To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

This remark is related to the missing citation information in README mentioned above.

Code style

Personally I like the tidyverse code style. For example, regarding comment style, I would add a space between the hash (#) and the comment, as described in https://style.tidyverse.org/syntax.html#comments. And I would avoid ending comments with a period, as I found at L23 of occCitation.R.

To automate the restyling, usethis::use_tidy_style() can help a lot.

Please also consider splitting strings longer than 80 characters. For example, at L25 of occCitation():

warning("Input x is not of class 'occCiteData'. Input x must be result of a studyTaxonList() search.\n")
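For instance, the message can be split with paste0() without changing what the user sees:

```r
# Same message as before, but each source line stays under 80 characters.
warning(paste0(
  "Input x is not of class 'occCiteData'. ",
  "Input x must be result of a studyTaxonList() search.\n"
))
```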

Functions

GBIFLoginManager()

Basic but useful function! Nice. My only concern here is that this function silently(!) triggers a GBIF download of 109 occurrences every time it's called by a user, just to test whether the credentials are correct. There are two problems with this, on both the user side and the GBIF side. As a user, I don't like having my download page filled with dummy downloads silently created every time I use this function. The triggering of downloads is also not mentioned in the function documentation: I wonder what GBIF users will think when they notice such downloads. Will they contact GBIF about it, suspecting an error on GBIF's side? 😱
From GBIF's side, maybe it's not a big problem to have so many dummy downloads, but still, I would like to know whether you asked GBIF about it. Maybe they have an idea about using a dummy login for test purposes?

In any case, I strongly suggest removing the test. A strong reason to remove it is that the user still gets an error from rgbif::occ_download() if the credentials are not valid 👍

> occ_download(user="user",
               email = "email",
               pwd = "pwd",
               rgbif::pred("catalogNumber", 217880))
Error: 401 - Unauthorized

I understand very well that you want to apply defensive programming in your package! And that's great. However, as GBIFLoginManager() does not add much more than grouping user, email and pwd within an instance of class GBIFLogin, I don't find adding another level of defense really useful, especially taking into account the drawbacks described above.

A typo: in the function comment at L66, replace #Populating an instance of class occCiteData with #Populating an instance of class GBIFLogin.

getGBIFpoints()

Same remark as for the GitHub webpage above. I would replace points with occs everywhere. This package is all about occurrences and contains the abbreviation occ in its title, after all! 😄

I would also mention GBIF in the @description. A suggestion:

#' @description Downloads GBIF occurrences and useful related information for
#' processing within other occCite functions

I see that getGBIFpoints() calls rgbif function occ_download() harvesting only occurrences with coordinates and without any geospatial issue by using rgbif predicates (L64-65):

rgbif::pred("hasCoordinate", TRUE)
rgbif::pred("hasGeospatialIssue", FALSE)

I also suggest adding a predicate to retrieve only presences. Not doing so can be very dangerous, and it is a quite typical and naive error when processing GBIF data, as the typical user does not think about occurrence records that document absences.
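Assuming the download request is assembled as above, the extra predicate would look like this (occurrenceStatus is a standard GBIF search term; the GBIFLogin slot names are my assumption, not necessarily the package's):

```r
# Sketch: same request as above, plus a presence-only filter.
rgbif::occ_download(
  rgbif::pred("taxonKey", key),
  rgbif::pred("hasCoordinate", TRUE),
  rgbif::pred("hasGeospatialIssue", FALSE),
  rgbif::pred("occurrenceStatus", "PRESENT"),  # presences only
  user = GBIFLogin@username,
  email = GBIFLogin@email,
  pwd = GBIFLogin@pwd
)
```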

A more general remark/question: what if users want to add other filters or modify yours? If I am interested only in data from 2018 from Andorra, why should I get data from the whole world, from the year 1000 up to now?

This is the classic dilemma while programming high level functions: the balance between being user friendly and flexibility.

Personally, I would allow users to make their own query by means of the ellipsis (...). In this way your function is powerful but at the same time user friendly: GBIF/rgbif experts can specify their own predicates, while other users just benefit from the basic functionality, and so everybody is happy 😄

In any case, even if you decide not to change your code, you still need to document your choice! The best way is to do it in the Details section (@details) of the function documentation.
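A hypothetical sketch of the `...` approach (the function and argument names match the package, but the body, the rank default, and the GBIFLogin slot names are my assumptions):

```r
# Sketch only: forwards any extra rgbif predicates supplied by the user.
getGBIFpoints <- function(taxon, rank = "species", GBIFLogin, ...) {
  key <- rgbif::name_suggest(q = taxon, rank = rank)$data$key[1]
  rgbif::occ_download(
    rgbif::pred("taxonKey", key),
    rgbif::pred("hasCoordinate", TRUE),
    rgbif::pred("hasGeospatialIssue", FALSE),
    ...,  # e.g. rgbif::pred("country", "AD"), rgbif::pred_gte("year", 2018)
    user = GBIFLogin@username,
    email = GBIFLogin@email,
    pwd = GBIFLogin@pwd
  )
}
```

GBIF/rgbif experts could then pass their own predicates through `...`, while everyone else keeps the defaults.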

Another remark stemming from the previous one: I see you limit your search to species, as you search among species names only. At L41:

key <- rgbif::name_suggest(q=cleanTaxon, rank='species')$data$key[1]

I find this quite limiting: why don't you allow the user to specify the rank by adding an argument rank to the function? In this way you make your function much more useful. Again, to solve the dilemma I cited above, you can give this new argument a default value ("species"). This is also important because it enables multi-taxon searches! You could just pass a vector of species to argument taxon, or simply a parent name, e.g. Gadus, to get all occurrences of Gadus spp.

I would also suggest adding (at least) a kingdom argument to avoid hemihomonymy. An alternative would be to allow the user to select which taxon they mean if GBIF returns multiple taxa, instead of choosing the first key (see the code chunk above).

Based on this, I would also change the description of argument taxon (currently A single species) to a valid taxon name, and add some more examples:

# taxon can be a species (default)
getGBIFpoints(taxon = "Gadus morhua",
              GBIFLogin = myGBIFLogin,
              GBIFDownloadDirectory = NULL)
# taxon can be something else, e.g. a genus
getGBIFpoints(taxon = "Gadus",
              rank = "genus",
              GBIFLogin = myGBIFLogin,
              GBIFDownloadDirectory = NULL)

Still speaking about documentation: I see you can pass a phylogeny object to taxon, which is great, but this should be documented in the @param taxon field of the roxygen documentation as well.

Your package is actually quite taxonomically oriented: you MUST specify a taxon/phylogeny. But what if the user wants to download all data from a country or a polygon, for example? This is something I do quite often. Again, keep in mind the balance between flexibility and user friendliness.

A typo: in the comment at L34, replace #File hygene with #File hygiene.

prevGBIFdownload()

This function is quite important to avoid downloading the same data again and again:

prevGBIFdownload(taxonKey = "No key", GBIFLogin)

Why do you specify a default value for taxonKey? I don't see any need in your code to do so. And in case you really want to specify a default value, I suggest using NULL instead of a dummy value like "No key", and adapting the code where needed. This is a convention accepted by the R community.
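The NULL-default convention, in a hypothetical signature for the same function (body illustrative only):

```r
# Sketch: NULL as the "not supplied" sentinel, checked with is.null().
prevGBIFdownload <- function(taxonKey = NULL, GBIFLogin) {
  if (is.null(taxonKey)) {
    stop("`taxonKey` must be supplied.", call. = FALSE)
  }
  # ... look up the most recent download for this taxonKey ...
}
```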

Let's say I had to trigger a download via rgbif instead of using your getGBIFpoints() function, because I need to download all data for all species in a region instead of specifying a taxon/phylogeny. Well, it would be nice to allow me to still use the rest of the package's functionality, wouldn't it? This could be another way to solve the dilemma between user-friendly code and flexibility. How to do it? You could allow me to supply my GBIF download by passing a GBIF download key or the GBIF DOI to getGBIFpoints(), and so to prevGBIFdownload()!

`prevGBIFdownload(taxonKey, GBIFLogin, GBIFDownloadKey)`

Moreover, as downloads are public, the GBIFLogin is not really needed if the GBIF download key is provided! For example, you can retrieve this download, with key 0252206-200613084148143 and DOI https://doi.org/10.15468/dl.bptja4, can't you?

Again, this improves the function: up to now, the user is limited to the most recent download for the same taxon/phylogeny. But what if the last download has been deleted, is corrupted, or is invalid for any other reason?
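For illustration, a finished public download can be retrieved by its key alone, with no credentials (a sketch using the key cited above):

```r
# Fetch a completed, public GBIF download by key and read it in.
dl <- rgbif::occ_download_get("0252206-200613084148143", overwrite = TRUE)
occs <- rgbif::occ_download_import(dl)
```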

getBIENpoints()

Quite the same remarks as for getGBIFpoints()

occCitation()

I think you forgot to specify which function(s) you need to import from the RPostgreSQL package. At L13 I see:

#' @import RPostgreSQL

tabulate.occResults()

You wrote @keywords Internal instead of @keywords internal, so this function appears in the documentation index. It's also good practice in rOpenSci to add #' @noRd for internal functions. The same holds for the tidyverse style guide: see https://style.tidyverse.org/documentation.html#internal-functions. As it stands, I can see the documentation of this function online, which is strange for an internal function.

Why do you use dplyr::mutate_if, dplyr::mutate and dplyr::bind_rows instead of adding them to the @importFrom dplyr directive, as you do for %>% and filter? Moreover, filter is imported but never used in tabulate.occResults()
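Both remarks could be addressed in the roxygen header, roughly like this (a sketch: the unused filter import is dropped, and the function body is elided):

```r
#' Tabulate occurrence results
#'
#' @importFrom dplyr %>% mutate mutate_if bind_rows
#' @keywords internal
#' @noRd
tabulate.occResults <- function(x) {
  # ...
}
```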

occCiteMap

Again, is there any conflict with other packages that leads you to use dplyr::filter instead of filter, given that you import it?

None of the leaflet functions are listed in @importFrom

All other functions

I think an in-depth documentation check should be done across all the other functions.

Tests

The idea of unit tests is to test as many units of code as possible, and to do it fast. Tests should not rely on other packages or databases, or at least should rely on them as little as possible.

Is there a reason to use catalogNumber 217880 as predicate for triggering a GBIF download?

I would just pass an existing GBIF download key where possible, so that your code is more independent of other packages.

Running devtools::test() returns 10 skipped tests with Reason: GBIF login unsuccessful. This should not happen on my laptop, as I have all the environment variables set up! The problem is that sometimes you use a GBIFLogin object without first creating it. Example: L20 of test-GBIFTableCleanup.R.

So, be careful when using try(), as you don't know whether the code stops due to a bug or due to the reason you have in mind. And in general, IMHO, I wouldn't use try() in tests: tests are not there to be tried, but to succeed or fail.
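A sketch of the skip-based alternative to try() (testthat provides skip_if() for exactly this; the GBIFLoginManager() argument names are my assumption):

```r
# Skip cleanly when credentials are absent, instead of wrapping in try().
test_that("GBIF login works with live credentials", {
  skip_if(Sys.getenv("GBIF_USER") == "", "GBIF credentials not set")
  login <- GBIFLoginManager(user = Sys.getenv("GBIF_USER"),
                            email = Sys.getenv("GBIF_EMAIL"),
                            pwd = Sys.getenv("GBIF_PWD"))
  expect_s4_class(login, "GBIFLogin")
})
```

A skipped test is reported as skipped with its reason, while a genuine bug still fails loudly.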

@hannahlowens
Author

@karthik Should I expect another set of reviews, or should I go ahead and address these comments? If the latter, how is this generally accomplished?

@ldecicco-USGS

Hi @hannahlowens , sorry for the super long delay. I'm currently acting as Editor in Chief, and was just directed to this thread.

I would say if you are still interested in completing this process to go ahead and respond to the single review. I'll take some time and try to familiarize myself with the package and act as a second reviewer.

@hannahlowens
Author

hannahlowens commented Dec 14, 2021 via email

@hannahlowens
Author

hannahlowens commented Mar 4, 2022

Review Response

Hi @ldecicco-USGS,

After a long time (sincere apologies there), I have worked through Damiano's wonderfully thorough and helpful review, and I have added him as a reviewer in the package citation. Below, I will go through his comments, point-by-point.

Regarding "points"

At several places in the review, the reviewer suggests replacing "point(s)" with "occurrence(s)", since it's a bit more intuitive and specific. I have done this throughout the documentation, although argument and function names remain the same. There are already downstream dependencies that use the existing names. I will work with the developers of those packages to transition from "points" to "occurrences" as a natural part of our ongoing development collaboration.

README

  • README.md has been reworked to include sections on installing and citing the package
  • The demonstration of usage has been replaced with links to "Simple" and "Advanced" vignettes
  • Requested badges have been added, as has a Zenodo DOI badge
  • Code of conduct and contribution guidelines have been added and are linked in the README
  • The Global Names List link has been fixed
  • Good catch on the occ_search() hard limit. I have fixed information throughout where appropriate.

Vignettes

  • Vignettes are now split into "Simple" and "Advanced" workflows.
  • Titles have also been shortened: "Simple search and citation of occurrences" and "Advanced search and citation of occurrences".
  • Code chunks all work without error now.
  • The "no date" warning has been fixed. As a citation package, that one was for sure embarrassing.

Code Style

  • Reformatted using use_tidy_style() and broke up lines longer than 80 characters when it did not strongly and adversely affect readability or functionality.

Functions

GBIFLoginManager()

  • Silent download trigger removed--excellent point. Instead, the function pings the GBIF API for a list of previous downloads. This is done to 1) check connectivity and fail gracefully if the server is unresponsive, and 2) check that the user's login credentials are correct.
  • Typo fixed
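
For context, a connectivity-plus-credentials check along these lines could be sketched with rgbif::occ_download_list() (note: `check_gbif_login()` is a hypothetical helper for illustration, not the actual GBIFLoginManager() internals):

```r
# Hypothetical sketch of a connectivity + credential check;
# the real GBIFLoginManager() implementation may differ.
check_gbif_login <- function(user, pwd) {
  res <- tryCatch(
    # Request the user's previous downloads; this fails if the
    # server is unreachable or the credentials are rejected.
    rgbif::occ_download_list(user = user, pwd = pwd, limit = 1),
    error = function(e) e
  )
  if (inherits(res, "error")) {
    message("Could not verify GBIF login: ", conditionMessage(res))
    return(FALSE)
  }
  TRUE
}
```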

getGBIFpoints()

  • GBIF now mentioned in the description
  • Now only retrieves presences (Lines 98-100). This was another excellent catch. I had not thought absences would be included, but of course GBIF doesn't just serve museum data (my background); it also serves all kinds of other biodiversity data. Thank you! Handling custom predicates has been added as a GitHub issue to be addressed in future development; there are some complexities there I need to think through carefully. At any rate, the automatic predicates are now stipulated in @details.
  • Handling for higher taxonomies and hemihomonyms has also been added as a GitHub issue. A query can return multiple taxonomic keys, and the first is not necessarily the correct one, which is a potential problem.
  • This function was written primarily internally for use by occQuery(), which passes the function an already validated species name (or a name that has been extracted from a phylogeny and then validated), hence the documentation for @value x being what it is.
  • The package was designed particularly for species distribution modeling/ecological niche modeling workflows, hence the taxonomic focus. I recognize that it could be useful in a more geographic context, but we first want to really nail the species-level, taxonomically focused functionality to establish the package.
  • Typo at Line 34 fixed.

prevGBIFdownload()

  • There is no reason to specify a default value of "No key" for taxonKey. This was an inelegant early solution that was never removed. It's now NULL.
  • I really like the suggestion of adding the ability to pass prevGBIFdownload() a specific download key so that the user can bring searches started elsewhere into the occCite workflow. I have added this as an issue, as I think it will take some careful thought to integrate fully and intuitively.

occCitation()

  • RPostgreSQL is imported whole since the list of functions being imported was running rather long.

tabulate.occResults()

  • Fixed so @keywords internal and added @noRd
  • Functions listed now imported from dplyr via @importFrom
  • filter removed from @importFrom dplyr

occCiteMap()

  • leaflet is imported whole since the list of functions being imported was running rather long.

All other functions

Documentation check done. Edited for clarity and completeness.

Tests

  • The catalogNumber search was a clunky solution to testing GBIF API connectivity; I have replaced it with an occ_download_list() query. These tests are necessary to confirm connectivity with the GBIF API when running the test suite, and to flag errors that might arise if rgbif's structure changes. Currently, since occCite's database scope is limited to GBIF and BIEN, having occCite break when either of those packages loses functionality is desirable, as it flags problems immediately. In the future, when the scope of the package is expanded, making it less dependent on rgbif will be a higher priority.
  • I have fixed the skipped tests--GBIFLogin() should have been run to collect login information.
  • The try() elements are included to make the package tests robust to situations where the APIs cannot be reached. This is a CRAN requirement.
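
The try()-based skipping pattern might look something like the following (a hypothetical sketch using real testthat and rgbif functions; the expectation at the end is illustrative, not occCite's actual test):

```r
# Hypothetical test sketch: skip gracefully when the GBIF API
# cannot be reached, so tests do not fail on network outages
# (a CRAN requirement).
testthat::test_that("GBIF API is reachable", {
  testthat::skip_on_cran()
  res <- try(rgbif::occ_download_list(limit = 1), silent = TRUE)
  testthat::skip_if(inherits(res, "try-error"),
                    "GBIF API not reachable; skipping")
  testthat::expect_true(is.list(res))  # assumed return shape
})
```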

Conclusion

Thank you again. I firmly believe this review was incredibly helpful and made occCite more robust, useful, and clear. Please let me know how to proceed. Is a second review still coming?

@ldecicco-USGS

Looks great! @damianooldoni let me know if you think the response to the review is satisfactory.

@damianooldoni

Thanks @hannahlowens for appreciating my comments. I am happy you find them useful.
@ldecicco-USGS: yes, the response to the review is very satisfactory. Good job! 👌

@karthik
Member

karthik commented Mar 11, 2022

Thanks very much, @hannahlowens and @damianooldoni! I'm back on this review. I'll approve it shortly.

@karthik
Member

karthik commented Mar 11, 2022

@ropensci-review-bot approve occCite

@ropensci-review-bot
Collaborator

Approved! Thanks @hannahlowens for submitting and @peterdesmet for your reviews! 😁

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so.
  • After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the new repository URL.
  • Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
  • We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
  • You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions.

We maintain an online book with our best practices and tips; this chapter starts the 3rd section, which covers guidance for after onboarding (with advice on releases, package marketing, and GitHub grooming). The guide also features CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

@maelle
Member

maelle commented Mar 15, 2022

👋 @hannahlowens Is any help needed with the repo transfer?
I'm asking as the package will be listed as new package in Friday's newsletter if it's transferred by then. 😸

@hannahlowens
Author

Hi @maelle Thanks for checking in. I've been a little swamped, but the repo transfer is on my to-do list today. Listing it in Friday's newsletter is a much-needed shove! I'll be in touch if I run into any snags.

@hannahlowens
Author

hannahlowens commented Mar 16, 2022

@ropensci-review-bot finalize transfer of ropensci/occCite

@ropensci-review-bot
Collaborator

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@ropensci-review-bot help

@maelle
Member

maelle commented Mar 17, 2022

@hannahlowens can you please try again with @ropensci-review-bot finalize transfer of occCite? Thank you!

@hannahlowens
Author

@ropensci-review-bot finalize transfer of occCite

@ropensci-review-bot
Collaborator

Transfer completed. The occCite team is now owner of the repository

@hannahlowens
Author

@maelle Thanks! Is it ok to leave my pkgdown workflow in place, or is it preferred that I switch to rOpenSci building and branding? If I can leave the website as is, I think I've successfully done everything else and will do a fresh release.

@maelle
Member

maelle commented Mar 17, 2022

@hannahlowens The rOpenSci docs infrastructure will build a pkgdown website in any case, so I'd say it's best to delete your workflow (unless you want the two sites, which a few packages have to e.g. have their institutional branding).

@hannahlowens
Author

@maelle I see, thanks! I've deleted my workflow.

@hannahlowens
Author

@maelle Sorry, one more question--I want to make sure everything is as clean as possible before tomorrow. I deleted my pkgdown workflow, but I see the pages-build-deployment workflow builds fine yet fails to deploy. Why is this? Is it something I can fix?

@maelle
Member

maelle commented Mar 17, 2022

@hannahlowens it was because of the lower-case readme.md, now it's deployed: https://docs.ropensci.org/occCite/ 🎉

@karthik
Member

karthik commented Mar 17, 2022

Thanks @hannahlowens and @maelle!
