phylotaR -- submission #187

DomBennett · 2018-01-23T13:07:53Z

Summary

What does this package do? (explain in 50 words or less):

R implementation of the PhyLoTa pipeline. It identifies orthologous sequences in GenBank using BLAST searches. It consists of three automated stages: taxise, download and cluster.

Paste the full DESCRIPTION file inside a code block below:

Package: phylotaR
Type: Package
Title: Automated phylogenetic sequence cluster identification from GenBank
Version: 0.1
Date: 2017-11-17
Author: Hannes Hettling, Rutger Vos, Alexander Zika, Dominic J. Bennett, Alexandre Antonelli
Maintainer: D.J. Bennett <dominic.john.bennett@gmail.com>
Description: PhyLoTa identifies orthologous sequence clusters from data readily available in GenBank. This is an R implementation of this process.
License: GPL-2
URL: https://github.com/dombennett/phylotaR#readme
BugReports: https://github.com/dombennett/phylotaR/issues
SystemRequirements: BLAST+ (>2.7.1)
Depends:
    R (>= 3.3.0),
    methods,
    foreach
Imports:
    curl,
    doMC,
    doSNOW,
    igraph,
    CHNOSZ,
    rentrez,
    DBI,
    taxize,
    XML
Suggests:
    testthat,
    knitr,
    rmarkdown
RoxygenNote: 6.0.1
VignetteBuilder: knitr

URL for the package (the development repository, not a stylized html page): https://github.com/DomBennett/phylotaR
Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

data retrieval, it downloads sequences from genbank and identifies sequence clusters that may be of use for phylogenetic analysis

Who is the target audience and what are scientific applications of this package?

Evolutionary biologists, phylogeneticists. Determining sequence orthology is the essential first step of any phylogenetic analysis. Usually a phylogeneticist may search GenBank for sequences of the same gene, download and align. Due to naming problems, however, potential relevant sequences may be missed or sequences that have been mislabelled are downloaded. PhyLoTa gets around this by performing all-v-all BLAST searches and does not rely on gene names. PhyLoTa is often the first step of any large-scale automated phylogenetic pipeline (e.g. SUPERSMART). Potentially, this R package could be central to many future phylogenetic analyses.

Are there other R packages that accomplish the same thing? If so, how does
yours differ or meet our criteria for best-in-category?

The only current approach for identifying PhyLoTa clusters is the PhyLoTa browser. The data underlying its current release, however, is five years old. In those five years there have been over 40 million new sequences added to GenBank. An alternative is needed to make use of the latest data. Additionally, phylotaR would allow a user to specify their own parameters when running the pipeline.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

does not violate the Terms of Service of any service it interacts with.
has a CRAN and OSI accepted license.
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a vignette with examples of its essential functions and uses.
has a test suite.
has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

Do you intend for this package to go on CRAN?
Do you wish to automatically submit to the Journal of Open Source Software? If so:
- The package has an obvious research application according to JOSS's definition.
- The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
- The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
- The package is novel and will be of interest to the broad readership of the journal.
- The manuscript describing the package is no longer than 3000 words.
- You intend to archive the code for the package in a long-term repository which meets the requirements of the journal.
- (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

No errors, but plenty of warnings to do with documentation.

Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

Not just yet. Outstanding: conversion to snake_case, generate README from rmd, create website with pkgdown, NEWS required, authors@R syntax, depends to imports and cat to message (although is that necessary given the logging methods?)

If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Pre-submission Note

Hi!
I've produced this pre-submission issue to determine whether phylotaR would be something you'd like added to your roster? On our side, we would be very keen as we think future phylogenetic analyses will depend on making software open-source, modular and collaborative. I did feel, however, that it may not quite fit into the ropensci family, because it's a pipeline and it depends on software external to R (BLAST). Please be aware that phylotaR is still only being developed and a lot of testing still needs to be performed. If you are interested, I'd be happy to update the naming style (I prefer camelcase), improve the documentation and whatever else is needed to meet your criteria.
Thanks,
Dom

The text was updated successfully, but these errors were encountered:

sckott · 2018-01-30T17:42:12Z

thanks for your submission @DomBennett we've been discussing and will get back to you asap

sckott · 2018-01-30T18:29:24Z

Editor checks:

Fit: The package meets criteria for fit and overlap
Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
License: The package has a CRAN or OSI accepted license
Repository: The repository link resolves correctly
Archive (JOSS only, may be post-review): The repository DOI resolves correctly
Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @DomBennett !!

You didn't check the top level MEE box - i've checked that box. You didn't check the box for The manuscript describing the package is no longer than 3000 words - will manuscript length be a problem?
You didn't check the box for has a CRAN and OSI accepted license. I think you do, can you check that box?
You didn't check the box for does not violate the Terms of Service of any service it interacts with. Is that going to be a problem? Can you check on that?
I see you checked the submit to MEE box. Note that we can not make any gaurantee about MEE considering your paper a good fit for them, or them accepting the paper
Seems like blast on Homebrew (installed via brew install blast) is the same as at the NCBI link you gave at https://www.ncbi.nlm.nih.gov/books/NBK279671/ - Is that right? homebrew is for OSX platforms only
It would indeed be ideal if the pkg didn't have to use system calls to external tools, but there are times when there's no other way.
In terms of style, we prefer snake case but as long as you are consistent within the package that is THE most important thing (i.e, don't use a combination of camelCase, TitleCase and snake_case).
You did not check the box for Does the package conform to rOpenSci packaging guidelines? Can you do that and describe any exceptions
You can put a ropensci review badge in your README

[![](https://badges.ropensci.org/187_status.svg)](https://github.com/ropensci/onboarding/issues/187)

Goodpractice output for you and reviewers to consider:

── GP phylotaR ────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 75% of code lines are covered by test cases.

    R/base-wrappers.R:7:NA
    R/blast-tools.R:10:NA
    R/blast-tools.R:16:NA
    R/blast-tools.R:28:NA
    R/blast-tools.R:34:NA
    ... and 325 more lines

  ✖ not use "Depends" in DESCRIPTION, as it can cause name
    clashes, and poor interaction with other packages. Use "Imports"
    instead.
  ✖ omit "Date" in DESCRIPTION. It is not required and it
    gets invalid quite often. A build date will be added to the package
    when you perform `R CMD build` on it.
  ✖ use '<-' for assignment instead of '='. '<-' is the
    standard, and R users and developers are used it and it is easier
    to read your code for them if you use '<-'.

    R/ncbi-tools.R:156:9

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/blast-tools.R:59:1
    R/taxise-tools.R:361:1

  ✖ omit trailing semicolons from code lines. They are not
    needed and most R coding standards forbid them

    R/ncbi-tools.R:147:55

  ✖ avoid sapply(), it is not type safe. It might return a
    vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/blast-tools.R:32:13
    R/blast-tools.R:33:11
    R/cache-tools.R:253:13
    R/cache-tools.R:256:13
    R/cache-tools.R:317:12
    ... and 32 more lines

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
    result 1:0 if the expression on the right hand side is zero. Use
    seq_len() or seq_along() instead.

    R/cluster-tools.R:345:12
    R/ncbi-tools.R:162:20
    R/pipeline.R:208:12
    tests/testthat/test-cluster-tools.R:123:17
    tests/testthat/test-download-tools.R:81:17
    ... and 2 more lines

  ✖ fix this R CMD check WARNING: '::' or ':::' imports not
    declared from: ‘ggplot2’ ‘plyr’ Namespaces in Imports field not
    imported from: ‘DBI’ ‘taxize’ All declared Imports should be used.
    Packages in Depends field not imported from: ‘foreach’ ‘methods’
    These packages need to be imported from (in the NAMESPACE file) for
    when this namespace is loaded but not attached. Missing or
    unexported object: ‘doSNOW::registerDoMC’
  ✖ fix this R CMD check NOTE: blstN: no visible binding for
    global variable ‘cmd’ blstN: no visible global function definition
    for ‘read.table’ byLineage : .calc : <anonymous>: no visible
    binding for global variable ‘value’ fltr: no visible global
    function definition for ‘head’ fltr: no visible global function
    definition for ‘tail’ genClstrsObj: no visible global function
    definition for ‘new’ getBestClstrs: no visible global function
    definition for ‘nTaxa’ getBestClstrs: no visible global function
    definition for ‘seqLength’ getClMdLn : <anonymous>: no visible
    global function definition for ‘median’ getMngblIds: no visible
    global function definition for ‘head’ getMngblIds: no visible
    global function definition for ‘tail’ mkBlstDB: no visible binding
    for global variable ‘cmd’ nDscndnts: no visible global function
    definition for ‘%dopar%’ nDscndnts: no visible global function
    definition for ‘foreach’ nDscndnts: no visible binding for global
    variable ‘i’ plotTable: no visible binding for global variable
    ‘clstrnm’ plotTable: no visible binding for global variable
    ‘txidnm’ plotTable: no visible binding for global variable ‘value’
    runStgs: no visible global function definition for ‘is’ safeSrch:
    no visible global function definition for ‘is’ setUp: no visible
    global function definition for ‘packageVersion’ writeClstr: no
    visible global function definition for ‘write.table’ writeSqs: no
    visible binding for global variable ‘clstrs_obj’ writeTax: no
    visible global function definition for ‘write.table’
    summary,ClstrsObj: no visible binding for global variable ‘x’
    Undefined global functions or variables: %dopar% clstrnm clstrs_obj
    cmd foreach head i is median nTaxa new packageVersion read.table
    seqLength tail txidnm value write.table x Consider adding
    importFrom("methods", "is", "new") importFrom("stats", "median")
    importFrom("utils", "head", "packageVersion", "read.table", "tail",
    "write.table") to your NAMESPACE file (and ensure that your
    DESCRIPTION Imports field contains 'methods').
  ✖ fix this R CMD check WARNING: Undocumented data sets:
    ‘blst_rs’ ‘phylt_nds’ ‘sqs’ ‘tdobj’ All user-level objects in a
    package should have documentation entries. See chapter ‘Writing R
    documentation files’ in the ‘Writing R Extensions’ manual.
  ✖ fix this R CMD check NOTE: Warning: no function found
    corresponding to methods exports from ‘phylotaR’ for: ‘show’ A
    namespace must be able to be loaded with just the base namespace
    loaded: otherwise if the namespace gets loaded by a saved object,
    the session will be unable to start. Probably some imports need to
    be declared in the NAMESPACE file.

You don't have to address these now.

Seeking reviewers now 🕐

Reviewers: @arendsee @naupaka
Due date: 2018-03-12

rvosa · 2018-01-31T09:58:28Z

@sckott - what would you recommend for a journal to publish a short application note about this package?

sckott · 2018-01-31T19:33:08Z

@rvosa possibly: we have a sort of pairing with JOSS and MEE. Also maybe: The R Journal, F1000Research

rvosa · 2018-01-31T20:50:05Z

Thanks! Sorry, what's JOSS?

…

On Wed, Jan 31, 2018 at 8:33 PM, Scott Chamberlain ***@***.*** > wrote: @rvosa <https://github.com/rvosa> possibly: we have a sort of pairing with JOSS and MEE. Also maybe: The R Journal, F1000Research — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#187 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAGf-rOdGkuculjA9u_JE-FBSAS07aMuks5tQL_1gaJpZM4Rpm8k> .

sckott · 2018-01-31T21:09:59Z

http://joss.theoj.org/

sckott · 2018-02-06T20:06:06Z

@DomBennett I had a number of questions for you above in my comments. Please respond

DomBennett · 2018-02-07T13:00:35Z

@sckott Sorry, I thought I could wait for the reviewers and respond to everything wholesale.

You didn't check the top level MEE box - i've checked that box. You didn't check the box for The manuscript describing the package is no longer than 3000 words - will manuscript length be a problem?

We hadn't decided at that point which journal to send it to. We're not likely to submit to MEE now. All MEE boxes have been unticked.

You didn't check the box for has a CRAN and OSI accepted license. I think you do, can you check that box?

It does now. I added an MIT license.

You didn't check the box for does not violate the Terms of Service of any service it interacts with. Is that going to be a problem? Can you check on that?

I've read the terms now and it won't. Box ticked.

I see you checked the submit to MEE box. Note that we can not make any gaurantee about MEE considering your paper a good fit for them, or them accepting the paper.

It'd be nice if you did guarantee! But we're not going with MEE....

Seems like blast on Homebrew (installed via brew install blast) is the same as at the NCBI link you gave at https://www.ncbi.nlm.nih.gov/books/NBK279671/ - Is that right? homebrew is for OSX platforms only

It's just the link to the general description of how to install BLAST tools across all systems. Not specific to OSX.

It would indeed be ideal if the pkg didn't have to use system calls to external tools, but there are times when there's no other way.

Yeah it sucks big time. We're currently working on an approach to avoid using standalone BLAST, in which case a user would not need to install it. The BLAST tools are written in C++, so in theory it should be possible to package the code into the R package. I'm not too au fait on how to do this though. I've written external C scripts into R packages before, but never have I taken someone else's C code entirely. There may be problems with modifying the C++ code to interact with the R. Do you have any ideas on workarounds? Otherwise we could simply give the user the option to send their BLAST jobs to online servers (e.g. NCBI or EBI).

In terms of style, we prefer snake case but as long as you are consistent within the package that is THE most important thing (i.e, don't use a combination of camelCase, TitleCase and snake_case).

Yeah I know..... sigh..... Coding style can be real bugbears. I try and use a sort of PEP8 style with a mix of all three (variables snake_case, functions camelCase and classes TitleCase). I can go through and make it all snake_case. I'll probably rechristen some elements where I think the names don't quite work.

You did not check the box for Does the package conform to rOpenSci packaging guidelines? Can you do that and describe any exceptions

Not just yet. See above. Also I need to go through your more specific recommendations above as well. Do I need to make these changes before the reviewers look over the package? It's just the pipeline is still being tested and developed and it may be a while before these changes can be made.

You can put a ropensci review badge in your README

Dun'd.

sckott · 2018-02-07T18:08:36Z

We hadn't decided at that point which journal to send it to. We're not likely to submit to MEE now. All MEE boxes have been unticked.

Okay, sounds good.

It does now. I added an MIT license.

thanks.

I've read the terms now and it won't. Box ticked.

thanks.

We're currently working on an approach to avoid using standalone BLAST ...

Glad you're looking into some options. It's possible reviewers will have some ideas as well. One idea: if you do shell out, you could use the sys package made by Jeroen on our team https://cran.rstudio.com/web/packages/sys/

I need to go through your more specific recommendations above as well. Do I need to make these changes before the reviewers look over the package?

If you're talking about the goodpractice output, no, not now. That's just information for you and reviewers to consider.

arendsee · 2018-02-12T23:46:09Z

Hi @DomBennett, I'm working on reviewing phylotaR, but ran into a few little issues.

When I run the vignette, I get the following error:

> clstr_id <- getBestClstrs(clstrs_obj, n=1)                        
Error in nTaxa(clstrs_obj) : could not find function "nTaxa"

Fix the BLAST version handling

Currently the BLAST version requirement is hard coded to greater than 2.7.1
(setup-tools.R:118). I have 4 issues with this:

Hard-coding of values is generally bad.
BLAST version 2.7.1 is pretty restrictive. For example, the Ubuntu package
manager provides BLAST version 2.2.28. Many researchers work in paranoid
agencies that don't give us admin privileges, so installing the latest and
greatest is often a pain. Anyway, unless your algorithm depends on new
features, you should relax the version requirement.
The code for testing version is brittle. The version vector is parsed into a 3-tuple (e.g. 2.7.1 ==> c(2,7,1)), then tested as below

  # setup-tools.R:118
  tst <- vrsn[1] >= 2 & vrsn[2] >= 7 & vrsn[3] >= 1

However, this would lead to versions such as 2.8.0 being rejected. You can
fix this by dropping the last comparison:

  # setup-tools.R:118
  tst <- vrsn[1] >= 2 & vrsn[2] >= 7

Also, in DESCRIPTION, the SystemRequirements specifies a BLAST version of >2.7.1, but your actual version checking function requires >=2.7.1.

Fix the warnings and notes in check()

When I run devtools::check() I get 5 warnings, and 3 notes. The issues all look pretty normal (easy to fix). I don't see anything too serious here. Below I have outlined the main issues and how to fix them (I might have missed one or two).

Remove the warnings_are_errors: false clause from your .travis.yml
file. CRAN will generally not accept a package if any warnings are raised by
CHECK. So you probably should not ignore warnings in your continuous
integration.
DESCRIPTION says that your license is GPL-2, but the actual LICENSE file is
MIT. You need to choose one of them. devtools makes license handling pretty
easy. Just delete your current LICENSE file and the License entry in your
DESCRIPTION file, and then call either devtools::use_mit_license() or
devtools::use_gpl3_license(). This will create the license and update the
DESCRIPTION accordingly.
You need to add ggplot2 and plyr to your Imports list in DESCRIPTION
You need to add the statements:

#' @importFrom methods is new
#' @importFrom stats median 
#' @importFrom utils heads packageVersion read.table tail write.table

Somewhere in your documentation (I usually put these lines in the same place
where I put package level documentation).

Describe each parameter in the documentation

Cheers!

DomBennett · 2018-02-13T07:57:16Z

Thanks @arendsee for the detailed and early feedback. Dealing with your specific issues:

nTaxa
The functions and classes associated with the clusters object are still a little open-ended at this stage, hence why their names may change. The equivalent to nTaxa is now getClNTx - get n taxa by cluster. We're still trying to determine the sorts of ways users will want to interrogate the clusters, and the best sort of language to use when naming functions.

I've updated the vignette. But the code is likely to change in the near future and will need changed again.

BLAST
OK. Thanks for the tips. Fixed and updated. Now using v 2.0 and up. How would you test the versioning without hard coding the values?
Check warnings

travis warnings - removed from yaml
licensing updated to MIT
ggplot2 and plyr - added
statements added to phylotaR.R
parameters are being worked and should be updated soon!

sckott · 2018-02-13T18:41:02Z

Both reviewers now assigned: @arendsee @naupaka
Due date: 2018-03-12

DomBennett · 2018-02-14T07:43:30Z

@arendsee @naupaka Thanks for taking time to review the package.

arendsee · 2018-02-14T15:26:45Z

OK. Thanks for the tips. Fixed and updated. Now using v 2.0 and up. How would you test the versioning without hard coding the values?

Hmm, good point. @sckott Any thoughts on best practices for confirming the version of an outside tool (BLAST, in this case)?

I can tell you what I do, though I am not entirely happy with my approach. I simply don't check the version of my tools, instead I just print the version output into the log (e.g. system call to blastp -version).

If you want to check the version, you could write a dedicated function to extract a version string from a version statement (maybe get_version or something), and then write a dedicated version comparison function (maybe cmp_version).

I don't really have a good general solution. But in your case, I would recommend not checking the version. There are differences between the BLAST versions, but I don't think any of the BLAST+ versions anyone will actually use (i.e. >=2.2) will change the results. Perhaps it would be worthwhile to test several versions of BLAST on Travis.

Thoughts? @sckott @naupaka?

sckott · 2018-02-14T18:21:29Z

I don't know off hand about best practices for checking versions of cli tools. Thoughts @jeroen @richfitz ?

Perhaps it would be worthwhile to test several versions of BLAST on Travis.

this seems like a really good idea - and then any changes that affect certain BLAST versions and not others will become immediately apparent

naupaka · 2018-02-14T19:57:32Z

Insofar as I can recall, the big breaking changes happened during the switch from BLAST to BLAST+ that happened ~6 years ago. So as long as the version is newer than that it should be ok. Some of the other NCBI software like sra-tools need to have more recent versions than are currently available in some apt repositories, because of changes in NCBI's servers, but I haven't seen problems with BLAST itself.

It seems (in my personal opinion) that trying to pull the source for BLAST into R is a bad idea, because then it's a ton of extra maintenance to make sure that stays up to date as things change on NCBI's side. Most folks who would use this package should probably be able to install a modern copy of ncbi-blast, so I would argue for spending effort on increasing the normal codebase and adding documentation, etc, instead of trying to pull BLAST in to remove the dependency.

arendsee · 2018-02-14T20:53:13Z

@naupaka I agree on pulling BLAST into R being a bad idea. Unless NCBI was interested in doing the work and maintaining the code.

As for the legacy BLAST versus BLAST+, the names of the executable have all changed. For example, blastp is not defined in legacy, they used blastall with some protein-vs-protein argument, I think. So if someone tried to use legacy BLAST the commands required simply would not be found.

DomBennett · 2018-03-27T09:13:22Z

Thanks very much @arendsee and @naupaka for your detailed reviews!

I feel I may have already addressed some of the run issues raised by @naupaka, as I encountered some of them over the past few weeks myself. The hanging is particularly annoying and seems to be due to Entrez, I've added a timeout so that it will retry after one hour. Also, I wonder if the BLAST errors might be due to using Windows, I've not tested the pipeline at all on that OS. Lots of things to be getting on with!

In terms of next steps @sckott , do I just update the package and report what I did here?

sckott · 2018-03-27T21:14:16Z

@DomBennett So we don't have a rule on how exactly to respond to reviewers. This is how it's best approached though I think:

Open issues in your repo that address issues raised here - Then when you reply to reviewers you can link to those - That's what I do when I bring my own pkgs through review, and I use a label or wording in the issue title to indicate its from the review, e.g. https://github.com/ropensci/charlatan/issues?q=is%3Aissue+is%3Aopen+label%3Areview What's good about opening issues is you can reference those in your commits so that everything is all linked up. It's likely that you'll want to ask for additional feedback from reviewers - so being able to link to an issue that is linked to the code itself is a nice approach

DomBennett · 2018-04-04T11:33:31Z

Hi @sckott,

I've copied your format and created issues. Also, FYI we have now submitted a manuscript describing phylotaR to MDPI Life, preprint available here.

Is there a deadline for responding to the reviewers? I was thinking of taking a break and coming back to phylotaR with a little more distance.

sckott · 2018-04-05T15:50:13Z

Not really a deadline. If you'll put it off for a while, we have a holding label that we'll use, and then we'll know to not ask you about progress until you're ready to come back to it

DomBennett · 2018-06-16T17:18:06Z

Hi @sckott, @arendsee and @naupaka,

I’ve made the following major changes as per your comments and reviews:

Restyled: All functions and variables are snake case with the exception of the S4 classes that remain camelcase. I’ve also updated the names to make things a little more explicit (e.g. ClstrRec rather than ClRec).
Tested: I now have 89% coverage
Examples: All ‘public’, exported functions and data have examples.
Pkgdown: I have created a website.
Documentation: all functions have return parameters, I have checked spelling with spell_check.
Timings: The demonstration phylotaR runs now have the timings of each stage on the README.md
Contributing: There is now a CONTRIBUTING.md
Logging: There is now more salient information printed to screen, blast versions and session info are also recorded in a user’s wd.
Windows: The pipeline has been run on a windows machine
Makeblastdb error: I think this was due to a ‘~’ in @naupaka’s filepath. I now halt the pipeline if a user’s does this.

Things I have not done:

Additional packages: I have not incorporated taxize, taxizeDB, tm or Biostrings into the package. In the case of the latter, I feel it best to avoid Bioconductor packages as they are a little more tricky to install and I would like to keep phylotaR light, even though it requires BLAST+. In the case of the others, I see no immediate reason to make use of these additional packages while I currently have no need for their additional capabilities. I have fixed the 80 column width issue of the user generated .fasta output. In the future, as new features are requested (e.g. alternative taxonomies), then they should be incorporated. Right now though, I feel I should follow the doctrine of if it ain’t broke, why fix it?
Advanced vignette: I have not added this as a lot of the additional information can be found in the recently published article on the pipeline. In the future I can imagine a new vignette containing more detailed information on new features with more complex options. This is likely to occur since I have been contacted a fair number of times with requests for new options and features.

If any of you feel that these last two points should be acted upon in some form, then I will be happy to look at these again.

Many thanks once more for taking the time to review the package!

Best wishes,
Dom

DomBennett · 2018-06-16T17:22:16Z

Oh also... I should add that I cannot rename the first stage 'taxise' as it is already published (I interpret taxise to just mean grab taxonomic information or place in a taxonomic context.). To compromise I have updated the stage diagram in the README to be a more explicit.

naupaka · 2018-06-16T21:44:45Z

Sounds great @DomBennett!

@sckott shall we have another go through of it with the new version? What's the ROpenSci SOP on review after revisions?

sckott · 2018-06-20T00:03:53Z

thanks @DomBennett !

@naupaka @arendsee

If you're completely happy with changes to the package and responses from the maintainer then just say so. If not, please do comment here with any additional comments/suggestions/etc.

arendsee · 2018-06-20T13:52:30Z

@DomBennett @sckott Looks good to me.

I would recommend eventually refactoring with taxize et al, but I wouldn't consider it a necessity.

Good job!

sckott · 2018-06-20T15:43:39Z

@naupaka do you want to make further comments, or are you happy?

naupaka · 2018-06-20T16:16:06Z

I'd like to put the new version through its paces and haven't had the time to sit down and do it. Will get back to you in the next couple days if that's ok?

sckott · 2018-06-20T16:19:58Z

@naupaka sounds good

naupaka · 2018-07-08T00:34:29Z

@DomBennett Just ran through a couple testing workflows and everything works smoothly for me! Thanks for this very useful tool.

naupaka · 2018-07-08T00:34:56Z

@sckott I'm happy to give this my stamp of approval 👍

sckott · 2018-07-09T16:50:08Z

Great, thanks again @arendsee and @naupaka for your reviews!

sckott · 2018-07-09T17:51:03Z

@DomBennett we're nearly there. just a few comments:

On running the example from the vignette, I get an error:

---------------------------------------------------
Running pipeline on [unix] at [2018-07-09 10:22:31]
---------------------------------------------------
Running stages: taxise, download, cluster, cluster2
--------------------------------------------
Starting stage TAXISE: [2018-07-09 10:22:31]
--------------------------------------------
Searching taxonomic IDs ...
Downloading taxonomic records ...
. [1-21]
Generating taxonomic dictionary ...
---------------------------------------------
Completed stage TAXISE: [2018-07-09 10:22:33]
---------------------------------------------
----------------------------------------------
Starting stage DOWNLOAD: [2018-07-09 10:22:33]
----------------------------------------------
Identifying suitable clades ...
Identified [1] suitable clades.
Downloading hierarchically ...
Working on parent [id 9504]: [1/1] ...
. + whole subtree ...
. . Getting [2700 sqs] ...
. . . [1-300]
Unexpected Error : NCBI is limiting the size of your request. Consider reducing btchsz.


Occurred [2018-07-09 10:22:58]
Contact package maintainer for help.
Error in stages_run(wd = wd, frm = 1, to = nstages, stgs_msg = stgs_msg) :
  Unexpected Error : NCBI is limiting the size of your request. Consider reducing btchsz.


Occurred [2018-07-09 10:22:58]
Contact package maintainer for help.

It looks like I need to fiddle with btchsz but I'm not sure where to do that. Would it make sense to say what function has that parameter? It looks like it's the function parameters()?

running goodpractice again: i'd encourage you to avoid sapply where possible and avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) type expressions where possible, but not there aren't required before approval.

DomBennett · 2018-07-09T19:27:10Z

Thanks again to @arendsee and @naupaka for their helpful reviews! So nice to see this package improve with all your helpful suggestions and feedback.

@sckott, as per your points ....

The default btchsz is 300 records per request. I find this is OK initially but if you use Entrez a lot they seem to impose a limit, changing it to 100 seems to always work. I will make 100 the default instead.

To change parameters you can either delete the folder and set up the folder again with different parameter values using the setup(). Or you can use parameters_reset() to change an already set-up folder. I will add a section to the vignette on editing the parameters and point users to the appropriate function in the error message.

Will run goodpractice myself again and avoid sapply and the 1: stuff.

After I've made those changes I will ping you again here.

DomBennett · 2018-07-12T08:28:24Z

@sckott changes made!

No more sapply and 1:
More 'how-to-change-the-parameters' documentation, including: https://antonellilab.github.io/phylotaR/articles/phylotaR.html#changing-parameters

sckott · 2018-07-12T17:50:47Z

Approved! Thanks again for your submission @DomBennett

Please transfer the package to ropensci- I've invited you to a team within our ropensci organization. After you accept you should be able to move it. You'll be made admin once you do.
Make sure to
- add rOpenSci footer to README
  [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
- Change any needed links, such those for CI badges
- Travis CI should update to the new location automatically - you may have to update other CI systems manually
In a new .github folder in the root of the pkg, add a CONTRIBUTING.md file, an issue template file, and a PR template file, see https://github.com/ropensci/dotgithubfiles/ for examples, you can copy them directly and edit as you wish
We're starting to roll out software metadata files to all ropensci pkgs via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your pkg, after installing the pkg - should be easy as running codemetar::write_codemeta() in the root of your pkg

Are you interested in doing a blog post for our blog https://ropensci.org/blog/ ? either a short-form intro to it (https://ropensci.org/technotes/) or long-form post with more narrative about its development (https://ropensci.org/blog/). If so, we'll have our community manager @stefaniebutland get in touch with you on that

DomBennett · 2018-07-13T16:15:47Z

Done!
https://github.com/ropensci/phylotaR

sckott · 2018-07-13T17:23:28Z

@DomBennett and the question about a blog post?

stefaniebutland · 2018-07-18T05:33:27Z

@DomBennett this link will give you many examples of blog posts by authors of onboarded packages so you can get an idea of the style and length: https://ropensci.org/tags/review/. Technotes are here: https://ropensci.org/technotes/.

Here are some technical and editorial guidelines for contributing a blog post: https://github.com/ropensci/roweb2#contributing-a-blog-post.

Happy to answer any questions.

sckott added package 0/presubmission labels Jan 30, 2018

sckott self-assigned this Jan 30, 2018

sckott added topic:taxonomy topic:phylogenetics topic:data-access and removed 0/presubmission labels Jan 30, 2018

sckott changed the title ~~phylotaR -- presubmission~~ phylotaR -- submission Jan 30, 2018

sckott added 1/editor-checks 2/seeking-reviewer(s) and removed 1/editor-checks labels Jan 31, 2018

sckott added 3/reviewer(s)-assigned and removed 2/seeking-reviewer(s) labels Feb 13, 2018

sckott removed the 3/reviewer(s)-assigned label Mar 26, 2018

sckott added the holding label Apr 6, 2018

sckott added 5/awaiting-reviewer(s)-response and removed 4/review(s)-in-awaiting-changes holding labels Jul 9, 2018

sckott added 6/approved and removed 5/awaiting-reviewer(s)-response labels Jul 12, 2018

sckott closed this as completed Jul 24, 2018

This issue was closed.

phylotaR -- submission #187

phylotaR -- submission #187

Comments

DomBennett commented Jan 23, 2018 • edited Loading

Summary

Requirements

Publication options

Detail

Pre-submission Note

sckott commented Jan 30, 2018

sckott commented Jan 30, 2018 • edited Loading

Editor checks:

Editor comments

rvosa commented Jan 31, 2018

sckott commented Jan 31, 2018

rvosa commented Jan 31, 2018 via email

sckott commented Jan 31, 2018

sckott commented Feb 6, 2018

DomBennett commented Feb 7, 2018

sckott commented Feb 7, 2018

arendsee commented Feb 12, 2018

DomBennett commented Feb 13, 2018 • edited Loading

sckott commented Feb 13, 2018

DomBennett commented Feb 14, 2018

arendsee commented Feb 14, 2018

sckott commented Feb 14, 2018

naupaka commented Feb 14, 2018

arendsee commented Feb 14, 2018

DomBennett commented Mar 27, 2018

sckott commented Mar 27, 2018

DomBennett commented Apr 4, 2018

sckott commented Apr 5, 2018

DomBennett commented Jun 16, 2018

DomBennett commented Jun 16, 2018

naupaka commented Jun 16, 2018

sckott commented Jun 20, 2018

arendsee commented Jun 20, 2018

sckott commented Jun 20, 2018

naupaka commented Jun 20, 2018

sckott commented Jun 20, 2018

naupaka commented Jul 8, 2018

naupaka commented Jul 8, 2018

sckott commented Jul 9, 2018

sckott commented Jul 9, 2018

DomBennett commented Jul 9, 2018

DomBennett commented Jul 12, 2018

sckott commented Jul 12, 2018

DomBennett commented Jul 13, 2018

sckott commented Jul 13, 2018

stefaniebutland commented Jul 18, 2018

DomBennett commented Jan 23, 2018 •

edited

Loading

sckott commented Jan 30, 2018 •

edited

Loading

DomBennett commented Feb 13, 2018 •

edited

Loading