-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
phylotaR -- submission #187
Comments
thanks for your submission @DomBennett we've been discussing and will get back to you asap |
Editor checks:
Editor commentsThanks for your submission @DomBennett !!
── GP phylotaR ────────────
It is good practice to
✖ write unit tests for all functions, and all package code
in general. 75% of code lines are covered by test cases.
R/base-wrappers.R:7:NA
R/blast-tools.R:10:NA
R/blast-tools.R:16:NA
R/blast-tools.R:28:NA
R/blast-tools.R:34:NA
... and 325 more lines
✖ not use "Depends" in DESCRIPTION, as it can cause name
clashes, and poor interaction with other packages. Use "Imports"
instead.
✖ omit "Date" in DESCRIPTION. It is not required and it
gets invalid quite often. A build date will be added to the package
when you perform `R CMD build` on it.
✖ use '<-' for assignment instead of '='. '<-' is the
standard, and R users and developers are used it and it is easier
to read your code for them if you use '<-'.
R/ncbi-tools.R:156:9
✖ avoid long code lines, it is bad for readability. Also,
many people prefer editor windows that are about 80 characters
wide. Try make your lines shorter than 80 characters
R/blast-tools.R:59:1
R/taxise-tools.R:361:1
✖ omit trailing semicolons from code lines. They are not
needed and most R coding standards forbid them
R/ncbi-tools.R:147:55
✖ avoid sapply(), it is not type safe. It might return a
vector, or a list, depending on the input data. Consider using
vapply() instead.
R/blast-tools.R:32:13
R/blast-tools.R:33:11
R/cache-tools.R:253:13
R/cache-tools.R:256:13
R/cache-tools.R:317:12
... and 32 more lines
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
result 1:0 if the expression on the right hand side is zero. Use
seq_len() or seq_along() instead.
R/cluster-tools.R:345:12
R/ncbi-tools.R:162:20
R/pipeline.R:208:12
tests/testthat/test-cluster-tools.R:123:17
tests/testthat/test-download-tools.R:81:17
... and 2 more lines
✖ fix this R CMD check WARNING: '::' or ':::' imports not
declared from: ‘ggplot2’ ‘plyr’ Namespaces in Imports field not
imported from: ‘DBI’ ‘taxize’ All declared Imports should be used.
Packages in Depends field not imported from: ‘foreach’ ‘methods’
These packages need to be imported from (in the NAMESPACE file) for
when this namespace is loaded but not attached. Missing or
unexported object: ‘doSNOW::registerDoMC’
✖ fix this R CMD check NOTE: blstN: no visible binding for
global variable ‘cmd’ blstN: no visible global function definition
for ‘read.table’ byLineage : .calc : <anonymous>: no visible
binding for global variable ‘value’ fltr: no visible global
function definition for ‘head’ fltr: no visible global function
definition for ‘tail’ genClstrsObj: no visible global function
definition for ‘new’ getBestClstrs: no visible global function
definition for ‘nTaxa’ getBestClstrs: no visible global function
definition for ‘seqLength’ getClMdLn : <anonymous>: no visible
global function definition for ‘median’ getMngblIds: no visible
global function definition for ‘head’ getMngblIds: no visible
global function definition for ‘tail’ mkBlstDB: no visible binding
for global variable ‘cmd’ nDscndnts: no visible global function
definition for ‘%dopar%’ nDscndnts: no visible global function
definition for ‘foreach’ nDscndnts: no visible binding for global
variable ‘i’ plotTable: no visible binding for global variable
‘clstrnm’ plotTable: no visible binding for global variable
‘txidnm’ plotTable: no visible binding for global variable ‘value’
runStgs: no visible global function definition for ‘is’ safeSrch:
no visible global function definition for ‘is’ setUp: no visible
global function definition for ‘packageVersion’ writeClstr: no
visible global function definition for ‘write.table’ writeSqs: no
visible binding for global variable ‘clstrs_obj’ writeTax: no
visible global function definition for ‘write.table’
summary,ClstrsObj: no visible binding for global variable ‘x’
Undefined global functions or variables: %dopar% clstrnm clstrs_obj
cmd foreach head i is median nTaxa new packageVersion read.table
seqLength tail txidnm value write.table x Consider adding
importFrom("methods", "is", "new") importFrom("stats", "median")
importFrom("utils", "head", "packageVersion", "read.table", "tail",
"write.table") to your NAMESPACE file (and ensure that your
DESCRIPTION Imports field contains 'methods').
✖ fix this R CMD check WARNING: Undocumented data sets:
‘blst_rs’ ‘phylt_nds’ ‘sqs’ ‘tdobj’ All user-level objects in a
package should have documentation entries. See chapter ‘Writing R
documentation files’ in the ‘Writing R Extensions’ manual.
✖ fix this R CMD check NOTE: Warning: no function found
corresponding to methods exports from ‘phylotaR’ for: ‘show’ A
namespace must be able to be loaded with just the base namespace
loaded: otherwise if the namespace gets loaded by a saved object,
the session will be unable to start. Probably some imports need to
be declared in the NAMESPACE file. You don't have to address these now. Seeking reviewers now 🕐 |
@sckott - what would you recommend for a journal to publish a short application note about this package? |
@rvosa possibly: we have a sort of pairing with JOSS and MEE. Also maybe: The R Journal, F1000Research |
Thanks! Sorry, what's JOSS?
…On Wed, Jan 31, 2018 at 8:33 PM, Scott Chamberlain ***@***.*** > wrote:
@rvosa <https://github.com/rvosa> possibly: we have a sort of pairing
with JOSS and MEE. Also maybe: The R Journal, F1000Research
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#187 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAGf-rOdGkuculjA9u_JE-FBSAS07aMuks5tQL_1gaJpZM4Rpm8k>
.
|
@DomBennett I had a number of questions for you above in my comments. Please respond |
@sckott Sorry, I thought I could wait for the reviewers and respond to everything wholesale. You didn't check the top level MEE box - i've checked that box. You didn't check the box for The manuscript describing the package is no longer than 3000 words - will manuscript length be a problem? We hadn't decided at that point which journal to send it to. We're not likely to submit to MEE now. All MEE boxes have been unticked. You didn't check the box for has a CRAN and OSI accepted license. I think you do, can you check that box? It does now. I added an MIT license. You didn't check the box for does not violate the Terms of Service of any service it interacts with. Is that going to be a problem? Can you check on that? I've read the terms now and it won't. Box ticked. I see you checked the submit to MEE box. Note that we can not make any gaurantee about MEE considering your paper a good fit for them, or them accepting the paper. It'd be nice if you did guarantee! But we're not going with MEE.... Seems like blast on Homebrew (installed via brew install blast) is the same as at the NCBI link you gave at https://www.ncbi.nlm.nih.gov/books/NBK279671/ - Is that right? homebrew is for OSX platforms only It's just the link to the general description of how to install BLAST tools across all systems. Not specific to OSX. It would indeed be ideal if the pkg didn't have to use system calls to external tools, but there are times when there's no other way. Yeah it sucks big time. We're currently working on an approach to avoid using standalone BLAST, in which case a user would not need to install it. The BLAST tools are written in C++, so in theory it should be possible to package the code into the R package. I'm not too au fait on how to do this though. I've written external C scripts into R packages before, but never have I taken someone else's C code entirely. There may be problems with modifying the C++ code to interact with the R. Do you have any ideas on workarounds? Otherwise we could simply give the user the option to send their BLAST jobs to online servers (e.g. NCBI or EBI). In terms of style, we prefer snake case but as long as you are consistent within the package that is THE most important thing (i.e, don't use a combination of camelCase, TitleCase and snake_case). Yeah I know..... sigh..... Coding style can be real bugbears. I try and use a sort of PEP8 style with a mix of all three (variables snake_case, functions camelCase and classes TitleCase). I can go through and make it all snake_case. I'll probably rechristen some elements where I think the names don't quite work. You did not check the box for Does the package conform to rOpenSci packaging guidelines? Can you do that and describe any exceptions Not just yet. See above. Also I need to go through your more specific recommendations above as well. Do I need to make these changes before the reviewers look over the package? It's just the pipeline is still being tested and developed and it may be a while before these changes can be made. You can put a ropensci review badge in your README Dun'd. |
Okay, sounds good.
thanks.
thanks.
Glad you're looking into some options. It's possible reviewers will have some ideas as well. One idea: if you do shell out, you could use the
If you're talking about the |
Hi @DomBennett, I'm working on reviewing
> clstr_id <- getBestClstrs(clstrs_obj, n=1)
Error in nTaxa(clstrs_obj) : could not find function "nTaxa"
Currently the BLAST version requirement is hard coded to greater than 2.7.1
# setup-tools.R:118
tst <- vrsn[1] >= 2 & vrsn[2] >= 7 & vrsn[3] >= 1 However, this would lead to versions such as # setup-tools.R:118
tst <- vrsn[1] >= 2 & vrsn[2] >= 7 Also, in DESCRIPTION, the
When I run
#' @importFrom methods is new
#' @importFrom stats median
#' @importFrom utils heads packageVersion read.table tail write.table Somewhere in your documentation (I usually put these lines in the same place
Cheers! |
Thanks @arendsee for the detailed and early feedback. Dealing with your specific issues:
I've updated the vignette. But the code is likely to change in the near future and will need changed again.
|
Hmm, good point. @sckott Any thoughts on best practices for confirming the version of an outside tool (BLAST, in this case)? I can tell you what I do, though I am not entirely happy with my approach. I simply don't check the version of my tools, instead I just print the version output into the log (e.g. system call to If you want to check the version, you could write a dedicated function to extract a version string from a version statement (maybe I don't really have a good general solution. But in your case, I would recommend not checking the version. There are differences between the BLAST versions, but I don't think any of the BLAST+ versions anyone will actually use (i.e. >=2.2) will change the results. Perhaps it would be worthwhile to test several versions of BLAST on Travis. |
I don't know off hand about best practices for checking versions of cli tools. Thoughts @jeroen @richfitz ?
this seems like a really good idea - and then any changes that affect certain BLAST versions and not others will become immediately apparent |
Insofar as I can recall, the big breaking changes happened during the switch from BLAST to BLAST+ that happened ~6 years ago. So as long as the version is newer than that it should be ok. Some of the other NCBI software like sra-tools need to have more recent versions than are currently available in some apt repositories, because of changes in NCBI's servers, but I haven't seen problems with BLAST itself. It seems (in my personal opinion) that trying to pull the source for BLAST into R is a bad idea, because then it's a ton of extra maintenance to make sure that stays up to date as things change on NCBI's side. Most folks who would use this package should probably be able to install a modern copy of ncbi-blast, so I would argue for spending effort on increasing the normal codebase and adding documentation, etc, instead of trying to pull BLAST in to remove the dependency. |
@naupaka I agree on pulling BLAST into R being a bad idea. Unless NCBI was interested in doing the work and maintaining the code. As for the legacy BLAST versus BLAST+, the names of the executable have all changed. For example, |
Thanks very much @arendsee and @naupaka for your detailed reviews! I feel I may have already addressed some of the run issues raised by @naupaka, as I encountered some of them over the past few weeks myself. The hanging is particularly annoying and seems to be due to Entrez, I've added a timeout so that it will retry after one hour. Also, I wonder if the BLAST errors might be due to using Windows, I've not tested the pipeline at all on that OS. Lots of things to be getting on with! In terms of next steps @sckott , do I just update the package and report what I did here? |
@DomBennett So we don't have a rule on how exactly to respond to reviewers. This is how it's best approached though I think: Open issues in your repo that address issues raised here - Then when you reply to reviewers you can link to those - That's what I do when I bring my own pkgs through review, and I use a label or wording in the issue title to indicate its from the review, e.g. https://github.com/ropensci/charlatan/issues?q=is%3Aissue+is%3Aopen+label%3Areview What's good about opening issues is you can reference those in your commits so that everything is all linked up. It's likely that you'll want to ask for additional feedback from reviewers - so being able to link to an issue that is linked to the code itself is a nice approach |
Hi @sckott, I've copied your format and created issues. Also, FYI we have now submitted a manuscript describing phylotaR to MDPI Life, preprint available here. Is there a deadline for responding to the reviewers? I was thinking of taking a break and coming back to phylotaR with a little more distance. |
Not really a deadline. If you'll put it off for a while, we have a holding label that we'll use, and then we'll know to not ask you about progress until you're ready to come back to it |
Hi @sckott, @arendsee and @naupaka, I’ve made the following major changes as per your comments and reviews:
Things I have not done:
If any of you feel that these last two points should be acted upon in some form, then I will be happy to look at these again. Many thanks once more for taking the time to review the package! Best wishes, |
Oh also... I should add that I cannot rename the first stage 'taxise' as it is already published (I interpret taxise to just mean grab taxonomic information or place in a taxonomic context.). To compromise I have updated the stage diagram in the README to be a more explicit. |
Sounds great @DomBennett! @sckott shall we have another go through of it with the new version? What's the ROpenSci SOP on review after revisions? |
thanks @DomBennett ! If you're completely happy with changes to the package and responses from the maintainer then just say so. If not, please do comment here with any additional comments/suggestions/etc. |
@DomBennett @sckott Looks good to me. I would recommend eventually refactoring with Good job! |
@naupaka do you want to make further comments, or are you happy? |
I'd like to put the new version through its paces and haven't had the time to sit down and do it. Will get back to you in the next couple days if that's ok? |
@naupaka sounds good |
@DomBennett Just ran through a couple testing workflows and everything works smoothly for me! Thanks for this very useful tool. |
@sckott I'm happy to give this my stamp of approval 👍 |
@DomBennett we're nearly there. just a few comments: On running the example from the vignette, I get an error: ---------------------------------------------------
Running pipeline on [unix] at [2018-07-09 10:22:31]
---------------------------------------------------
Running stages: taxise, download, cluster, cluster2
--------------------------------------------
Starting stage TAXISE: [2018-07-09 10:22:31]
--------------------------------------------
Searching taxonomic IDs ...
Downloading taxonomic records ...
. [1-21]
Generating taxonomic dictionary ...
---------------------------------------------
Completed stage TAXISE: [2018-07-09 10:22:33]
---------------------------------------------
----------------------------------------------
Starting stage DOWNLOAD: [2018-07-09 10:22:33]
----------------------------------------------
Identifying suitable clades ...
Identified [1] suitable clades.
Downloading hierarchically ...
Working on parent [id 9504]: [1/1] ...
. + whole subtree ...
. . Getting [2700 sqs] ...
. . . [1-300]
Unexpected Error : NCBI is limiting the size of your request. Consider reducing btchsz.
Occurred [2018-07-09 10:22:58]
Contact package maintainer for help.
Error in stages_run(wd = wd, frm = 1, to = nstages, stgs_msg = stgs_msg) :
Unexpected Error : NCBI is limiting the size of your request. Consider reducing btchsz.
Occurred [2018-07-09 10:22:58]
Contact package maintainer for help. It looks like I need to fiddle with running goodpractice again: i'd encourage you to avoid |
Thanks again to @arendsee and @naupaka for their helpful reviews! So nice to see this package improve with all your helpful suggestions and feedback. @sckott, as per your points .... The default To change parameters you can either delete the folder and set up the folder again with different parameter values using the Will run goodpractice myself again and avoid sapply and the After I've made those changes I will ping you again here. |
@sckott changes made!
|
Approved! Thanks again for your submission @DomBennett
|
@DomBennett and the question about a blog post? |
@DomBennett this link will give you many examples of blog posts by authors of onboarded packages so you can get an idea of the style and length: https://ropensci.org/tags/review/. Technotes are here: https://ropensci.org/technotes/. Here are some technical and editorial guidelines for contributing a blog post: https://github.com/ropensci/roweb2#contributing-a-blog-post. Happy to answer any questions. |
Summary
R implementation of the PhyLoTa pipeline. It identifies orthologous sequences in GenBank using BLAST searches. It consists of three automated stages: taxise, download and cluster.
URL for the package (the development repository, not a stylized html page): https://github.com/DomBennett/phylotaR
Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
data retrieval, it downloads sequences from genbank and identifies sequence clusters that may be of use for phylogenetic analysis
Evolutionary biologists, phylogeneticists. Determining sequence orthology is the essential first step of any phylogenetic analysis. Usually a phylogeneticist may search GenBank for sequences of the same gene, download and align. Due to naming problems, however, potential relevant sequences may be missed or sequences that have been mislabelled are downloaded. PhyLoTa gets around this by performing all-v-all BLAST searches and does not rely on gene names. PhyLoTa is often the first step of any large-scale automated phylogenetic pipeline (e.g. SUPERSMART). Potentially, this R package could be central to many future phylogenetic analyses.
yours differ or meet our criteria for best-in-category?
The only current approach for identifying PhyLoTa clusters is the PhyLoTa browser. The data underlying its current release, however, is five years old. In those five years there have been over 40 million new sequences added to GenBank. An alternative is needed to make use of the latest data. Additionally, phylotaR would allow a user to specify their own parameters when running the pipeline.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.Detail
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings:No errors, but plenty of warnings to do with documentation.
Not just yet. Outstanding: conversion to snake_case, generate README from rmd, create website with pkgdown, NEWS required, authors@R syntax, depends to imports and cat to message (although is that necessary given the logging methods?)
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Pre-submission Note
Hi!
I've produced this pre-submission issue to determine whether phylotaR would be something you'd like added to your roster? On our side, we would be very keen as we think future phylogenetic analyses will depend on making software open-source, modular and collaborative. I did feel, however, that it may not quite fit into the ropensci family, because it's a pipeline and it depends on software external to R (BLAST). Please be aware that phylotaR is still only being developed and a lot of testing still needs to be performed. If you are interested, I'd be happy to update the naming style (I prefer camelcase), improve the documentation and whatever else is needed to meet your criteria.
Thanks,
Dom
The text was updated successfully, but these errors were encountered: