Reimplement and improve CLI #224

mbhall88 · 2020-09-01T02:18:39Z

This PR moves the handling of the Command-line interface (CLI) to CLI11 and closes #53.

Additionally, some changes in this PR are not backwards compatible as some options have been renamed/removed.

This PR also contains a hotfix that closes #226

mbhall88 · 2020-09-01T02:27:42Z

Index

Old

$ pandora index -h
Usage: pandora index [options] <prgs.fa>
Options:
        -h,--help                       Show this help message
        -w W                            Window size for (w,k)-minimizers, default 14
        -k K                            K-mer size for (w,k)-minimizers, default 15
        -t T                            Number of threads, default 1
        --offset                                Offset for PRG ids, default 0
        --outfile                               Filename for index
        --log_level                     debug,[info],warning,error

New

$ pandora index -h
Index population reference graph (PRG) sequences.
Usage: pandora index [OPTIONS] <PRG>

Positionals:
  <PRG> FILE [required]       PRG to index (in fasta format)

Options:
  -h,--help                   Print this help message and exit
  -w INT                      Window size for (w,k)-minimizers (must be <=k) [default: 14]
  -k INT                      K-mer size for (w,k)-minimizers [default: 15]
  -t,--threads INT            Maximum number of threads to use [default: 1]
  -o,--outfile FILE           Filename for the index [default: <PRG>.k15.w14.idx]
  -v                          Verbosity of logging. Repeat for increased verbosity

The only point of discussion here I think (which is the same across all subcommands) is the change to the log level indication. Instead of asking the user to state the level explicitly (i.e. info, debug etc.) it is assumed as info and if they want more logging they pass -v to indicate debug and -vv to indicate trace.
Note: this change also addresses and closes #160

Another nice thing about using CLI11 is there are some nice built-in validators. For this subcomand, if<PRG> does not exist, CLI11 throws an exception for us.

mbhall88 · 2020-09-01T03:34:19Z

Map

Old

$ pandora map -h
only provided 2arguments
Usage: pandora map -p PRG_FILE -r READ_FILE <option(s)>
Options:
        -h,--help                       Show this help message
        -p,--prg_file PRG_FILE          Specify a fasta-style prg file
        -r,--read_file READ_FILE        Specify a file of reads in fasta format
        -o,--outdir OUTDIR      Specify directory of output
        -w W                            Window size for (w,k)-minimizers, must be <=k, default 14
        -k K                            K-mer size for (w,k)-minimizers, default 15
        -m,--max_diff INT               Maximum distance between consecutive hits within a cluster, default 500 (bps)
        -e,--error_rate FLOAT           Estimated error rate for reads, default 0.11
        -t T                            Number of threads, default 1
        --genome_size   NUM_BP  Estimated length of genome, used for coverage estimation
        --output_kg                     Save kmer graphs with fwd and rev coverage annotations for found localPRGs
        --output_vcf                    Save a vcf file for each found localPRG
        --vcf_refs REF_FASTA            A fasta file with an entry for each LocalPRG giving reference sequence for
                                        VCF. Must have a perfect match in the graph and the same name as the graph
        --output_comparison_paths       Save a fasta file for a random selection of paths through localPRG
        --output_covgs  Save a file of covgs for each localPRG present, one number per base of fasta file
        --output_mapped_read_fa Save a file for each gene containing read parts which overlapped it
        --illumina                      Data is from illumina rather than nanopore, so is shorter with low error rate
        --clean                 Add a step to clean and detangle the pangraph
        --bin                   Use binomial model for kmer coverages, default is negative binomial
        --max_covg                      Maximum average coverage from reads to accept
        --genotype MODE                 Add extra step to carefully genotype sites. Has two modes: global (ML path oriented) or local (coverage oriented)
        --snps_only                     When genotyping, include only snp sites
        -d,--discover                   Add denovo discovery
        --denovo_kmer_size                      Kmer size to use for denovo discovery
        --log_level                     debug,[info],warning,error

New

$ pandora map -h
Quasi-map reads to an indexed PRG, infer the sequence of present loci in the sample, and (optionally) genotype/discover variants.
Usage: ./pandora map [OPTIONS] <TARGET> <QUERY>

Positionals:
  <TARGET> FILE [required]    An indexed PRG file (in fasta format)
  <QUERY> FILE [required]     Fast{a,q} file containing reads to quasi-map

Options:
  -h,--help                   Print this help message and exit
  -v                          Verbosity of logging. Repeat for increased verbosity

Indexing:
  -w INT                      Window size for (w,k)-minimizers (must be <=k) [default: 14]
  -k INT                      K-mer size for (w,k)-minimizers [default: 15]

Input/Output:
  -o,--outdir DIR             Directory to write output files to [default: pandora]
  -t,--threads INT            Maximum number of threads to use [default: 1]
  --vcf-refs FILE             Fasta file with a reference sequence to use for each loci. The sequence MUST have a perfect match in <TARGET> and the same name
  --kg                        Save kmer graphs with forward and reverse coverage annotations for found loci
  --loci-vcf                  Save a VCF file for each found loci
  -C,--comparison-paths       Save a fasta file for a random selection of paths through loci
  --coverages                 Save a file of coverages for each loci present - one number per base
  -M,--mapped-reads           Save a fasta file for each loci containing read parts which overlapped it

Parameter Estimation:
  -e,--error-rate FLOAT       Estimated error rate for reads [default: 0.11]
  -g,--genome-size INT        Estimated length of the genome - used for coverage estimation [default: 5000000]
  --bin                       Use binomial model for kmer coverages [default: negative binomial]

Mapping:
  -m,--max-diff INT           Maximum distance (bp) between consecutive hits within a cluster [default: 250]
  -c,--min-cluster-size INT   Minimum size of a cluster of hits between a read and a loci to consider the loci present [default: 10]

Preset:
  -I,--illumina               Reads are from Illumina. Alters error rate used and adjusts for shorter reads

Filtering:
  --clean                     Add a step to clean and detangle the pangraph
  --max-covg INT              Maximum average coverage of reads to accept [default: 300]

Consensus/Variant Calling:
  --genotype                  Add extra step to carefully genotype sites.
  --snps                      When genotyping, only include SNP sites
  -d,--discover               Add a step to discover de novo variants
  --discover-k INT            Kmer size to use when disovering de novo variants [default: 11]
  --kmer-avg INT              Maximum number of kmers to average over when selecting the maximum likelihood path [default: 100]

Genotyping:
  --local Needs: --genotype   (Intended for developers) Use coverage-oriented (local) genotyping instead of the default ML path-oriented (global) approach.
  -a INT                      Hard threshold for the minimum allele coverage allowed when genotyping [default: 0]
  -s INT                      The minimum required total coverage for a site when genotyping [default: 0]
  -D INT                      Minimum difference in coverage on a site required between the first and second maximum likelihood path [default: 0]
  -F INT                      Minimum allele coverage, as a fraction of the expected coverage, allowed when genotyping [default: 0]
  -E,--gt-error-rate FLOAT    When genotyping, assume that coverage on alternative alleles arises as a result of an error process with rate -E. [default: 0.01]
  -G,--gt-conf INT            Minimum genotype confidence (GT_CONF) required to make a call [default: 1]

There is a lot to address here. In particular, I have added "Should this be exposed?" to some options. If we do want these options exposed, then a description is needed for them. Additionally, I would like to shorten a lot of these options as some are far too verbose. I have left two options without a description - --confidence-threshold and --genotyping-error-rate as I struggled to determine what these do from the code.

One outstanding thing is to create groups within the menu (similar to tools like minimap2 and bwa) so we would have something like "Input/Output Options", "Mapping", "Preset" etc. Please feel free to suggest groups. I will wait until we have settled on what options should be removed (if any).

Two breaking changes of importance to note:

removing the snakecase convention in options in favour of the more widely used "kebab case". i.e. --confidence_threshold becomes --confidence-threshold. If this is a big issue I will change it back, but I really dislike snake case in CLI opts.
the PRG and reads files are now positional arguments <TARGET> and <QUERY> respectively.

This change to CLI11 for map also closes #140

Lastly, I have added 'todo' comments above any options that I have questions or concerns about (map_main.cpp function setup_map_subcommand) so could we please discuss those in the code review? In particular, there is a comment on line 123 relating to a discussion @iqbal-lab and I had about whether we just move purely to global genotyping.

mbhall88 · 2020-09-01T03:46:54Z

Main

Old

$ pandora --help
[main] unrecognized command '--help'

New

$ pandora --help
Pandora: Pan-genome inference and genotyping with long noisy or short accurate reads.
Usage: pandora [OPTIONS] SUBCOMMAND

Options:
  -h,--help                   Print this help message and exit

Subcommands:
  index                       Index population reference graph (PRG) sequences.
  map                         Quasi-map reads to an indexed PRG, infer the sequence of present loci in the sample, and (optionally) genotype/discover variants.
  compare                     Quasi-map reads from multiple samples to an indexed PRG, infer the sequence of present loci in each sample, and call variants between the samples.
  walk                        Outputs a path through the nodes in a PRG corresponding to the either an input sequence (if it exists) or the top/bottom path
  seq2path                    For each sequence, return the path through the PRG
  get_vcf_ref                 Outputs a fasta suitable for use as the VCF reference using input sequences
  random                      Outputs a fasta of random paths through the PRGs
  merge_index                 Allows multiple indicies to be merged (no compatibility check)

Note: I will update this comment as I add more subcommands.
Closes #159

mbhall88 · 2020-09-01T06:29:29Z

Compare

Old

$ pandora compare -h
Usage: pandora compare -p PRG_FILE -r READ_INDEX -o OUTDIR <option(s)>
Options:
        -h,--help                       Show this help message
        -p,--prg_file PRG_FILE          Specify a fasta-style prg file
        -r,--read_index READ_INDEX      Specify a file with a line per sample
                                        sample_id <tab> filepath to reads in fasta/q format
        -o,--outdir OUTDIR      Specify directory of output
        -w W                            Window size for (w,k)-minimizers, default 14
        -k K                            K-mer size for (w,k)-minimizers, default 15
        -m,--max_diff INT               Maximum distance between consecutive hits within a cluster, default 250 (bps)
        -e,--error_rate FLOAT           Estimated error rate for reads, default 0.11
        -t T                            Number of threads, default 1
        --genome_size   NUM_BP  Estimated length of genome, used for coverage estimation
        --vcf_refs REF_FASTA            A fasta file with an entry for each LocalPRG giving reference sequence for
                                        VCF. Must have a perfect match in the graph and the same name as the graph
        --illumina                      Data is from illumina rather than nanopore, so is shorter with low error rate
        --clean                 Add a step to clean and detangle the pangraph
        --bin                   Use binomial model for kmer coverages, default is negative binomial
        --max_covg                      Maximum average coverage from reads to accept
        --genotype MODE                 Add extra step to carefully genotype sites. Has two modes: global (ML path oriented) or local (coverage oriented)
        --log_level                     debug,[info],warning,error

New

$ pandora compare -h
Quasi-map reads from multiple samples to an indexed PRG, infer the sequence of present loci in each sample, and call variants between the samples.
Usage: ./pandora compare [OPTIONS] <TARGET> <QUERY_IDX>

Positionals:
  <TARGET> FILE [required]    An indexed PRG file (in fasta format)
  <QUERY_IDX> FILE [required] A tab-delimited file where each line is a sample identifier followed by the path to the fast{a,q} of reads for that sample

Options:
  -h,--help                   Print this help message and exit
  -v                          Verbosity of logging. Repeat for increased verbosity

Indexing:
  -w INT                      Window size for (w,k)-minimizers (must be <=k) [default: 14]
  -k INT                      K-mer size for (w,k)-minimizers [default: 15]

Input/Output:
  -o,--outdir DIR             Directory to write output files to [default: pandora]
  -t,--threads INT            Maximum number of threads to use [default: 1]
  --vcf-refs FILE             Fasta file with a reference sequence to use for each loci. The sequence MUST have a perfect match in <TARGET> and the same name
  --loci-vcf                  Save a VCF file for each found loci

Parameter Estimation:
  -e,--error-rate FLOAT       Estimated error rate for reads [default: 0.11]
  -g,--genome-size INT        Estimated length of the genome - used for coverage estimation [default: 5000000]
  --bin                       Use binomial model for kmer coverages [default: negative binomial]

Mapping:
  -m,--max-diff INT           Maximum distance (bp) between consecutive hits within a cluster [default: 250]
  -c,--min-cluster-size INT   Minimum size of a cluster of hits between a read and a loci to consider the loci present [default: 10]

Preset:
  -I,--illumina               Reads are from Illumina. Alters error rate used and adjusts for shorter reads

Filtering:
  --clean                     Add a step to clean and detangle the pangraph
  --max-covg INT              Maximum average coverage of reads to accept [default: 300]

Consensus/Variant Calling:
  --genotype                  Add extra step to carefully genotype sites.
  --kmer-avg INT              Maximum number of kmers to average over when selecting the maximum likelihood path [default: 100]

Genotyping:
  --local Needs: --genotype   (Intended for developers) Use coverage-oriented (local) genotyping instead of the default ML path-oriented (global) approach.
  -a INT                      Hard threshold for the minimum allele coverage allowed when genotyping [default: 0]
  -s INT                      The minimum required total coverage for a site when genotyping [default: 0]
  -D INT                      Minimum difference in coverage on a site required between the first and second maximum likelihood path [default: 0]
  -F INT                      Minimum allele coverage, as a fraction of the expected coverage, allowed when genotyping [default: 0]
  -E,--gt-error-rate FLOAT    When genotyping, assume that coverage on alternative alleles arises as a result of an error process with rate -E. [default: 0.01]
  -G,--gt-conf INT            Minimum genotype confidence (GT_CONF) required to make a call [default: 1]

The changes here are similar to map. However, I noticed that whilst there are a lot of options in common, some aren't exposed in compare but are exposed in map (e.g. --min-kmer-covg)

leoisl · 2020-09-01T07:19:22Z

Thanks a lot for this, looks really nice! Thanks for taking the time to improve the CLI!

I have some comments below, but I realise that I am asking some stuff that might take considerable development time, so feel free to ignore everything. Also what you are doing already improve the CLI by a lot, so I think it is already enough. What I am proposing would be towards cleaning more the CLI and eventually breaking commands into more subcommands... but I don't think working a lot on CLI is a high priority task in our TODO list. I would be happy to review and merge your changes, and consider that CLI work is done.

Regarding pandora index:

--offset INT Offset for PRG ids [default: 0]

I feel like this parameter was sth required so that we could split the PRG into several subfiles, and launch a pandora index job for each subfile, and then combine the index later. I guess this is not required anymore, as pandora index takes a couple of hours? There are still also performance improvements we can do in pandora index if we think it is not quick enough.

Regarding pandora map/compare:

-w INT Window size for (w,k)-minimizers (must be <=k) [default: 14]
-k INT K-mer size for (w,k)-minimizers [default: 15]

This is actually not required, given that the w and k used to build the index is in the index filename. I actually think it is error prone to ask for w and k in pandora map/compare. User might set a specific value for w and k when running pandora index and do not put this value for pandora map/compare. Anyway, it is an information that is better obtained by parsing the index filename instead of asking for it.
PS: I am not sure this belongs to this issue. It might be better to create an issue for this;
PS2: this is not major, so not high priority. I actually would prefer to have a metadata file along with the index, instead of parsing metadata from filenames.

Regarding subcommands breakdown:

I think pandora should produce unfiltered VCFs. Then we should have a pandora view or pandora filter command to filter VCFs accordingly. Then some parameters might shift into this command... But I am not sure how many parameters we can move, as many of them affects the ML path, so it might not be possible...

There are some more subcommands breakdown, but I don't think will be productive to discuss them now...

mbhall88 · 2020-09-01T07:37:27Z

Walk

Old

$ pandora walk -h
Usage: pandora walk <in_prg.fa> [<seq.fa> | --top | --bottom]

New

$ pandora walk -h
Outputs a path through the nodes in a PRG corresponding to the either an input sequence (if it exists) or the top/bottom path
Usage: ./pandora walk [OPTIONS] <PRG>

Positionals:
  <PRG> FILE [required]       A PRG file (in fasta format)

Options:
  -h,--help                   Print this help message and exit
  -i,--input FILE Excludes: --top --bottom
                              Fast{a,q} of sequences to output paths through the PRG for
  -T,--top Excludes: --input --bottom
                              Output the top path through each local PRG
  -B,--bottom Excludes: --input --top
                              Output the bottom path through each local PRG

iqbal-lab · 2020-09-01T10:18:38Z

re this

This is actually not required, given that the w and k used to build the index is in the index filename. I actually think it is error prone to ask for w and k in pandora map/compare. User might set a specific value for w and k when running pandora index and do not put this value for pandora map/compare. Anyway, it is an information that is better obtained by parsing the index filename instead of asking for it.

Depending on the filename is better , but also error prone as people fiddle with filenames - inside a header would be safest. happy to postpone this though

mbhall88 · 2020-09-01T20:25:14Z

Re: removing --offset from index. I will do this as it is a simple change.

Re: removing k and w I am a little torn about this. I don't like the idea of getting it from the filename, as the user has the option to name the index something else with --outfile. As Zam mentions, we could somehow encode it in the index header? Although, as an example, minimap2 leaves it to the user to ensure they use the same present, -x, for both indexing and mapping. So it wouldn't be unusual to maintain the current functionality. If we do want to alter this behaviour, then it should be a separate issue. This PR is just about switching the CLI implementation. There are other issues closely linked that we can then address afterwards, such as this, and consistent use of filesystem paths [#68]

Speaking of the index --outfile parameter, I think this option is actually useless? Regardless of what it is named I think pandora always looks for an expected name? There are two load functions, but this is always called.

leoisl · 2020-09-02T05:00:12Z

Speaking of the index --outfile parameter, I think this option is actually useless? Regardless of what it is named I think pandora always looks for an expected name? There are two load functions, but this is always called.

Yeah, I think --outfile is not very useful. I think https://github.com/mbhall88/pandora/blob/e497912ab645f58daf1da84cfdab6bed2983b591/include/index.h#L37 is a helper function, and should be made private, whereas https://github.com/mbhall88/pandora/blob/e497912ab645f58daf1da84cfdab6bed2983b591/include/index.h#L35 is in the public interface.

There is a refactoring of the code that I have been postponing since the beginning, which is applying encapsulation to the code (i.e. hiding all attributes, providing access to them through getters/setters, and hide helper methods, just exposing the minimal public interface to classes clients). It will make things a lot easier to change and maintain later (i.e. many parts of the code access attributes directly, and thus when changing, for e.g. a std::map to an IntervalTree data structure as we did at some point to speed things up, we actually had to change a lot of the code because of this direct access. Encapsulation would make this a lot easier).
I am still unsure about doing this refactoring. I think refactoring code is great and should be a constant effort, but there absolutely no research progress, it is basically code maintenance, and refactoring can be quite lengthy and require quite a big effort. This might be better discussed in a meeting after the paper on how much time to devote for code maintenance, and adding features to pandora.

mbhall88 · 2020-09-02T05:15:10Z

There is a refactoring of the code that I have been postponing since the beginning, which is applying encapsulation to the code (i.e. hiding all attributes, providing access to them through getters/setters, and hide helper methods, just exposing the minimal public interface to classes clients). It will make things a lot easier to change and maintain later (i.e. many parts of the code access attributes directly, and thus when changing, for e.g. a std::map to an IntervalTree data structure as we did at some point to speed things up, we actually had to change a lot of the code because of this direct access. Encapsulation would make this a lot easier).
I am still unsure about doing this refactoring. I think refactoring code is great and should be a constant effort, but there absolutely no research progress, it is basically code maintenance, and refactoring can be quite lengthy and require quite a big effort. This might be better discussed in a meeting after the paper on how much time to devote for code maintenance, and adding features to pandora.

https://www.artima.com/intv/fixit.html

mbhall88 · 2020-09-02T06:27:38Z

Group names are added to #224 (comment). Let me know if anything needs changing

mbhall88 · 2020-09-05T05:38:44Z

check_kmergraph seq2path

Old

Usage: pandora check_kmergraph <prg.fa> <seq.fa> <k> <w> <flag>

New

$ pandora seq2path -h
For each sequence, return the path through the PRG
Usage: pandora seq2path [OPTIONS] <PRG>

Positionals:
  <PRG> FILE [required]       PRG to index (in fasta format)

Options:
  -h,--help                   Print this help message and exit
  -i,--input FILE Excludes: --top --bottom
                              Fast{a,q} of sequences to output paths through the PRG for
  -w INT                      Window size for (w,k)-minimizers (must be <=k) [default: 14]
  -k INT                      K-mer size for (w,k)-minimizers [default: 15]
  -T,--top Excludes: --input --bottom
                              Output the top path through each local PRG
  -B,--bottom Excludes: --input --top
                              Output the bottom path through each local PRG
  --flag Needs: --input       output success/fail rather than the node path
  -v                          Verbosity of logging. Repeat for increased verbosity

mbhall88 · 2020-09-05T06:07:11Z

get_vcf_ref

Old

$ pandora get_vcf_ref -h
[2020-09-05 07:05:30.279251] [0x00007f0a41346f80] [debug]   Loading PRGs from file -h
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
SIGABRT: abort

followed by a whole bunch of singularity crazy syscall error messages 😬

New

$ pandora get_vcf_ref -h
Outputs a fasta suitable for use as the VCF reference using input sequences
Usage: ./pandora get_vcf_ref [OPTIONS] <PRG> [<QUERY>]

Positionals:
  <PRG> FILE [required]       PRG to index (in fasta format)
  <QUERY> FILE                Fast{a,q} file of sequences for some reason???

Options:
  -h,--help                   Print this help message and exit
  -z,--compress               Compress the output with gzip
  -v                          Verbosity of logging. Repeat for increased verbosity

mbhall88 · 2020-09-05T06:39:26Z

random_path random

I would also like to suggest renaming this subcommand to just be random

Old

$ pandora random_path -h
[2020-09-05 07:33:11.837477] [0x00007f34f049ff80] [debug]   Loading PRGs from file -h
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
SIGABRT: abort
PC=0x47ce4b m=0 sigcode=0

followed by a whole bunch of error log

New

$ pandora random -h
Outputs a fasta of random paths through the PRGs
Usage: ./pandora random [OPTIONS] <PRG>

Positionals:
  <PRG> FILE [required]       PRG to generate random paths from

Options:
  -h,--help                   Print this help message and exit
  -n INT                      Number of paths to output [default: 1]
  -z,--compress               Compress the output with gzip
  -v                          Verbosity of logging. Repeat for increased verbosity

iqbal-lab · 2020-09-05T14:28:29Z

Might it be clearer to rename "check_kmergraph" to - "seq_to_path", to convert a sequence to a path in the graph?

mbhall88 · 2020-09-06T19:55:01Z

Might it be clearer to rename "check_kmergraph" to - "seq_to_path", to convert a sequence to a path in the graph?

Sure.

mbhall88 · 2020-09-06T19:57:26Z

merge_index

Old

$ pandora merge_index -h
Usage: pandora merge_index <index1> <index2> ...
Options:
        --outfile                               Filename for merged index

New

$ pandora merge_index -h
Allows multiple indicies to be merged (no compatibility check)
Usage: ./pandora merge_index [OPTIONS] <IDX>...

Positionals:
  <IDX> FILES [required]      Indicies to merge

Options:
  -h,--help                   Print this help message and exit
  -o,--outfile FILE           Filename for merged index [default: "merged_index.idx"]
  -v                          Verbosity of logging. Repeat for increased verbosity

mbhall88 · 2020-09-06T20:11:10Z

Ok. That is all the architecture setup for the new CLI. All subcommands are now implemented. Reviewing this is a big job so feel free to do it in chunks. I think the best way is to leave comments in the code review rather than in the conversation here.
I have left a whole bunch of // todo: comments scattered throughout in parts where we need a discussion.

mbhall88 · 2020-09-21T05:22:46Z

I have updated some descriptions and shortened some options.
I am about to start working on a PR to merge candidate regions [#228] so I would like to merge this CLI PR as soon as possible to avoid problems with the other PR comes in (i.e. conflicts with this CLI PR as it will be based off this work)

leoisl · 2020-09-21T09:58:26Z

I can try to review it tomorrow, but I am overloaded... might just be able to do it on Wednesday. I think it is better to merge, and then I review at some point where I don't have to rush the review. If there is any issue, I will then make a second CLI PR, but just from an overview of what you did, I think it should be all fine. I think my reviewing will be limited also to just look at the code, but we might not be able to capture possible bugs (if there are any), as most of the code you changed (files implementing main()) have no test coverage

leoisl · 2020-09-21T15:31:14Z

Update: I will review this PR this evening. I thought about speeding up the pandora paper pipelines and put all the 3 remaining pipelines to run simultaneously, but one thing that I learned is that pushing the cluster might actually slow things down (LSF becomes overloaded, bjobs starts not keeping logs of jobs for very long (disk access to check status slow things a lot), etc). Better to run things in a slower, yet more reliable way... so will check this tonight!

leoisl · 2020-09-21T21:55:45Z

Hello again, sorry, will just be able review tomorrow :(

…al-lab-org#229]

leoisl

All fine, just some very minor stuff! You can choose if implement them or not

src/get_vcf_ref_main.cpp

src/compare_main.cpp

src/merge_index_main.cpp

src/walk_main.cpp

leoisl · 2020-09-23T11:59:46Z

As a last task for this PR to be merged, should we try to run it in a real pipeline, e.g. in your TB pipeline? I think things are fine, but as there are no code coverage on most of these code changes, I'd like to ensure that at least our normal workflow is running correctly. This task could also be pushed to later, as these changes will eventually be tested in a normal workflow...

mbhall88 · 2020-09-23T23:44:10Z

As a last task for this PR to be merged, should we try to run it in a real pipeline, e.g. in your TB pipeline? I think things are fine, but as there are no code coverage on most of these code changes, I'd like to ensure that at least our normal workflow is running correctly. This task could also be pushed to later, as these changes will eventually be tested in a normal workflow...

Yeah, I've already been doing this.

mbhall88 added 4 commits August 29, 2020 12:24

add index CLI subcommand

c7960c3

finish formatting index help menu

8c82b2c

WIP: create CLI for pandora map

4460eae

finish CLI for map

9840a97

mbhall88 requested review from iqbal-lab, leoisl and rmcolq September 1, 2020 02:19

move verbosity to the end

c4bcdee

mbhall88 added 2 commits September 1, 2020 13:58

shorten map description

cc4eeff

add compare subcommand

c068da1

add walk subcommand

e497912

add groups for map

a282c1c

mbhall88 added 4 commits September 3, 2020 06:17

add groups to compare

b9fddc4

remove offset opt from index

ab0d641

add custom formatter

f6679fe

add check_kmergraph subcommand

1a80965

add get_vcf_ref subcommand

a8d0425

add random_path subcommand

bc18713

mbhall88 added 2 commits September 7, 2020 05:57

add merge_index sub command

baafa5f

rename check_kmergraph to seq2path

866cad7

mbhall88 marked this pull request as ready for review September 6, 2020 20:08

mbhall88 added 7 commits September 10, 2020 16:12

fix incorrect VCF header numbers [iqbal-lab-org#226]

eee83f0

remove unused min_kmer_covg option

2e7cbbe

add missing descriptions for map options and shorthand opts

5e24af8

comment out excessive newlines when creating CLI groups

bbd5057

add missing compare decriptions and shorten some opts

e550162

rename subcommand random_path to random

bdfe86b

remove a todo

db32059

mbhall88 mentioned this pull request Sep 21, 2020

Confusuing option in seq2path #230

Open

remove some old comments and todos [iqbal-lab-org#215]

a497a37

make global genotyping the default and cleanup genotype function [iqb…

a41530d

…al-lab-org#229]

leoisl approved these changes Sep 22, 2020

View reviewed changes

suggestions from leandro and update README

dd0017d

mbhall88 merged commit a5baaf1 into iqbal-lab-org:dev Sep 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reimplement and improve CLI #224

Reimplement and improve CLI #224

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 1, 2020 •

edited

Loading

leoisl commented Sep 1, 2020

mbhall88 commented Sep 1, 2020 •

edited

Loading

iqbal-lab commented Sep 1, 2020

mbhall88 commented Sep 1, 2020 •

edited

Loading

leoisl commented Sep 2, 2020

mbhall88 commented Sep 2, 2020

mbhall88 commented Sep 2, 2020

mbhall88 commented Sep 5, 2020 •

edited

Loading

mbhall88 commented Sep 5, 2020

mbhall88 commented Sep 5, 2020 •

edited

Loading

iqbal-lab commented Sep 5, 2020

mbhall88 commented Sep 6, 2020

mbhall88 commented Sep 6, 2020

mbhall88 commented Sep 6, 2020

mbhall88 commented Sep 21, 2020

leoisl commented Sep 21, 2020

leoisl commented Sep 21, 2020

leoisl commented Sep 21, 2020

leoisl left a comment

leoisl commented Sep 23, 2020

mbhall88 commented Sep 23, 2020

Reimplement and improve CLI #224

Reimplement and improve CLI #224

Conversation

mbhall88 commented Sep 1, 2020 • edited Loading

mbhall88 commented Sep 1, 2020 • edited Loading

Index

Old

New

mbhall88 commented Sep 1, 2020 • edited Loading

Map

Old

New

mbhall88 commented Sep 1, 2020 • edited Loading

Main

Old

New

mbhall88 commented Sep 1, 2020 • edited Loading

Compare

Old

New

leoisl commented Sep 1, 2020

mbhall88 commented Sep 1, 2020 • edited Loading

Walk

Old

New

iqbal-lab commented Sep 1, 2020

mbhall88 commented Sep 1, 2020 • edited Loading

leoisl commented Sep 2, 2020

mbhall88 commented Sep 2, 2020

mbhall88 commented Sep 2, 2020

mbhall88 commented Sep 5, 2020 • edited Loading

check_kmergraph seq2path

Old

New

mbhall88 commented Sep 5, 2020

get_vcf_ref

Old

New

mbhall88 commented Sep 5, 2020 • edited Loading

random_path random

Old

New

iqbal-lab commented Sep 5, 2020

mbhall88 commented Sep 6, 2020

mbhall88 commented Sep 6, 2020

merge_index

Old

New

mbhall88 commented Sep 6, 2020

mbhall88 commented Sep 21, 2020

leoisl commented Sep 21, 2020

leoisl commented Sep 21, 2020

leoisl commented Sep 21, 2020

leoisl left a comment

Choose a reason for hiding this comment

leoisl commented Sep 23, 2020

mbhall88 commented Sep 23, 2020

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 1, 2020 •

edited

Loading

mbhall88 commented Sep 5, 2020 •

edited

Loading

mbhall88 commented Sep 5, 2020 •

edited

Loading