Feature group coverage

The coverage command calculates the coverage of a profile, that is, the percentage of features in a pre-defined feature group that are present in each sample.

woltka coverage -i input.biom -m mapping.txt -o output.biom

A typical use case is to assess how likely metabolic pathways are to be present in each organism or community. Because a pathway consists of multiple interconnected chemical reactions or functional genes, the presence of some of them in a sample (even at high abundance) does not necessarily mean that the entire pathway is viable. Only when all or a large proportion of them are found can we be more confident that it is.

In this example, the input profile (sample) is a table of genes:

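| Feature ID | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
| --- | --- | --- | --- | --- |
| plsC | 51 | 49 | 113 | 34 |
| fruK | 83 | 128 | 160 | 41 |
| panE | 0 | 53 | 0 | 39 |
| leuA | 111 | 262 | 232 | 77 |
| ... | | | | |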

The mapping file (sample) defines the member features (genes) of each feature group (pathway). Each line can have an arbitrary number of fields, and the field delimiter is <tab>:

Asparagine biosynthesis asnB aspC
Biotin synthesis bioA bioB bioD bioF
NAD biosynthesis II hel nudC nadN pnuE nadR nadM
pyruvate decarboxylation aceE aceF lpd
...
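To make the format concrete, a mapping file like the one above could be read into a dictionary as follows. This is a minimal sketch, not Woltka's own parser: the function name `read_mapping` is hypothetical, and merging repeated group names into one set is an assumption of this example.

```python
import io


def read_mapping(handle):
    """Read tab-delimited lines: a group name followed by its member features.

    Hypothetical helper for illustration; not part of Woltka's API.
    """
    groups = {}
    for line in handle:
        fields = line.rstrip('\n').split('\t')
        name = fields[0]
        members = [f for f in fields[1:] if f]  # ignore empty fields
        if not name or not members:  # skip blank lines and empty groups
            continue
        # assumption: if a group name appears twice, its members are merged
        groups.setdefault(name, set()).update(members)
    return groups


# two lines from the example mapping file, with explicit tab delimiters
text = ('Asparagine biosynthesis\tasnB\taspC\n'
        'pyruvate decarboxylation\taceE\taceF\tlpd\n')
mapping = read_mapping(io.StringIO(text))
print(sorted(mapping['pyruvate decarboxylation']))
```

Splitting on tabs only is what allows group names such as "Asparagine biosynthesis" to contain spaces.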

The output file (sample) is a table of coverage values (percentages) per sample per feature group (pathway):

| Feature ID | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
| --- | --- | --- | --- | --- |
| Biotin synthesis | 50.0 | 50.0 | 25.0 | 37.5 |
| GDP-D-rhamnose biosynthesis | 20.0 | 80.0 | 20.0 | 80.0 |
| L-glutamine degradation I | 100.0 | 100.0 | 50.0 | 0.0 |
| Sucrose biosynthesis I | 20.0 | 20.0 | 20.0 | 20.0 |
| ... | | | | |
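The calculation behind such a table can be sketched in a few lines of Python. This is an illustration of the idea rather than Woltka's implementation: the function name `coverage` and the nested-dictionary data layout are hypothetical, and the sketch assumes that the denominator is the total number of members listed for the group, including members that never appear in the profile.

```python
def coverage(profile, mapping):
    """Percentage of each group's member features present in each sample.

    profile: {feature: {sample: count}}; mapping: {group: set of features}.
    A feature is treated as present when its count is greater than zero.
    """
    samples = sorted({s for counts in profile.values() for s in counts})
    result = {}
    for group, members in mapping.items():
        if not members:  # avoid dividing by zero for an empty group
            continue
        result[group] = {}
        for sample in samples:
            present = sum(
                1 for f in members if profile.get(f, {}).get(sample, 0) > 0)
            result[group][sample] = round(100 * present / len(members), 1)
    return result


# counts for plsC and panE come from the input table above; the group
# "Toy pathway" and the genes geneX and geneY are made up for illustration
profile = {
    'plsC': {'Sample 1': 51, 'Sample 2': 49},
    'panE': {'Sample 1': 0, 'Sample 2': 53},
}
mapping = {'Toy pathway': {'plsC', 'panE', 'geneX', 'geneY'}}
print(coverage(profile, mapping))
```

In this toy case one of four members is present in Sample 1 and two of four in Sample 2, giving 25.0 and 50.0 respectively.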

Note: The "coverage" computed by Woltka is not the same as that computed by HUMAnN2 (whether a pathway is present) or HUMAnN3 (how likely a pathway is to be present), although the usage and interpretation may be comparable.

Parameters

Presence / absence

With parameter --threshold or -t followed by a percentage (e.g., 80), the output coverage table will display binary results, with "1" indicating coverage at or above this threshold and "0" indicating coverage below it.
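The effect of the threshold can be illustrated with a short sketch that converts coverage percentages into presence / absence calls. The function name `binarize` is hypothetical and the code is not taken from Woltka; it only mirrors the rule described above.

```python
def binarize(coverages, threshold):
    """Map each coverage percentage to 1 (>= threshold) or 0 (< threshold)."""
    return {group: {sample: int(value >= threshold)
                    for sample, value in row.items()}
            for group, row in coverages.items()}


# two rows copied from the example output table above
coverages = {
    'Biotin synthesis': {
        'Sample 1': 50.0, 'Sample 2': 50.0, 'Sample 3': 25.0, 'Sample 4': 37.5},
    'GDP-D-rhamnose biosynthesis': {
        'Sample 1': 20.0, 'Sample 2': 80.0, 'Sample 3': 20.0, 'Sample 4': 80.0},
}
calls = binarize(coverages, 80)
print(calls['GDP-D-rhamnose biosynthesis'])
```

With a threshold of 80, GDP-D-rhamnose biosynthesis is called present in Samples 2 and 4, where coverage equals the threshold exactly, while Biotin synthesis is called absent in every sample.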

Feature count

With flag --count or -c, the program will report the number of member features of a group present in a sample, instead of the percentage. Note: This will override --threshold.
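The difference between a count and a percentage can be seen in a small sketch. The helper `count_present` is hypothetical, and the group membership below (including `geneX`) is made up for illustration; only the Sample 1 counts come from the example input table.

```python
def count_present(sample_counts, members):
    """Number of member features with a non-zero count in one sample."""
    return sum(1 for f in members if sample_counts.get(f, 0) > 0)


# Sample 1 counts from the input table; panE has a count of 0
sample1 = {'plsC': 51, 'fruK': 83, 'panE': 0, 'leuA': 111}
# hypothetical group of four genes, one of which is not in the profile
members = {'plsC', 'panE', 'leuA', 'geneX'}

n = count_present(sample1, members)
print(n)                       # members present in the sample
print(100 * n / len(members))  # the same result expressed as a percentage
```

Here two of the four members are present, so the count is 2 where the percentage would be 50.0.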

Feature group names

One can supply a mapping of feature groups to their names with --names or -n; these names will be appended to the coverage table as a metadata column ("Name").

Considerations

The coverage command will treat any feature count, even as low as 1, as evidence of the feature's presence. False positives may therefore be introduced if the profile is noisy. One may consider filtering the profile prior to running this command. Woltka provides a per-sample feature abundance filtering function, in addition to the multiple filtering functions implemented in the QIIME 2 plugin feature-table.
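The idea of per-sample filtering can be sketched as follows. This is not Woltka's filtering function or the QIIME 2 plugin; the name `filter_min_count` is hypothetical and the cutoff of 40 is arbitrary, chosen only so that the effect is visible on the example data.

```python
def filter_min_count(profile, min_count):
    """Set counts below min_count to zero within each sample, then drop
    features that have no counts left in any sample.

    profile: {feature: {sample: count}}
    """
    filtered = {}
    for feature, counts in profile.items():
        kept = {s: (c if c >= min_count else 0) for s, c in counts.items()}
        if any(kept.values()):
            filtered[feature] = kept
    return filtered


# two rows from the example input table
profile = {
    'plsC': {'Sample 1': 51, 'Sample 2': 49, 'Sample 3': 113, 'Sample 4': 34},
    'panE': {'Sample 1': 0, 'Sample 2': 53, 'Sample 3': 0, 'Sample 4': 39},
}
filtered = filter_min_count(profile, min_count=40)
print(filtered['panE'])
```

After filtering, panE no longer counts as present in Sample 4 (39 is below the cutoff), so it would no longer contribute to the coverage of any pathway in that sample.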