Feature group coverage

The coverage command calculates the coverage of a profile: for each sample, the percentage of features in a pre-defined feature group that are present.

woltka coverage -i input.biom -m mapping.txt -o output.biom

A typical use case is to assess how likely metabolic pathways are to be present in each organism or community. Because a pathway consists of multiple chemical reactions or functional genes connected to each other, the presence of some of them in a sample (even at high abundance) does not necessarily mean that the entire pathway is viable. Only when all or a large proportion of them are found can we be more confident that the pathway is present.

In this example, the input profile (sample) is a table of genes:

Feature ID	Sample 1	Sample 2	Sample 3	Sample 4
plsC	51	49	113	34
fruK	83	128	160	41
panE	0	53	0	39
leuA	111	262	232	77
...

The mapping file (sample) defines the member features (genes) of each feature group (pathway). Each line can have an arbitrary number of fields, and the field delimiter is <tab>:

Asparagine biosynthesis	asnB	aspC
Biotin synthesis	bioA	bioB	bioD	bioF
NAD biosynthesis II	hel	nudC	nadN	pnuE	nadR	nadM
pyruvate decarboxylation	aceE	aceF	lpd
...

The output file (sample) is a table of coverage values (percentages) per sample per feature group (pathway):

Feature ID	Sample 1	Sample 2	Sample 3	Sample 4
Biotin synthesis	50.0	50.0	25.0	37.5
GDP-D-rhamnose biosynthesis	20.0	80.0	20.0	80.0
L-glutamine degradation I	100.0	100.0	50.0	0.0
Sucrose biosynthesis I	20.0	20.0	20.0	20.0
...
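
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Woltka's actual implementation: the function name `coverage`, the nested-dictionary layout, and the toy counts are all assumptions made for the example, and the numbers are not taken from the sample files.

```python
def coverage(profile, groups):
    """Return {group: {sample: percentage of member features present}}.

    profile: {feature: {sample: count}} (hypothetical layout)
    groups:  {group: [member features]}
    """
    samples = sorted({s for counts in profile.values() for s in counts})
    result = {}
    for group, members in groups.items():
        if not members:
            continue  # skip empty groups to avoid dividing by zero
        result[group] = {}
        for sample in samples:
            # a feature counts as present if its count is greater than 0
            present = sum(
                1 for f in members if profile.get(f, {}).get(sample, 0) > 0
            )
            result[group][sample] = 100.0 * present / len(members)
    return result


# Toy data invented for illustration only
profile = {
    "bioA": {"S1": 12, "S2": 0},
    "bioB": {"S1": 3, "S2": 7},
    "asnB": {"S1": 0, "S2": 5},
    "aspC": {"S1": 0, "S2": 1},
}
groups = {
    "Biotin synthesis": ["bioA", "bioB", "bioD", "bioF"],
    "Asparagine biosynthesis": ["asnB", "aspC"],
}
cov = coverage(profile, groups)
print(cov)
```

In this toy case, two of the four biotin genes are found in S1, so its coverage is 50.0, while both asparagine genes are found in S2, giving 100.0.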

Note: The "coverage" computed by Woltka is not the same as that computed by HUMAnN2 (whether the pathway is present) or HUMAnN3 (how likely the pathway is to be present), although the usage and interpretation may be comparable.

Parameters

Presence / absence

With the parameter --threshold or -t followed by a percentage (e.g., 80), the output coverage table will display binary results: "1" for coverage greater than or equal to this threshold and "0" for coverage below it.
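
As a rough sketch of the thresholding rule (again an illustration rather than Woltka's code; the helper name `binarize` and the coverage values are hypothetical), note that a value exactly equal to the threshold maps to 1:

```python
def binarize(coverage_row, threshold):
    """Map each coverage percentage to 1 (>= threshold) or 0 (< threshold)."""
    return {
        sample: int(value >= threshold)
        for sample, value in coverage_row.items()
    }


# Hypothetical coverage values for one feature group
row = {"Sample 1": 50.0, "Sample 2": 80.0, "Sample 3": 100.0}
flags = binarize(row, 80)
print(flags)
```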

Feature count

With flag --count or -c, the program will report the number of member features of a group present in a sample, instead of the percentage. Note: This will override --threshold.
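
The difference from the percentage can be illustrated with a small sketch. The helper `count_present` and the counts below are assumptions made for this example, not part of Woltka:

```python
def count_present(sample_counts, members):
    """Number of member features with a non-zero count in one sample."""
    return sum(1 for f in members if sample_counts.get(f, 0) > 0)


# Hypothetical counts for a single sample
sample_counts = {"aceE": 4, "aceF": 0, "lpd": 9}
members = ["aceE", "aceF", "lpd"]
n_present = count_present(sample_counts, members)
print(n_present, "of", len(members), "member features present")
```

Here the reported value would be the count of present members (2) rather than the corresponding percentage.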

Feature group names

One can supply a mapping of feature groups to their names with --names or -n; these names will be appended to the coverage table as a metadata column ("Name").

Considerations

The coverage command treats any feature count, even as low as 1, as evidence of the feature's presence. False positives may be introduced if the profile is noisy. Consider filtering the profile before running this command. Woltka provides a per-sample feature abundance filtering function, in addition to the multiple filtering functions implemented in the QIIME 2 plugin feature-table.
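
To show why filtering matters, here is a minimal sketch of a per-sample minimum-count filter. It is not Woltka's filtering function or the QIIME 2 plugin: the helper `filter_sample`, the cutoff of 2, and the counts are all hypothetical.

```python
def filter_sample(sample_counts, min_count):
    """Keep only features whose count in this sample is at least min_count."""
    return {f: c for f, c in sample_counts.items() if c >= min_count}


# Hypothetical counts for one sample; panE is a singleton that may be noise
sample_counts = {"plsC": 51, "panE": 1, "leuA": 111}
filtered = filter_sample(sample_counts, min_count=2)
print(filtered)
```

Without the filter, the single read assigned to panE would count as evidence of presence and raise the coverage of every group that contains it.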
