Document QIIME 2 metadata merging complications #393

Closed
fedarko opened this issue Sep 24, 2020 · 1 comment · Fixed by #404


Collaborator

fedarko commented Sep 24, 2020

When multiple sample* / feature metadata files are provided to Empress through QIIME 2, they are merged so that only the IDs shared across all of the metadata files are kept (i.e. an inner join). See here for details.
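To make the difference concrete, here is a rough pandas analogy (not QIIME 2's actual merging code) contrasting "keep only shared IDs" with "keep every ID". The feature IDs, taxa, and importance values are made up for illustration:

```python
import pandas as pd

# Toy feature metadata: taxonomy covers f1-f3, importances only cover f1-f2
tax = pd.DataFrame({"Taxon": ["k__A", "k__B", "k__C"]}, index=["f1", "f2", "f3"])
fi = pd.DataFrame({"importance": [0.4, 0.1]}, index=["f1", "f2"])

# Keep only IDs present in every table (mimics the shared-IDs-only merge)
inner = pd.concat([tax, fi], axis=1, join="inner")
# Keep every ID present in any table; missing cells become NaN
outer = pd.concat([tax, fi], axis=1, join="outer", sort=False)

print(inner.index.tolist())  # ['f1', 'f2']
print(outer.index.tolist())  # ['f1', 'f2', 'f3']
```

In the inner version, f3's taxonomy is silently lost; in the outer version it survives and f3 just has an empty importance value.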

The problem with this merging behavior is that it can drastically reduce the amount of metadata passed to Empress. For example, the Q2 tutorial's feature_importance.qza only contains 566 features, while taxonomy.qza contains 770 features. Since at most 566 features can survive the merge, passing both files to Empress "removes" taxonomy data for at least 204 (770 - 566) features, making taxonomy coloring look a lot sparser.
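Before merging, it can help to check how many features each file would lose. Below is a minimal sketch; `report_overlap` is a hypothetical helper, and the ID lists are placeholders that assume (for illustration only) that the 566 importance IDs are a subset of the 770 taxonomy IDs:

```python
def report_overlap(ids_a, ids_b):
    """Return (shared, only_in_a, only_in_b) counts for two collections of IDs."""
    a, b = set(ids_a), set(ids_b)
    return len(a & b), len(a - b), len(b - a)


# Placeholder IDs: 770 taxonomy features, 566 of which also have importances
taxonomy_ids = [f"f{i}" for i in range(770)]
importance_ids = [f"f{i}" for i in range(566)]

shared, tax_only, fi_only = report_overlap(taxonomy_ids, importance_ids)
print(shared, tax_only, fi_only)  # 566 204 0
```

With real data, the same call could be made on the index of each table after loading it (e.g. via `Artifact.load(...).view(pd.DataFrame)`); any nonzero "only in" count is metadata a shared-IDs-only merge would drop.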

Since it might be a while until there is built-in QIIME 2 support for other merging methods, in the interim we should ideally:

  1. Update the README to mention this problem
  2. Add an example Python script for merging metadata files that users can easily adapt

For task 2, here is a rough transcript of the code I used to merge the feature metadata files in this directory:

```python
import pandas as pd

aldex = pd.read_csv("aldex2_results.txt", sep="\t", index_col=0)
sb = pd.read_csv("differentials.csv", sep="\t", index_col=0)
ancom = pd.read_csv("ancom_results_mixed.csv", sep="\t", index_col=0)

# Remove leading Xs added by R (R prepends "X" to names that start with a
# digit; this assumes no real feature ID starts with "X")
aldex.index = [i[1:] if i.startswith("X") else i for i in aldex.index]
# Outer join on feature ID: keep every feature present in any table
diff = pd.concat([sb, aldex, ancom], axis=1)
# Replace NaNs with empty strings
diffe = diff.fillna("")
# Name the index column "FeatureID" so QIIME 2 accepts it as the ID column
diffe.index.name = "FeatureID"

diffe.to_csv("merged_diffabund.tsv", sep="\t")
```

Should be decent enough.
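One thing the transcript above doesn't handle: if two input tables share a column name, `pd.concat(..., axis=1)` keeps both columns under the same label, and QIIME 2 metadata requires column names to be unique. A minimal sketch that prefixes each table's columns before merging; `merge_feature_metadata`, the prefixes, and the toy tables are all hypothetical stand-ins for the real results files:

```python
import pandas as pd


def merge_feature_metadata(tables):
    """Outer-join feature metadata tables on their index (feature IDs).

    `tables` maps a short prefix (e.g. "aldex") to a DataFrame; each
    table's columns are prefixed so names stay unique after merging.
    """
    prefixed = [df.add_prefix(f"{name}_") for name, df in tables.items()]
    merged = pd.concat(prefixed, axis=1, sort=False)
    # Name the index so QIIME 2 recognizes it as the feature ID column
    merged.index.name = "FeatureID"
    return merged


# Toy stand-ins: both "tools" report a column called "effect"
aldex = pd.DataFrame({"effect": [1.2, -0.3]}, index=["f1", "f2"])
ancom = pd.DataFrame({"effect": [0.8], "W": [12]}, index=["f2"])

merged = merge_feature_metadata({"aldex": aldex, "ancom": ancom})
print(list(merged.columns))  # ['aldex_effect', 'ancom_effect', 'ancom_W']
```

Prefixing every table (rather than only renaming clashes) keeps it obvious in Empress which tool each column came from.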

* I think this might also impact sample metadata files, but feature metadata files are the bigger problem right now

fedarko added a commit to fedarko/empress that referenced this issue Sep 30, 2020
Collaborator Author

fedarko commented Sep 30, 2020

A simpler example, merging a taxonomy.qza file and a feature_importance.qza file:

```python
from qiime2 import Artifact
import pandas as pd

fi = Artifact.load("feature_importance.qza").view(pd.DataFrame)
tax = Artifact.load("taxonomy.qza").view(pd.DataFrame)
merged_df = pd.concat([tax, fi], axis=1, sort=False)

# Assign index a name to allow us to use this as a Q2 feature metadata file
merged_df.index.name = "FeatureID"

# Missing values are, by default, represented as NaNs.
# .to_csv() represents them in the TSV as empty values by default (see the
# na_rep parameter:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)

merged_df.to_csv("merged_fm.tsv", sep="\t")
```
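A quick way to see the `na_rep` behavior described in the comments, using toy data and an in-memory buffer instead of a real file (neither comes from the tutorial datasets):

```python
import io

import numpy as np
import pandas as pd

# Toy merged table: f2 has taxonomy but no importance value
df = pd.DataFrame(
    {"Taxon": ["k__A", "k__B"], "importance": [0.4, np.nan]},
    index=pd.Index(["f1", "f2"], name="FeatureID"),
)

buf = io.StringIO()
df.to_csv(buf, sep="\t")  # na_rep defaults to "", so NaN becomes an empty field
print(buf.getvalue())
```

The f2 row ends in an empty field, so the `fillna("")` step in the first script looks optional for this purpose (it is harmless either way).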

After running the merging script, the merged_fm.tsv file can be passed to Empress via --m-feature-metadata-file in place of the two original QZAs. This lets us visualize all available taxonomy data and all available feature importance data, even though feature importances are not provided for some of the features in the dataset.


fedarko added a commit to fedarko/empress that referenced this issue Sep 30, 2020
ElDeveloper pushed a commit that referenced this issue Oct 1, 2020
* ENH: add border btwn q2template header & app body

Makes things look nicer when using qiime tools view

* REL: Remove pep8 dependency: closes #397

* DOC: note in README that emperor'll be reinstalled

closes #401

* DOC: describe metadata merging pbms (close #393)

* DOC: tidy up #393 stuff