Semi-curated metadata for marine metagenomes (and other 'omics data types)

merenlab/public-marine-omics-metadata

Purpose

Welcome to the Standardized Metadata Collection for Omics Data repository! This repository is dedicated to the collection, standardization, and sharing of metadata associated with various omics data types, currently focusing on metagenomics but with plans to expand to metatranscriptomics, metaproteomics, and metabolomics.

Our goal is to present metadata in a standardized form, both semantically and syntactically, so that it is ready for analysis. We will provide guidance and examples to help others contribute. Because preparing metadata takes considerable effort, pooling our work avoids duplicating it across individual metadata sets and makes the process more efficient.

We are starting by sharing the workflow we followed to obtain the metadata for the projects listed below (see scripts/README.md), along with the final product: metagenomes.txt.
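The final product, metagenomes.txt, can be inspected with a few lines of Python. This is a minimal sketch that assumes the file is tab-separated with a single header row; `load_metadata` and `count_by` are hypothetical helper names, and any column name you pass to `count_by` must be checked against the file's actual header.

```python
import csv
from collections import Counter


def load_metadata(path):
    """Read a tab-separated table with one header row into a list of dicts.

    Assumption: metagenomes.txt is tab-delimited with a single header line.
    Check the file itself before relying on this.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, column):
    """Count rows per distinct value of `column` (column name is assumed)."""
    return Counter(row[column] for row in rows)
```

For example, `count_by(load_metadata("metagenomes.txt"), "project")` would tally rows per project, where `"project"` is a placeholder for whichever column in the file actually identifies the project.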

Metagenomes

This section lists the metagenomes included in the current metadata curation effort, together with relevant information about each project and its metadata, such as the project name, (data) publication, date range, depth range, number of samples, number of runs, and project accession number.

For details of the metadata curation, including the datasets below and the publications that originally describe them, see scripts/README.md.

Note

Please note that the current collection of curated metadata is limited to runs from the projects below that

  • are metagenomes
  • are paired-end
  • are from samples collected at ≤100 m depth
  • are from samples that have, at minimum, temperature as environmental metadata
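Taken together, the four criteria above act as a per-run filter. The sketch below is illustrative only: the `Run` record, its field names, and the ENA-style values `"METAGENOMIC"` and `"PAIRED"` are assumptions, not the column names in metagenomes.txt or the exact workflow described in scripts/README.md.

```python
from dataclasses import dataclass
from typing import Optional

# Depth cutoff from the inclusion criteria (samples collected at <= 100 m).
MAX_DEPTH_M = 100.0


@dataclass
class Run:
    # Hypothetical run record; field names are illustrative only.
    run_accession: str
    library_source: str           # assumed ENA-style value, e.g. "METAGENOMIC"
    library_layout: str           # assumed ENA-style value, "PAIRED" or "SINGLE"
    depth_m: Optional[float]      # sampling depth in metres, None if unknown
    temperature: Optional[float]  # None if no temperature was recorded


def passes_filters(run: Run) -> bool:
    """Return True if the run meets all four inclusion criteria."""
    return (
        run.library_source.upper() == "METAGENOMIC"  # is a metagenome
        and run.library_layout.upper() == "PAIRED"   # is paired-end
        and run.depth_m is not None
        and run.depth_m <= MAX_DEPTH_M               # collected at <= 100 m
        and run.temperature is not None              # has temperature metadata
    )


def filter_runs(runs):
    """Keep only the runs that pass every criterion, preserving order."""
    return [run for run in runs if passes_filters(run)]
```

Note that a run with unknown depth is excluded rather than kept, since the depth criterion cannot be confirmed for it.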

Quick overview of the projects included in the curated metadata:

Bermuda Atlantic Time-series Study (acronym: BATS; accession: PRJNA385855)
  Metadata citations:
  • European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJNA385855 [Data set]. Retrieved from https://www.ebi.ac.uk/ena
  • Biller, S., Berube, P., Dooley, K. et al. Marine microbial metagenomes sampled across space and time. Sci Data 5, 180176 (2018). https://doi.org/10.1038/sdata.2018.176
  • Bermuda Atlantic Time-series Study (BATS). (2024). BATS Oceanographic and Biogeochemical Data [bats_bottle.txt]. Retrieved from https://bats.bios.asu.edu/data/
  (Data) publication: Biller, S., Berube, P., Dooley, K. et al. Marine microbial metagenomes sampled across space and time. Sci Data 5, 180176 (2018). https://doi.org/10.1038/sdata.2018.176

bioGEOTRACES (acronym: BGT; accession: PRJNA385854)
  Metadata citations:
  • European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJNA385854 [Data set]. Retrieved from https://www.ebi.ac.uk/ena
  • Biller, S., Berube, P., Dooley, K. et al. Marine microbial metagenomes sampled across space and time. Sci Data 5, 180176 (2018). https://doi.org/10.1038/sdata.2018.176
  • GEOTRACES Intermediate Data Product Group (2023). The GEOTRACES Intermediate Data Product 2021v2 (IDP2021v2). NERC EDS British Oceanographic Data Centre NOC. doi:10.5285/ff46f034-f47c-05f9-e053-6c86abc0dc7e
  (Data) publication: Biller, S., Berube, P., Dooley, K. et al. Marine microbial metagenomes sampled across space and time. Sci Data 5, 180176 (2018). https://doi.org/10.1038/sdata.2018.176

Bio-GO-SHIP (acronym: BGS; accession: PRJNA656268)
  Metadata citations:
  • European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJNA656268 [Data set]. Retrieved from https://www.ebi.ac.uk/ena
  • Larkin, A.A., Garcia, C.A., Garcia, N. et al. High spatial resolution global ocean metagenomes from Bio-GO-SHIP repeat hydrography transects. Sci Data 8, 107 (2021). https://doi.org/10.1038/s41597-021-00889-9
  (Data) publication: Larkin, A.A., Garcia, C.A., Garcia, N. et al. High spatial resolution global ocean metagenomes from Bio-GO-SHIP repeat hydrography transects. Sci Data 8, 107 (2021). https://doi.org/10.1038/s41597-021-00889-9

Hawaii Ocean Time-Series ALOHA (2003-2004; 2009) (acronym: HOT1; accession: PRJNA385855)
  Metadata citations:
  • European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJNA385855 [Data set]. Retrieved from https://www.ebi.ac.uk/ena
  • Biller, S., Berube, P., Dooley, K. et al. Marine microbial metagenomes sampled across space and time. Sci Data 5, 180176 (2018). https://doi.org/10.1038/sdata.2018.176
  • Data obtained via the Hawaii Ocean Time-series HOT-DOGS application; University of Hawai'i at Mānoa. National Science Foundation Award #1756517
  • Hawaii Ocean Time-series (HOT). (2024). HOT-DOGS: Data Organization & Graphical System for the Hawaii Ocean Time-series [Bottle_Extraction]. Retrieved from https://hahana.soest.hawaii.edu/hot/hot-dogs/index.html
  (Data) publication: Biller, S., Berube, P., Dooley, K. et al. Marine microbial metagenomes sampled across space and time. Sci Data 5, 180176 (2018). https://doi.org/10.1038/sdata.2018.176

Hawaii Ocean Time-Series ALOHA (2010-2016) (acronym: HOT3; accession: PRJNA352737)
  Metadata citations:
  • European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJNA352737 [Data set]. Retrieved from https://www.ebi.ac.uk/ena
  • Mende, D.R., Bryant, J.A., Aylward, F.O. et al. Environmental drivers of a microbial genomic transition zone in the ocean’s interior. Nat Microbiol 2, 1367–1373 (2017). https://doi.org/10.1038/s41564-017-0008-3

Malaspina Expedition (acronym: MAL; accession: PRJEB52452)
  Metadata citations:
  • European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJEB52452 [Data set]. Retrieved from https://www.ebi.ac.uk/ena
  • Sánchez, P., Coutinho, F.H., Sebastián, M. et al. Marine picoplankton metagenomes and MAGs from eleven vertical profiles obtained by the Malaspina Expedition. Sci Data 11, 154 (2024). https://doi.org/10.1038/s41597-024-02974-1
  (Data) publication: Sánchez, P., Coutinho, F.H., Sebastián, M. et al. Marine picoplankton metagenomes and MAGs from eleven vertical profiles obtained by the Malaspina Expedition. Sci Data 11, 154 (2024). https://doi.org/10.1038/s41597-024-02974-1

Ocean Sampling Day 2014 (acronym: OSD; accession: PRJEB8682)
  Metadata citations:
  • European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJEB8682 [Data set]. Retrieved from https://www.ebi.ac.uk/ena
  • Ocean Sampling Day Consortium, Participants (2015): Registry of samples and environmental context from the Ocean Sampling Day 2014 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.854419

Tara Oceans Project (acronym: TARA; accession: PRJEB1787)
  Metadata citation:
  • European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJEB1787 [Data set]. Retrieved from https://www.ebi.ac.uk/ena

Number of samples per project at each step of this metadata curation:

| Observatory/Cruise | Samples in metadata | Metagenomic and paired-end only | After depth filtering (≤100 m) | After metadata filtering |
|---|---|---|---|---|
| Bermuda Atlantic Time-series Study | 62 | 62 | 40 | 40 |
| bioGEOTRACES | 480 | 480 | 323 | 323 |
| Bio-GO-SHIP | 996 | 971 | 969 | 969 |
| Hawaii Ocean Time-Series ALOHA (2003-2004; 2009) | 68 | 68 | 33 | 28 |
| Hawaii Ocean Time-Series ALOHA (2007-2009) | 54 | 0 | - | - |
| Hawaii Ocean Time-Series ALOHA (2010-2016) | 773 | 597 | 230 | 230 |
| Malaspina | 81 | 81 | 16 | 16 |
| Ocean Sampling Day 2014 | 162 | 150 | 150 | 127 |
| Tara Oceans Project | 136 | 136 | 95 | 92 |
| Western Channel Observatory | 10 | 0 | - | - |
| All | 2822 | 2545 | 1856 | 1825 |
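The "All" row is the column-wise sum of the project rows. As a quick consistency check, the sketch below recomputes those totals from the counts in the table above, treating "-" (project already dropped at an earlier step) as 0, and reports the overall fraction of samples retained. `COUNTS`, `step_totals`, and `retained_fraction` are hypothetical names used only for illustration.

```python
# Per-project sample counts copied from the table above, in step order:
# (in metadata, metagenomic + paired-end only, depth <= 100 m, after metadata filtering).
# None stands for "-": the project was already dropped at an earlier step.
COUNTS = {
    "Bermuda Atlantic Time-series Study": (62, 62, 40, 40),
    "bioGEOTRACES": (480, 480, 323, 323),
    "Bio-GO-SHIP": (996, 971, 969, 969),
    "Hawaii Ocean Time-Series ALOHA (2003-2004; 2009)": (68, 68, 33, 28),
    "Hawaii Ocean Time-Series ALOHA (2007-2009)": (54, 0, None, None),
    "Hawaii Ocean Time-Series ALOHA (2010-2016)": (773, 597, 230, 230),
    "Malaspina": (81, 81, 16, 16),
    "Ocean Sampling Day 2014": (162, 150, 150, 127),
    "Tara Oceans Project": (136, 136, 95, 92),
    "Western Channel Observatory": (10, 0, None, None),
}


def step_totals(counts):
    """Sum each curation step across all projects, treating None ('-') as 0."""
    n_steps = len(next(iter(counts.values())))
    return tuple(
        sum(row[i] or 0 for row in counts.values()) for i in range(n_steps)
    )


def retained_fraction(counts):
    """Fraction of the samples in the original metadata that survive every step."""
    totals = step_totals(counts)
    return totals[-1] / totals[0]
```

With these numbers, `step_totals(COUNTS)` reproduces the All row, (2822, 2545, 1856, 1825), and `retained_fraction(COUNTS)` is 1825 / 2822, roughly 65% of the samples originally present in the metadata.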

Metatranscriptomes

To be added as the collection of metadata grows.

Metaproteomes

To be added as the collection of metadata grows.

Metabolomes

To be added as the collection of metadata grows.
