
Supporting data and code for: "Phosphonate production by marine microbes: exploring new sources and potential function"

slhogle/phosphonates


Supporting data and code for "Phosphonate production in marine microbes: exploring new sources and potential function"

Preprint available from bioRxiv

Data and code here are provided “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, under the GNU General Public License v3.0. Feel free to use or remix as you see fit.

The data directory contains the results and summary tables needed to support the conclusions of the paper. It is essentially formatted output from the code directory and should be clearly organized and user friendly. The code directory contains many bash and R scripts in varying degrees of "user-friendliness". It is presented here mainly in the interest of transparency, not as a software tool for production use. The directories are organized by the type of analysis performed in the paper and should be modifiable to suit your computing environment and needs, if you have the time and programming experience.

Note on reproducibility

The GEOTRACES data license is not GPL3, and you must request access and be approved before you are allowed to download the data. Therefore, I cannot include the GEOTRACES data here, since doing so would violate the license agreement. Unfortunately, this means that the workflows in the code directory are not truly reproducible as provided, because they do not include the GEOTRACES data. You can fill in the missing data yourself by following the code available at https://doi.org/10.5281/zenodo.3689249. You will then simply need to replace the files provided in this repository, which contain NA values for all GEOTRACES chemical measurements, with your completed data files. All measurements of Prochlorococcus and SAR11 ecotype abundance are included.
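As an alternative to swapping whole files, the NA placeholders could be filled in place from a completed table. The following is a minimal sketch in Python, not part of this repository (whose scripts are bash and R). The function name `fill_missing`, the key column `sample_id`, and the columns `phosphate` and `pro_abundance` are hypothetical stand-ins; the real files may use different names and layouts.

```python
import csv
import io


def fill_missing(placeholder_rows, completed_rows, key, columns):
    """Return copies of placeholder_rows in which "NA" values in `columns`
    are filled from the row of completed_rows that shares the same `key`."""
    completed_by_key = {row[key]: row for row in completed_rows}
    filled = []
    for row in placeholder_rows:
        new_row = dict(row)
        source = completed_by_key.get(row[key])
        if source is not None:
            for col in columns:
                value = source.get(col)
                # Only overwrite placeholders, and only with real values.
                if new_row.get(col) == "NA" and value not in (None, "", "NA"):
                    new_row[col] = value
        filled.append(new_row)
    return filled


# Toy tables standing in for real files (hypothetical column names).
placeholder_tsv = (
    "sample_id\tphosphate\tpro_abundance\n"
    "S1\tNA\t0.42\n"
    "S2\tNA\t0.10\n"
)
completed_tsv = "sample_id\tphosphate\nS1\t0.05\nS2\t0.31\n"

placeholder = list(csv.DictReader(io.StringIO(placeholder_tsv), delimiter="\t"))
completed = list(csv.DictReader(io.StringIO(completed_tsv), delimiter="\t"))
result = fill_missing(placeholder, completed, key="sample_id", columns=["phosphate"])
```

Columns that are not listed in `columns`, such as the abundance measurements already included here, are left untouched.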

Shortcuts

Phosphonate biosynthesis

Phosphonate catabolism

  • Gene families used to identify 4 discrete phosphonate catabolism pathways in genomes
  • HMM models for identifying phosphonate catabolism genes

Core gene families

Genome context around PepM clusters in Prochlorococcus SB, SAR11 RS40, and SAR11 HTCC7217

Prochlorococcus SB Phosphonoglycoprotein locus and potential S-layer proteins

Annotations from antiSMASH

Blin, K., S. Shaw, K. Steinke, R. Villebro, N. Ziemert, S. Y. Lee, M. H. Medema, and T. Weber. 2019. antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline. Nucleic Acids Res. 47: W81–W87.

Functional enrichments near PepM clusters

Enrichments were estimated with the hypergeometric test implemented in clusterProfiler

Yu, G., L.-G. Wang, Y. Han, and Q.-Y. He. 2012. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16: 284–287.
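For readers unfamiliar with the hypergeometric test, the sketch below shows the general form of an over-representation p-value in Python. It illustrates the statistic only and is not the code used in the paper, which relied on clusterProfiler in R. The function name and the toy counts are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: total number of annotated genes in the background
    K: background genes carrying the annotation of interest
    n: genes in the region being tested (e.g. near a cluster)
    k: genes in that region carrying the annotation
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Toy counts: 20 background genes, 5 annotated, 4 genes tested, 2 annotated hits.
p_value = hypergeom_enrichment_pvalue(k=2, n=4, K=5, N=20)
```

A small p-value indicates that the annotation appears among the tested genes more often than expected by chance when drawing genes from the background. In practice the p-values would also be adjusted for multiple testing across annotations.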

Directory Structure

.
├── code
│   ├── gene-prevalence-genomes
│   │   ├── MARMICRODB
│   │   │   ├── bin
│   │   │   └── input-data
│   │   └── gorg-tropics
│   │       ├── bin
│   │       ├── figs
│   │       └── input-data
│   ├── gene-searches
│   │   ├── MARMICRODB
│   │   │   ├── bin
│   │   │   ├── figs
│   │   │   ├── input-data
│   │   │   │   ├── phn-synthesis
│   │   │   │   └── phn-utilization
│   │   │   └── results-tables
│   │   └── gorg-tropics
│   │       ├── bin
│   │       ├── input-data
│   │       │   ├── phn-synthesis
│   │       │   └── phn-utilization
│   │       └── results-tables
│   ├── metagenome-abundances-model
│   │   ├── bin
│   │   ├── figs
│   │   ├── input
│   │   └── output
│   ├── metagenome-data-wrangling
│   │   ├── bin
│   │   ├── input
│   │   ├── output
│   │   └── output-old
│   ├── metagenome-searches
│   │   ├── mock-metagenome
│   │   │   ├── check_tax_GORG_pepm_scaffolds
│   │   │   │   └── data
│   │   │   ├── diamond-search-mockreads
│   │   │   │   ├── data
│   │   │   │   ├── figs
│   │   │   │   └── tables
│   │   │   └── pepm_mgassembly_searches
│   │   │       └── data
│   │   │           ├── GEOTRACES
│   │   │           └── TARA-OM-RGC
│   │   ├── pepm-counts
│   │   └── pro-syn-sar-core-counts
│   ├── pepm-enrichment-analysis
│   │   ├── bin
│   │   ├── input-data
│   │   │   ├── coords
│   │   │   ├── eggnoggmapped
│   │   │   ├── go
│   │   │   └── kegg
│   │   └── results-tables
│   ├── pepm-position-islands
│   │   ├── bin
│   │   ├── figs
│   │   └── input-data
│   └── phylogenies
│       ├── gtdbtk-v-pepm-tree-comparisons
│       │   ├── data
│       │   └── results
│       └── phn-utilization-synthesis
│           ├── bin
│           ├── data
│           └── figs
└── data
    ├── core_gene_families
    │   ├── HMM_alignments
    │   └── HMM_models
    ├── pepm_genome_context
    │   ├── antismash
    │   │   ├── SAR11_HTCC7217
    │   │   ├── SAR11_RS40
    │   │   └── prochlorococcus_SB
    │   ├── clusterprofiler
    │   └── eggnogmapper
    ├── phosphonate_biosynthesis
    │   ├── GORG_tropics
    │   ├── HMM_models
    │   │   ├── MPNS
    │   │   │   ├── alns
    │   │   │   ├── hmm
    │   │   │   └── seqs
    │   │   ├── PEPM
    │   │   ├── PPDA
    │   │   └── PPDH
    │   └── MARMICRODB
    └── phosphonate_catabolism
        ├── GORG_tropics
        ├── HMM_models
        │   ├── gntR
        │   ├── hpnW
        │   ├── hpnX
        │   ├── hpnZ
        │   ├── lysR
        │   ├── phn01
        │   ├── phn02
        │   ├── phnA
        │   ├── phnC
        │   ├── phnD
        │   ├── phnD-like
        │   ├── phnE
        │   ├── phnF
        │   ├── phnG
        │   ├── phnH
        │   ├── phnI
        │   ├── phnJ
        │   ├── phnK
        │   ├── phnL
        │   ├── phnM
        │   ├── phnN
        │   ├── phnO
        │   ├── phnP
        │   ├── phnS
        │   ├── phnS2
        │   ├── phnT
        │   ├── phnT2
        │   ├── phnU
        │   ├── phnU2
        │   ├── phnV
        │   ├── phnW
        │   ├── phnX
        │   ├── phnY
        │   ├── phnZ
        │   ├── phyH
        │   └── ptxD
        └── MARMICRODB