PPI_Prediction_byCoevolution

Environment setup

To run this repository, Nextflow and Singularity need to be installed.

Nextflow installation: see the official documentation, or install it via conda:

conda create -n py_nextflow --channel bioconda python=3.8 nextflow=23.04.1

Singularity installation: see the official documentation, or install it via conda (currently only available for Linux):

conda activate py_nextflow 
conda install -c conda-forge singularity=3.8.6

The simplest way to get all the software and libraries needed for this project is to use the Singularity containers that we have built.
If you have problems installing Singularity, everything can instead be installed via conda (on Linux).
See "containers/conda_envs/conda_installation.md" in this repository for details.
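
Before launching the workflow, it can help to confirm that the required tools are actually on the PATH of the activated environment. The following is a minimal sketch and not part of this repository; the helper name check_tool is hypothetical.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not part of this repository).
# It succeeds when the given command is found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Singularity is only needed for the container-based profiles.
for tool in nextflow singularity; do
    check_tool "$tool" || echo "Install $tool or use the conda-based setup instead." >&2
done
```

If both tools are found, `nextflow -version` and `singularity --version` print the installed versions, which can be compared with the versions pinned in the conda commands above (23.04.1 and 3.8.6).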

Raw data and result computation

To download all the raw data, generate the paired alignment data, and compute the DCA results for all protein pairs,
go to the folder PPI_Prediction_byCoevolution/scripts and run one of the following:

On the local machine:
nextflow run query2subject_homologousPPDetectionAndCompuation_workflow.nf --root_folder "/home/tao" -c nextflow.config -profile singularity -resume

On an HPC cluster with Slurm:
nextflow run query2subject_homologousPPDetectionAndCompuation_workflow.nf --root_folder "/home/tao" -c nextflow.config -profile slurm_withSingularity -resume

On the local machine when Singularity is not available:
nextflow run query2subject_homologousPPDetectionAndCompuation_workflow.nf --root_folder "/home/tao" --conda_envs_path "/home/tao/anaconda3/envs" -c nextflow.config -profile standard -resume

To customize parameters, either edit their values directly in the configuration file "scripts/nextflow.config"
or pass them on the command line (e.g. --root_folder "path to the location where you want to save all data").

Warning: The whole computation could take months, depending on the available computational resources, and the final results take up around 16 TB of disk space.
This dataset is too large to share online, so it is only available upon request.
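
Given the roughly 16 TB footprint, a quick free-space check on the target filesystem before launching can prevent a run from failing partway through. The following is a minimal sketch, assuming a POSIX `df` and `awk`; the helper name has_free_space and the ROOT_FOLDER variable are illustrative and not part of this repository.

```shell
#!/bin/sh
# has_free_space DIR REQUIRED_KB
# Hypothetical helper: succeeds when the filesystem holding DIR has at
# least REQUIRED_KB kilobytes (1024-byte blocks) available.
has_free_space() {
    # df -P prints one line per filesystem; column 4 is the available space.
    # The comparison is done in awk to avoid shell integer-size limits.
    df -Pk "$1" | awk -v need="$2" \
        'NR == 2 { ok = ($4 + 0 >= need + 0) } END { exit ok ? 0 : 1 }'
}

# Illustrative default: set ROOT_FOLDER to the path passed as --root_folder.
ROOT_FOLDER="${ROOT_FOLDER:-.}"
# About 16 TB expressed in 1024-byte blocks (16 * 1024^3).
REQUIRED_KB=17179869184

if has_free_space "$ROOT_FOLDER" "$REQUIRED_KB"; then
    echo "Enough free space under $ROOT_FOLDER"
else
    echo "Less than 16 TB free under $ROOT_FOLDER" >&2
fi
```

The check only looks at free space at the moment it runs, so it is a sanity check rather than a guarantee for a computation that may last months.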

To run AlphaFold-Multimer for the selected protein pairs, we use the generated customized paired alignment data as input to ColabFold (v1.3.0).

Paper figures

Download the Singularity container used to reproduce the paper figures:

singularity pull --arch amd64 library://tfang/base/py38_notebook:latest

The data needed to run the notebooks can either be generated from scratch with the previous step (this could take months, depending on the available computational resources),
or the final cached results can be downloaded from Zenodo at: https://zenodo.org/record/8429824

To run the notebooks, go to the folder PPI_Prediction_byCoevolution/notebooks and start the Singularity container with:

singularity shell py38_notebook.sif

Then, inside the container, run:

jupyter notebook --no-browser --port=8036

The notebooks are then accessible at http://localhost:8036/
Remember to set the variable "notebookData_folder" in the notebooks to the location where you saved the data.
