To run this repository, Nextflow and Singularity need to be installed.
Nextflow installation:
Official documentation
or simply done via conda install:
conda create -n py_nextflow --channel bioconda python=3.8 nextflow=23.04.1
Singularity installation:
Official documentation
or simply via conda install (Currently it's only avaiable for Linux):
conda activate py_nextflow
conda install -c conda-forge singularity=3.8.6
The simplest strategy to use all necessary software and libraries for this project is to use the singularity containers that we have built.
But if one has problems installing Singularity, all necessary software and libraries can be installed via conda (on Linux).
Check more details in the documentation "containers/conda_envs/conda_installation.md" in this repository.
To download all the raw data, generate paired alignment data, and compute DCA results for all protein pairs.
Go to folder PPI_Prediction_byCoevolution/scripts and run one of the following:
nextflow run query2subject_homologousPPDetectionAndCompuation_workflow.nf --root_folder "/home/tao" -c nextflow.config -profile singularity -resume (on the local machine) \
nextflow run query2subject_homologousPPDetectionAndCompuation_workflow.nf --root_folder "/home/tao" -c nextflow.config -profile slurm_withSingularity -resume (On HPC with slurm)
nextflow run query2subject_homologousPPDetectionAndCompuation_workflow.nf --root_folder "/home/tao" --conda_envs_path "/home/tao/anaconda3/envs" -c nextflow.config -profile standard -resume (on the local machine when the singularity is not available) \
For the customized parameters, you could directly modify their values in the configuration file "scripts/nextflow.config"
or via the command line (e.g. --root_folder= "path to the location where you want to save all data")
Warning: The whole computation could take months depending on the available computational resources and all the final results take up around 16TB of disk space
This dataset is too large to share online so is only available upon request.
To run Alphafold-Multimer for the selected protein pairs, we use the generated customized paired alignment data as input to ColabFold (v1.3.0)
Download the singularity container for this :
singularity pull --arch amd64 library://tfang/base/py38_notebook:latest
The data needed to run the otebooks can either be from the last step that is generated from scratch (This could take months depending on the available computational resource)
Alternatively the final cached results can be downloaded from Zenodo at: https://zenodo.org/record/8429824
To run the otebooks,go to folder PPI_Prediction_byCoevolution/notebooks and start the singularity container by:
singularity shell py38_notebook.sif
Then inside container run:
jupyter notebook --no-browser --port=8036
Then the notebooks are accessible at http://localhost:8036/
Remember to set the variable "notebookData_folder" in the notebooks to the location where you save the data