evalkov/alphafold
A pipeline for AlphaFold2-Multimer on a SLURM compute cluster with visualization in ChimeraX.
---------------------------------------------------------
AlphaFold2-Multimer pipeline for a SLURM compute cluster.
---------------------------------------------------------

This script submits AlphaFold2-Multimer jobs with minimal user input in a SLURM compute environment.

The script expects the following SLURM modules to be available:

    alphafold/2.3.2_conda
    cuda/11.8
    cudnn/8.8.3-cuda11
    ChimeraX (version 1.5 or later is required)

Install the script with Git:

    git clone https://github.com/evalkov/alphafold.git

Once installed, make the script executable:

    chmod +x alphafold/alphafold.sh

Then run:

    ./alphafold/alphafold.sh [options]

Options:
    -d DIRECTORY   Specify the directory path. (Required)
    -f FILE        Specify the sequence file path. (Required)
    -m MODE        Specify 'quick' or 'thorough'. ([1] or 5 predictions/model)
    -h             Show help information.

You could add a soft link to the script in your home folder. Because the cloned directory is already named alphafold, give the link a different name (ln -s alphafold/alphafold.sh alphafold would try to create the link inside that directory and fail), for example:

    ln -s alphafold/alphafold.sh af2

The script checks that the sequence file is in FASTA format and contains only protein sequences. The sequence file is provided with the -f flag, for example:

    -f fastafile.fa

The script also checks that the directory is writable; the output is written to a subdirectory named after the user's login name.

The -m flag selects a quick or thorough prediction mode:
- 'quick' generates 1 prediction per model (5 total)
- 'thorough' generates 5 predictions per model (25 total)

The script makes a basic estimate of the required compute resources and time limit from the length of the protein sequences:

    <249 residues         6 hr
    250-999 residues      24 hr
    1000-1299 residues    36 hr
    1300-2499 residues    48 hr
    >2500 residues        72 hr

For predictions of >2500 residues, only the 'quick' mode of 1 prediction per model is permitted, as these are typically very long jobs.

Once structure predictions are available, the script runs ChimeraX to generate a short movie of the top prediction.
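The README does not show how the FASTA check is done. Below is a minimal sketch of one way to perform it, assuming that "protein only" means every sequence line uses the 20 standard amino-acid letters plus X, and that a record made entirely of A, C, G, T and N is nucleotide. The function name is_protein_fasta is hypothetical and not part of alphafold.sh.

```shell
#!/usr/bin/env bash
# Sketch of a protein-FASTA sanity check (hypothetical helper, not alphafold.sh code).
# Assumptions: allowed residues are the 20 standard amino acids plus X;
# a record consisting only of A/C/G/T/N is treated as nucleotide and rejected.
is_protein_fasta() {
  awk '
    # Flag the previous record if it was empty or looked like DNA.
    function check_record() {
      if (nrec > 0 && (seq == "" || seq ~ /^[ACGTN]+$/)) bad = 1
    }
    /^>/ { check_record(); nrec++; seq = ""; next }    # header starts a new record
    /^[ \t\r]*$/ { next }                               # skip blank lines
    {
      if (nrec == 0) { bad = 1; exit }                  # sequence before any header
      line = toupper($0)
      gsub(/[ \t\r]/, "", line)                         # drop whitespace and CR
      if (line !~ /^[ACDEFGHIKLMNPQRSTVWYX]+$/) { bad = 1; exit }
      seq = seq line
    }
    END {
      if (!bad) check_record()                          # check the final record
      if (nrec == 0) bad = 1                            # no headers at all
      exit (bad ? 1 : 0)
    }
  ' "$1"
}
```

A caller could then stop early with `is_protein_fasta fastafile.fa || { echo "Not a protein FASTA file" >&2; exit 1; }`. The nucleotide test is a heuristic: a very short peptide written only with A, C, G and T would also be rejected.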
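The directory check and per-user output location described above can be sketched as follows, assuming the output subdirectory is simply DIRECTORY/<login name> and that it may be created if missing. make_user_output_dir is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: verify the -d directory is writable, then create DIRECTORY/<login>.
# Hypothetical helper; the real script's layout may differ.
make_user_output_dir() {
  local base="$1"
  if [[ ! -d "$base" || ! -w "$base" ]]; then
    echo "Directory '$base' does not exist or is not writable." >&2
    return 1
  fi
  local out="$base/$(id -un)"     # subdirectory named after the user's login
  mkdir -p "$out" && printf '%s\n' "$out"
}
```

The function prints the output path on success, so a caller can capture it with `outdir=$(make_user_output_dir "$dir") || exit 1`.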
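The mode and time-limit rules above can be expressed as a few small functions. This sketch assumes that "length" means the total residue count over all chains, and it assigns 249 residues to the 6 hr tier and 2500 residues to the 72 hr tier, since the README's ranges leave those two values unassigned. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the README's mode and time-limit rules (not alphafold.sh code).
# Assumptions: length = total residues over all chains; 249 -> 6 h, 2500 -> 72 h.

# AlphaFold2-Multimer runs 5 models; the mode sets predictions per model.
predictions_per_model() {
  case "$1" in
    quick)    echo 1 ;;
    thorough) echo 5 ;;
    *)        return 1 ;;           # unknown mode
  esac
}

# Total residues in a FASTA file, ignoring header lines and whitespace.
total_residues() {
  awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Wall-clock limit in hours for a given total residue count.
time_limit_hours() {
  local n="$1"
  if   (( n < 250 ));  then echo 6
  elif (( n < 1000 )); then echo 24
  elif (( n < 1300 )); then echo 36
  elif (( n < 2500 )); then echo 48
  else                      echo 72
  fi
}

# 'thorough' is refused for the largest tier; 'quick' is always allowed.
mode_allowed() {
  local n="$1" mode="$2"
  [[ "$mode" == quick ]] || { [[ "$mode" == thorough ]] && (( n < 2500 )); }
}

# Format the limit as hours:minutes:seconds for sbatch --time.
slurm_time() {
  printf '%02d:00:00\n' "$(time_limit_hours "$1")"
}
```

A submission could pass the limit as `sbatch --time="$(slurm_time "$n")" ...`, with `n=$(total_residues fastafile.fa)`. The total number of predictions is 5 times `predictions_per_model`. AlphaFold 2.3's run_alphafold.py has a `--num_multimer_predictions_per_model` flag that a mode like this would plausibly set, although the README does not say how alphafold.sh applies the mode.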
ImageMagick's 'convert' command converts the ChimeraX movie of the top prediction to an animated GIF.

A ChimeraX script with the extension _chimera_align.cxc aligns all predictions on the top-ranked prediction, using its largest chain as the reference.

A predicted aligned error (PAE) plot is also generated. A second ChimeraX script, with the extension _chimera_pae.cxc, draws pseudobonds between residues of each pair of chains in the top prediction. The pseudobonds are colored on a blue-to-red spectrum spanning PAE values of 0-30 Å.

Finally, the script zips all the predictions, the JSON file with PAE scores for the top prediction, and the two ChimeraX scripts, and emails the archive to the user. The animated GIF of the rotating top prediction is embedded in the body of the email.

Afterward, if the script finds Box.com credentials in the user's .netrc file, it uploads the entire structure prediction directory to Box for long-term storage. Note that these directories can be large, especially for large predictions.

Any questions or problems with the script can be reported to the author, whose email can be found by searching for the author's name and affiliation below.

The script is distributed under the GNU General Public License v3.0.

Eugene Valkov, D.Phil.
Center for Cancer Research
National Cancer Institute
National Institutes of Health
Frederick, Maryland, U.S.A.
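The choice of reference chain for the alignment script can be sketched as follows. This assumes that AlphaFold2-Multimer labels chains A, B, C, ... in the order the sequences appear in the FASTA file, that the top-ranked prediction is opened as model #1 and the others as #2 onward, and that a ChimeraX matchmaker line of this shape resembles what _chimera_align.cxc contains; the README does not show that file. longest_chain_id and align_command are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: pick the longest chain as the alignment reference (hypothetical helpers).
# Assumption: chains are lettered A, B, C, ... in FASTA order (max 26 handled here).
longest_chain_id() {
  awk '
    BEGIN { ids = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" }
    /^>/ { i++; next }                                  # next chain
    { gsub(/[ \t\r]/, ""); len[i] += length($0) }       # accumulate its length
    END {
      best = 1
      for (k = 1; k <= i; k++) if (len[k] > len[best]) best = k   # ties keep the first
      print substr(ids, best, 1)
    }
  ' "$1"
}

# Emit a matchmaker command aligning models #2..#N onto that chain of model #1.
align_command() {
  local fasta="$1" n_models="$2"
  printf 'matchmaker #2-%d to #1/%s\n' "$n_models" "$(longest_chain_id "$fasta")"
}
```

With 'quick' mode, `align_command fastafile.fa 5` would produce a single matchmaker line for the five predictions; with 'thorough' mode the count would be 25.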
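The PAE coloring and the "each pair of chains" loop can be illustrated with a two-color linear ramp from blue (0 Å) to red (30 Å). This is an assumption made for illustration: ChimeraX may use a palette with intermediate colors, so the hex values below are not necessarily what _chimera_pae.cxc produces. chain_pairs and pae_to_hex are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: enumerate chain pairs and map PAE to a blue-to-red color (hypothetical).

# Print every unordered pair of the first n chain letters, e.g. "A B".
chain_pairs() {
  local n="$1" ids=ABCDEFGHIJKLMNOPQRSTUVWXYZ i j
  for (( i = 0; i < n; i++ )); do
    for (( j = i + 1; j < n; j++ )); do
      printf '%s %s\n' "${ids:i:1}" "${ids:j:1}"
    done
  done
}

# Linearly interpolate blue (0 Å) to red (30 Å); values outside are clamped.
pae_to_hex() {
  awk -v pae="$1" 'BEGIN {
    t = pae / 30
    if (t < 0) t = 0
    if (t > 1) t = 1
    r = int(255 * t + 0.5)            # red grows with PAE
    b = int(255 * (1 - t) + 0.5)      # blue fades with PAE
    printf "#%02x00%02x\n", r, b
  }'
}
```

For a three-chain complex, `chain_pairs 3` lists A B, A C and B C, the three interfaces that would receive pseudobonds. Clamping at 30 Å keeps very uncertain contacts fully red.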
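The .netrc check before the Box upload can be sketched as below. The README names neither the host nor the upload method, so this assumes the credentials are stored under Box's FTP host, ftp.box.com. has_box_credentials is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a .netrc file has a "machine ftp.box.com" entry.
# Assumption: Box credentials are stored under the FTP host ftp.box.com.
has_box_credentials() {
  local netrc="${1:-$HOME/.netrc}"
  [[ -r "$netrc" ]] &&
    grep -Eq '(^|[[:space:]])machine[[:space:]]+ftp\.box\.com([[:space:]]|$)' "$netrc"
}
```

If credentials are found, a single file could be uploaded with curl, which reads .netrc when given `--netrc`; the remote folder name here is a placeholder, and uploading a whole directory would need one transfer per file:

    curl --netrc --ssl-reqd --ftp-create-dirs -T predictions.zip "ftp://ftp.box.com/alphafold/"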