PromptMSP

This is the official implementation for the following paper:

Protein Multimer Structure Prediction via Prompt Learning, ICLR 2024.

Data Processing

First, download the latest biological-assembly PDB files from the RCSB PDB database. We recommend running the command:

rsync -rlpt -v -z --delete --port=33444 rsync.rcsb.org::ftp_data/biounit/PDB/all/ ./pdb_all
  1. cd cdhit_process

  2. Run python pdb2fasta.py (a minimal sketch of this conversion appears after this list). Alternatively, you can download the FASTA file for all PDBs from here and place it in ./cdhit_process/fasta_all.txt.

  3. Run cd-hit -i fasta_all.txt -o cluster_result -c 0.40 -n 2. The clustering result is stored in cdhit_process/cluster_result.clstr.

  4. Run python selec_from_clustering.py. This produces the rough version of the PDB-M dataset, ./PDB-M/PDB-M-rough.txt.

  5. Run python pre_process/process_source_data.py -n_min 3 -n_max 30 -homo_ratio 0.99 -data_fraction 1.0
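
Step 2's pdb2fasta.py essentially writes one FASTA record per chain of every biological-assembly file. Below is a minimal sketch of such a conversion using Biopython; the input/output layout (./pdb_all, fasta_all.txt) and the filtering choices are assumptions, and the repo's actual script may differ.

```python
# Hypothetical PDB -> FASTA conversion (illustrative; the repo's pdb2fasta.py may differ).
import glob
import gzip
import os

from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1

parser = PDBParser(QUIET=True)

with open("fasta_all.txt", "w") as fasta:
    for path in glob.glob("./pdb_all/*.pdb*"):
        pdb_id = os.path.basename(path).split(".")[0]
        # Biological-assembly files fetched via rsync are usually gzipped (e.g. 1abc.pdb1.gz).
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            structure = parser.get_structure(pdb_id, handle)
        for chain in structure[0]:
            # Keep standard residues only and map three-letter codes to one-letter codes.
            seq = "".join(seq1(res.get_resname()) for res in chain if res.id[0] == " ")
            if seq:
                fasta.write(f">{pdb_id}_{chain.id}\n{seq}\n")
```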

After this process, we obtain the PDB-M dataset ./source_data/all_chain_pdb.txt, which is identical to ./PDB-M/PDB-M.txt. We then randomly split it into the training set ./PDB-M/PDB-M-train.txt and the test set ./PDB-M/PDB-M-test.txt.
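
The random split into PDB-M-train.txt and PDB-M-test.txt can be reproduced along the following lines; the seed and the split ratio here are assumptions, not the values used for the released files.

```python
# Hypothetical random train/test split of PDB-M (seed and ratio are assumptions).
import random

with open("./PDB-M/PDB-M.txt") as f:
    pdb_ids = [line.strip() for line in f if line.strip()]

random.seed(0)                      # assumed seed
random.shuffle(pdb_ids)

n_test = int(0.1 * len(pdb_ids))    # assumed 90/10 split
test_ids, train_ids = pdb_ids[:n_test], pdb_ids[n_test:]

with open("./PDB-M/PDB-M-train.txt", "w") as f:
    f.write("\n".join(train_ids) + "\n")
with open("./PDB-M/PDB-M-test.txt", "w") as f:
    f.write("\n".join(test_ids) + "\n")
```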

Source Data

  1. Creating the pre-training source data with multimers of $N=3, 4, 5$:
python ./source_data/process_source_data.py -n_min 3 -n_max 5 -homo_ratio 0.5 -data_fraction 1.0
  2. Creating the DGL-format data and labels for training:
python produce_training_dgls.py

After processing the source data, train_oracle_dgl_train_3_5.pt and rmsd_loss_train_3_5.pt will appear in ./source_data.
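
As a rough illustration of what the DGL encoding produced by produce_training_dgls.py can look like, the sketch below builds one graph per multimer with chains as nodes and fully connected edges; the node features, edge construction and labels are assumptions, not the repo's exact recipe.

```python
# Illustrative only: encoding a multimer as a DGL graph (features and edges are assumptions).
import itertools

import dgl
import torch

def multimer_to_dgl(chain_embeddings: torch.Tensor) -> dgl.DGLGraph:
    """chain_embeddings: (num_chains, feat_dim) tensor, one row per chain."""
    n = chain_embeddings.shape[0]
    # Fully connect the chains (directed edges in both directions, no self-loops).
    src, dst = zip(*itertools.permutations(range(n), 2))
    g = dgl.graph((torch.tensor(src), torch.tensor(dst)), num_nodes=n)
    g.ndata["feat"] = chain_embeddings
    return g

# One toy graph for each multimer size used in pre-training (N = 3, 4, 5).
graphs = [multimer_to_dgl(torch.randn(k, 512)) for k in (3, 4, 5)]
print(graphs[0])
```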

Target Data

Creating the prompting target data with multimers of $N=3\sim30$ (training set).

python produce_prompting_dgls.py

After processing the target data, train_prompt_dgls.bin and train_prompt_rmsd.pt will appear in ./target_data.

This process is time-consuming. To skip it, you can download the files train_prompt_dgls.bin, train_prompt_rmsd.pt and new_node_emb.pt directly from here and put them in ./target_data.
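
A quick way to sanity-check that the generated (or downloaded) target files load correctly is shown below; the .bin file is in DGL's binary graph format and the .pt files are plain PyTorch objects, although the exact layout of the loaded objects is an assumption.

```python
# Sanity check for the prompting target data (object layout is an assumption).
import dgl
import torch

graphs, label_dict = dgl.load_graphs("./target_data/train_prompt_dgls.bin")
rmsd = torch.load("./target_data/train_prompt_rmsd.pt")
node_emb = torch.load("./target_data/new_node_emb.pt")

print(f"{len(graphs)} prompting graphs loaded")
print(type(rmsd), type(node_emb))
```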

Preparing GT and ESMFold Dimers

To obtain GT dimers, chain pairs without physical contact can be handled with EquiDock:

./dimer/inference_rigid_half_euidock.py

However, this is somewhat time-consuming, because EquiDock must be run on all possible pairs of chains within each multimer. As a faster alternative, we generate dimers for pairs without physical contact simply by placing the two chains in contact with each other:

./dimer/inference_rigid_no_euidock_fast.py
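
Whether two chains are "in physical contact" is typically decided with a distance cutoff between their heavy atoms. The sketch below shows one such check; the 8 Å cutoff and the atom selection are assumptions, not necessarily the criterion used in the repo.

```python
# Hypothetical chain-chain contact test (cutoff and atom selection are assumptions).
import numpy as np
from Bio.PDB import PDBParser

def chains_in_contact(pdb_path: str, chain_a: str, chain_b: str, cutoff: float = 8.0) -> bool:
    model = PDBParser(QUIET=True).get_structure("complex", pdb_path)[0]
    coords_a = np.array([a.coord for a in model[chain_a].get_atoms() if a.element != "H"])
    coords_b = np.array([a.coord for a in model[chain_b].get_atoms() if a.element != "H"])
    # Minimum pairwise heavy-atom distance between the two chains.
    dists = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    return bool(dists.min() < cutoff)
```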

We also need to prepare the ESMFold-predicted dimers for the test-set multimers:

./dimer/inference_rigid_esmfold.py
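
ESMFold (through the fair-esm package) can fold a dimer by joining the two chain sequences with a ':' separator. A minimal sketch follows; the sequences and output path are placeholders, and the repo's script may instead batch over the whole test set.

```python
# Minimal ESMFold dimer prediction (sequences and paths are placeholders).
import torch
import esm

model = esm.pretrained.esmfold_v1().eval()
if torch.cuda.is_available():
    model = model.cuda()

chain_a = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # placeholder sequence
chain_b = "GSHMSLYKKAGSAAAPFT"                 # placeholder sequence
pdb_str = model.infer_pdb(chain_a + ":" + chain_b)  # ':' separates the two chains

with open("dimer_esmfold.pdb", "w") as f:
    f.write(pdb_str)
```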

Pre-training

Training the GIN model:

python run_pre_training.py -h_feats 512 -cls_h 256 -num_layers 1 -gnn_type 'gin' -lr 1e-3 -bs 50 -epochs 300
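
The flags above correspond to a single-layer GIN encoder with 512-dimensional hidden features and a 256-dimensional head. A bare-bones DGL version of such a model is sketched below; the actual architecture in run_pre_training.py may contain additional components.

```python
# Bare-bones GIN encoder + prediction head matching the flags above (illustrative only).
import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn

class GINModel(nn.Module):
    def __init__(self, in_feats: int, h_feats: int = 512, cls_h: int = 256, num_layers: int = 1):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            mlp = nn.Sequential(
                nn.Linear(in_feats if i == 0 else h_feats, h_feats),
                nn.ReLU(),
                nn.Linear(h_feats, h_feats),
            )
            self.layers.append(dglnn.GINConv(mlp, aggregator_type="sum"))
        self.head = nn.Sequential(nn.Linear(h_feats, cls_h), nn.ReLU(), nn.Linear(cls_h, 1))

    def forward(self, g: dgl.DGLGraph, feat: torch.Tensor) -> torch.Tensor:
        h = feat
        for conv in self.layers:
            h = torch.relu(conv(g, h))
        # Graph-level readout followed by the prediction head.
        with g.local_scope():
            g.ndata["h"] = h
            hg = dgl.mean_nodes(g, "h")
        return self.head(hg)

model = GINModel(in_feats=512)  # assumed input feature size
```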

Prompting

python run_prompting.py -h_feats 512 -lr 1e-3 -bs 3000 -epochs 50
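
Conceptually, the prompting stage keeps the pre-trained GIN frozen and fits a small set of prompt parameters on the target graphs. The loop below is a heavily simplified sketch of that pattern; the checkpoint filename, the prompt design and the loss are all assumptions rather than the repo's implementation.

```python
# Heavily simplified prompt-tuning loop (checkpoint name, prompt design and loss are assumptions).
import dgl
import torch
import torch.nn as nn

graphs, _ = dgl.load_graphs("./target_data/train_prompt_dgls.bin")
rmsd = torch.load("./target_data/train_prompt_rmsd.pt")

backbone = torch.load("./checkpoints/pretrained_gin.pt")  # hypothetical checkpoint file
for p in backbone.parameters():
    p.requires_grad_(False)                               # keep the pre-trained GIN frozen

prompt = nn.Parameter(torch.zeros(512))                   # a learnable feature-level prompt
optimizer = torch.optim.Adam([prompt], lr=1e-3)

for epoch in range(50):
    for g, y in zip(graphs, rmsd):
        feat = g.ndata["feat"] + prompt                   # inject the prompt into node features
        pred = backbone(g, feat).squeeze()
        loss = nn.functional.mse_loss(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```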

Inference (test)

First, you can download our checkpoints from here and put them in ./checkpoints.

Using ground-truth dimers:

python inference.py -dimer_type gt

Using ESMFold dimers:

python inference.py -dimer_type esmfold
