lucagl/pickPocket

pickPocket

An ad-hoc hierarchical clustering algorithm able to extract and rank pockets. Extraction is based on geometrical primitives generated by the NanoShaper molecular surface software. The ranking is based on the Isolation Forest anomaly detector.

Details:

Clustering:

This script performs a hierarchical single-linkage clustering of "(regular) spherical probes" extracted from several calls to the NanoShaper software. NanoShaper is called externally by the script. The clustering process is detailed in a paper in preparation.
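As a rough illustration of what single linkage means here, the sketch below groups sphere centers with a plain distance threshold. It is a generic textbook version, not the pickPocket criterion: the function name single_linkage, the threshold argument and the sample coordinates are all assumptions made for the example, and the actual script uses the alpha and beta parameters described below.

```python
import math
from itertools import combinations


def single_linkage(centers, threshold):
    """Group points linked by chains of neighbours no farther apart than `threshold`.

    Cutting a single-linkage dendrogram at `threshold` is equivalent to taking
    the connected components of the graph that links every pair of points at
    distance <= threshold. The union-find structure below computes them.
    """
    parent = list(range(len(centers)))

    def find(i):
        # Follow parent links up to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(centers)), 2):
        if math.dist(centers[i], centers[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(centers)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical probe centers (Angstroms): two well separated groups.
example_centers = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0)]
example_clusters = single_linkage(example_centers, threshold=1.5)
print(example_clusters)
```

Note that points 0 and 2 end up in the same cluster although they are 2 Angstroms apart: single linkage only requires a chain of close neighbours, which is why it can follow elongated or ramified cavities.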

Free parameters

DEFAULT: alpha=0, beta=0.6, rp_max=3 (Angstroms)

alpha: how easily probes of the same radius are clustered together (larger --> clusters that are wider laterally; better for large, shallow sites).

beta: how easily probes of different radii are clustered together (larger --> deeper, more ramified clusters).

rp_max: the largest probe radius. The minimum is 1.4 (water molecule), and the series of clustered sphere radii is [1.4, ..., rp_max] in increments of 0.1.
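To make the series of probe radii concrete, here is a minimal sketch that enumerates it for a given rp_max. The helper name probe_radii is an assumption made for illustration and is not part of the package.

```python
def probe_radii(rp_max=3.0, rp_min=1.4, step=0.1):
    """Return the probe radii [rp_min, ..., rp_max] in increments of `step`."""
    if rp_max < rp_min:
        raise ValueError("rp_max must be at least the water probe radius (1.4)")
    # Count the steps as an integer to avoid accumulating floating point error.
    n_steps = int(round((rp_max - rp_min) / step))
    return [round(rp_min + i * step, 1) for i in range(n_steps + 1)]


default_radii = probe_radii()  # default rp_max = 3
print(len(default_radii), default_radii[0], default_radii[-1])
```

With the default rp_max=3 this gives 17 radii, from 1.4 to 3.0.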

Ranking:

Ranking is based on the Isolation Forest (IF) anomaly detector. The IF is provided as a previously trained scikit-learn object, loaded from a binary file supplied with the package (in pickPocket/trainedModels).
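The general idea of ranking with an Isolation Forest can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the model is fitted on synthetic data instead of being loaded from the trained file, the three features are invented, and sorting from the highest to the lowest score is an assumed convention rather than the one confirmed for pickPocket.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for the pocket features used to train the model.
train_features = rng.normal(size=(200, 3))
model = IsolationForest(n_estimators=100, random_state=0).fit(train_features)

# Hypothetical candidate pockets described by the same three features.
candidates = rng.normal(size=(5, 3))
candidates[0] += 6.0  # move one candidate away from the training cloud

# score_samples: the lower the score, the more anomalous the sample.
scores = model.score_samples(candidates)

# Assumed convention: the least anomalous candidate is ranked first.
ranking = np.argsort(-scores)
for rank, idx in enumerate(ranking, start=1):
    print(rank, idx, round(float(scores[idx]), 3))
```

In the actual package the fitted object comes from pickPocket/trainedModels, so no training step is needed at run time.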

Requirements:

  • Install patchelf:
  • sudo apt-get install patchelf (Ubuntu)
  • or see https://gist.github.com/ruario/80fefd174b3395d34c14
  • The NanoShaper executable is provided but must be linked to the libraries. To do so, run the install_script within the install binaries folder and follow the prompted instructions (type ./install_script).
  • (Recommended) Recompile the shared library locally. This is done by running the install_script and following the instructions (gcc required).

To run the install script, move into the install binaries folder and type ./install_script (it might be necessary to change permissions first: chmod +x install_script).

Download trained model:

Using git lfs (recommended):

  1. git clone the repository
  2. install git lfs
  3. run: git lfs pull

Without using git lfs: download from https://istitutoitalianotecnologia-my.sharepoint.com/:f:/g/personal/luca_gagliardi_iit_it/ErrEE6yVBGpIt_f1z43nKxkB9HZap-EtaeIFUrGzXfHRew?e=SJwohi and copy the content into pickPocket/trainedModels/

Contact me if the link has expired (the git lfs method above should always work).

Python modules:

  • numpy
  • scikit-learn

Installation

First check the Requirements.

You might need to install setuptools: pip3 install -U pip setuptools

Then, within the folder, run: pip3 install .

CAREFUL: In a virtual environment you might need to force pip to install the package in the same directory (the default behavior is to copy it to another location), so that the paths to the libraries stay correct. Passing the -e option (develop mode) should prevent this problem.

The library should then be available for import (see advanced use), or it can be used as an executable (recommended).

Instructions:

Simple

python3 -m pickPocket <file.pqr>
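For batch use, the same command can be launched from a short Python wrapper. The sketch below only relies on the command line shown above. The helper names build_command and run_pickpocket are assumptions made for illustration, and sys.executable is used in place of python3 so that the interpreter where the package was installed is the one that runs.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pqr_file):
    """Return the argument list equivalent to: python3 -m pickPocket <file.pqr>."""
    return [sys.executable, "-m", "pickPocket", str(pqr_file)]


def run_pickpocket(pqr_file):
    """Run pickPocket on one structure and return the process exit code."""
    pqr_path = Path(pqr_file)
    if not pqr_path.is_file():
        raise FileNotFoundError(pqr_path)
    return subprocess.run(build_command(pqr_path), check=False).returncode


if __name__ == "__main__":
    # Hypothetical usage: process every .pqr file given on the command line.
    for name in sys.argv[1:]:
        print(name, run_pickpocket(name))
```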

OUTPUTS:

Note: the pocket numbering reflects the ranking.

  • logfile --> contains a recap of the run information (the same that is printed on stdout)
  • output_<pqr_name>.txt --> summary of the ranked pockets (scores, subpockets, etc.)
  • errorLog.txt --> errors and warnings
  • Folder 6gj6_Pfiles contains:
  • clusterPocket<pocket_number>.pqr --> dummy "atoms" representing the probe spheres. Compatible with VMD.
  • p<pocket_number>.off --> the triangulation of the above. Compatible with VMD.
  • p<pocket_number>_atm.pqr --> the protein surface atoms belonging to the pocket envelope (recommended for practical use). Compatible with VMD.
  • Similarly for sub when subpockets are available.
  • infoPocket<pocket_number>.txt --> information on residues and (pseudo) mouths with their relative normals. Note: you might want to post-process it with functions.getEntrance()
  • <structure_name>.vert and .face --> for a nice triangulation of the structure in VMD. This is a "classical" NanoShaper output.
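When post-processing results, it can help to build the per-pocket file names from the pocket number. The sketch below uses the file name patterns listed above. The helper name pocket_files is hypothetical, and placing every per-pocket file, including infoPocket<pocket_number>.txt, inside the Pfiles folder is an assumption to check against the actual output.

```python
from pathlib import Path


def pocket_files(output_dir, pocket_number):
    """Return the expected per-pocket output paths for one ranked pocket."""
    base = Path(output_dir)
    n = pocket_number
    return {
        "probes": base / f"clusterPocket{n}.pqr",  # dummy atoms for the probe spheres
        "triangulation": base / f"p{n}.off",       # triangulated pocket surface
        "surface_atoms": base / f"p{n}_atm.pqr",   # protein atoms lining the pocket
        "info": base / f"infoPocket{n}.txt",       # residues and (pseudo) mouths
    }


# Top ranked pocket of the sample output folder (numbering reflects ranking).
top_pocket = pocket_files("6gj6_Pfiles", 1)
for label, path in top_pocket.items():
    print(label, path, path.exists())
```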

Advanced

Extra setup files: config.txt and input.prm. Samples are given in the script folder. An example of advanced scripting is provided by scripts/loop.py, together with a sample structure folder containing structure-ligand pairs and a ligandMap.txt file.

In input.prm:

Action = analysis: stores hitting statistics and features (in a binary file) for all pockets generated over several structure-ligand pairs, according to the provided clustering parameters (it can loop over the parameters as well).

Action = test: evaluates the ranking power by looping over structures and ligands.

config.txt is only used to override the default alpha, beta and maximum probe radius clustering parameters. It will be dropped in future implementations.
