Python application for killing orphaned processes on compute nodes

pitt-crc/shinigami

Shinigami

Shinigami is a standalone Python application for killing errant processes on Slurm-based compute nodes. The application scans for and terminates any running processes not associated with a currently running Slurm job. Processes belonging to whitelisted users (root, administrators, service accounts, etc.) are ignored.
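The selection rule described above can be illustrated with a minimal sketch. This is not Shinigami's actual implementation: the `Process` record and the `find_orphans` function are hypothetical names, and "associated with a running Slurm job" is simplified here to "owned by a user who currently has a running job". The set of such users is assumed to be gathered elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """Minimal process record (hypothetical; not Shinigami's own data model)."""
    pid: int
    user: str


def find_orphans(processes, users_with_jobs, whitelist):
    """Return processes whose owner is neither whitelisted nor running a Slurm job."""
    return [
        p for p in processes
        if p.user not in whitelist and p.user not in users_with_jobs
    ]


# Example data: root is whitelisted, alice has a running job, bob has neither.
processes = [
    Process(101, "root"),
    Process(202, "alice"),
    Process(303, "bob"),
]
orphans = find_orphans(processes, users_with_jobs={"alice"}, whitelist={"root"})
print([p.pid for p in orphans])  # prints [303]
```

Only the process owned by bob is selected, since bob is neither whitelisted nor running a job.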

Installation and Setup

The shinigami command line utility is installable via the pip (or pipx) package manager:

pipx install shinigami

Running the utility every half hour is recommended as a starting point, although a different cadence may be more appropriate depending on your cluster size and use case. Running the utility automatically is accomplished via a simple cron job:

0,30 * * * * shinigami

You may wish to configure the cron job to run under a dedicated service account. When doing so, ensure the user is added to the admin list and satisfies the following criteria:

  • Exists on all compute nodes
  • Has appropriate permissions to terminate system processes on compute nodes
  • Has established SSH keys for connecting to compute nodes
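Some of the criteria above can be checked before enabling the cron job. The sketch below is one possible approach and not part of Shinigami: the function names and the node and account names are hypothetical. It runs `id -u` for the account on each node over non-interactive SSH. A non-zero exit status means either the account is missing on that node or key-based SSH access failed. It does not check whether the account may terminate processes.

```python
import subprocess


def build_check_command(node, user):
    """Build an SSH command that looks up `user` on `node` without prompting."""
    # BatchMode=yes makes ssh fail instead of asking for a password.
    return ["ssh", "-o", "BatchMode=yes", node, "id", "-u", user]


def nodes_missing_user(nodes, user, run=subprocess.run):
    """Return the nodes where the lookup failed (no such user or no SSH access)."""
    missing = []
    for node in nodes:
        result = run(build_check_command(node, user), capture_output=True, text=True)
        if result.returncode != 0:
            missing.append(node)
    return missing
```

For example, `nodes_missing_user(["node1", "node2"], "shinigami-svc")` would return the subset of those (hypothetical) nodes that need attention. The `run` parameter exists so the check can be exercised without real SSH connections.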