A standalone Python application for killing errant processes on Slurm-based compute nodes.


Keywords
process, terminate, slurm, python
License
GPL-3.0
Install
pip install crc-shinigami==0.6.0

Documentation

Shinigami

Shinigami is a standalone Python application for killing errant processes on Slurm-based compute nodes. The application scans for and terminates any running processes not associated with a currently running Slurm job. Processes associated with whitelisted users (root, administrators, service accounts, etc.) are ignored.
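
The selection rule described above can be sketched as a small pure function. This is an illustration only, not Shinigami's actual implementation: the `Process` record, the `find_errant_pids` name, and the choice to associate processes with Slurm jobs by username are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """Minimal process record (hypothetical; field names are illustrative)."""
    pid: int
    user: str


def find_errant_pids(processes, users_with_jobs, whitelisted_users):
    """Return the PIDs of processes that should be terminated.

    A process is treated as errant when its owner is neither whitelisted
    nor the owner of a currently running Slurm job. Matching by username
    is an assumption of this sketch.
    """
    protected = set(users_with_jobs) | set(whitelisted_users)
    return sorted(p.pid for p in processes if p.user not in protected)


if __name__ == "__main__":
    procs = [Process(1, "root"), Process(4021, "alice"), Process(4177, "bob")]
    # alice has a running job and root is whitelisted, so only bob's process is selected
    print(find_errant_pids(procs, users_with_jobs={"alice"}, whitelisted_users={"root"}))
```

Keeping the decision separate from the code that gathers process and job information makes the rule easy to test without touching a live node.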

Installation and Setup

The shinigami command line utility can be installed with the pip or pipx package manager:

pipx install crc-shinigami

For best results, run the utility every half hour, although a different cadence may suit your cluster size and use case better. A simple cron job runs the utility automatically:

0,30 * * * * shinigami

You may wish to configure the cron job to run under a dedicated service account. When doing so, ensure the user is added to the admin list and satisfies the following criteria:

  • Exists on all compute nodes
  • Has appropriate permissions to terminate system processes on compute nodes
  • Has established SSH keys for connecting to compute nodes
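
Two of the criteria above (the account exists on every node, and key-based SSH works) can be spot-checked before enabling the cron job. The following is a minimal sketch, not part of Shinigami: the function names, node names, and account name are hypothetical, and it assumes the OpenSSH `ssh` client and the standard `id` command are available. It does not verify permission to terminate processes.

```python
import subprocess


def build_check_command(node, user):
    """Build an ssh command that looks up the user's UID on a node.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so the check only succeeds when key-based login works.
    """
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            node, "id", "-u", user]


def nodes_failing_check(nodes, user, run=subprocess.run):
    """Return the nodes where the SSH login or the account lookup failed."""
    failing = []
    for node in nodes:
        result = run(build_check_command(node, user), capture_output=True, text=True)
        # A nonzero exit code means either ssh could not connect
        # or the account does not exist on that node.
        if result.returncode != 0:
            failing.append(node)
    return failing


if __name__ == "__main__":
    # Hypothetical node and account names
    print(nodes_failing_check(["node1", "node2"], "shinigami-svc"))
```

A failing node only indicates that something is wrong; inspecting the command's stderr would be needed to tell a connection problem from a missing account.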