tracknodes

Tracknodes keeps a history of node state and comment changes. It allows system administrators of HPC systems to determine when nodes were down and discover trends such as recurring issues. Supports Torque and PBSpro and has limited support for SLURM.


Keywords
cluster, hpc, hpc-systems, nhc, openhpc, pbspro, slurm, torque
License
GPL-2.0+
Install
pip install tracknodes==1.0.1

Documentation


Description

Tracknodes keeps a history of node state and comment changes. It allows system administrators of HPC systems to determine when nodes were down and to discover trends such as recurring issues. It supports Torque and PBSpro, and has limited support for SLURM.


Installation

$ pip install tracknodes

or

$ easy_install tracknodes

Usage

Set up a cron job on an admin node. This step is required; without it, node state changes are not tracked.

$ crontab -u root -e
# Track Node State Every Minute
* * * * * (/usr/bin/tracknodes --update >/dev/null 2>&1)

Run tracknodes without arguments to see the history of node changes.

$ tracknodes
History of Nodes
=========
n101 | 2016-11-28 21:30:01 | online | ''
n101 | 2016-11-28 20:30:01 | offline,down | 'Hardware issue bad DIMM'
n092 | 2016-11-27 19:30:01 | online | ''
n092 | 2016-11-27 12:00:01 | offline | 'Hardware issue failed disk'
n021 | 2016-11-27 09:00:01 | online | ''
n021 | 2016-11-26 19:00:01 | offline,down | 'DIMM Configuration Error'
-- --
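
The history output above is easy to post-process when looking for recurring issues. The following is a minimal sketch, not part of tracknodes itself, that parses lines in the format shown above and sums how long each node stayed out of the online state. The names parse_history and downtime_by_node are hypothetical, and the sketch assumes the output keeps the "node | timestamp | state | comment" layout from the example.

```python
from collections import defaultdict
from datetime import datetime

# Sample text copied from the example output above.
SAMPLE = """\
History of Nodes
=========
n101 | 2016-11-28 21:30:01 | online | ''
n101 | 2016-11-28 20:30:01 | offline,down | 'Hardware issue bad DIMM'
n092 | 2016-11-27 19:30:01 | online | ''
n092 | 2016-11-27 12:00:01 | offline | 'Hardware issue failed disk'
n021 | 2016-11-27 09:00:01 | online | ''
n021 | 2016-11-26 19:00:01 | offline,down | 'DIMM Configuration Error'
-- --
"""


def parse_history(text):
    """Parse 'node | timestamp | state | comment' lines into tuples.

    Hypothetical helper: assumes the layout shown in the example output.
    """
    records = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|", 3)]
        if len(parts) != 4:
            continue  # skip the header, separators, and blank lines
        node, stamp, state, comment = parts
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # not a history row
        records.append((node, when, state, comment.strip("'")))
    return records


def downtime_by_node(records):
    """Return seconds each node spent between a non-online state and the next 'online'."""
    totals = defaultdict(float)
    went_down = {}
    # Walk the records oldest first so each outage is closed by the next 'online'.
    for node, when, state, _comment in sorted(records, key=lambda r: r[1]):
        if state == "online":
            start = went_down.pop(node, None)
            if start is not None:
                totals[node] += (when - start).total_seconds()
        else:
            # Keep the earliest timestamp of the current outage.
            went_down.setdefault(node, when)
    return dict(totals)


if __name__ == "__main__":
    for node, seconds in sorted(downtime_by_node(parse_history(SAMPLE)).items()):
        print(f"{node}: {seconds / 3600:.1f} hours down")
```

In practice the text would come from running tracknodes, for example by piping its output into a script like this one. Outages that have not yet ended are ignored in this sketch.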

You can set up a configuration file for tracknodes to change the database location or the command used to get node status. Use the following as an example.

$ cat /etc/tracknodes.conf
---
dbfile: "/opt/tracknodes.db"
cmd: "/opt/pbsnodes"
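
As a quick sanity check before deploying such a file, the two settings can be read back with a few lines of Python. This is only a sketch under stated assumptions: it handles flat `key: "value"` lines like the example above, it is not a general YAML parser, and it does not reflect how tracknodes itself loads its configuration. The function name read_flat_config is hypothetical.

```python
def read_flat_config(text):
    """Read flat `key: "value"` lines into a dict.

    Hypothetical helper: covers only the simple layout shown in the
    example config, not YAML in general.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, the YAML document marker, and comments.
        if not line or line == "---" or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


# Text copied from the example config above.
EXAMPLE = '---\ndbfile: "/opt/tracknodes.db"\ncmd: "/opt/pbsnodes"\n'

config = read_flat_config(EXAMPLE)
# Only the two keys from the example are expected.
unknown = sorted(set(config) - {"dbfile", "cmd"})
print(config, unknown)
```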

Tracknodes uses a SQLite database to store the node history. You can determine which database it is using with the -v argument.

$ tracknodes -v
Resource Manager Detected as torque
cmd: /opt/pbsnodes
dbfile: ~/.tracknodes.db
...

For usage information, run tracknodes with --help.

$ tracknodes --help
Usage: tracknodes [options]

Options:
  -h, --help            show this help message and exit
  -U, --update          Update Database From Current Node States
  -f DBFILE, --dbfile=DBFILE
                        Database File
  -c CMD, --cmd=CMD
                        Location of command to show node state, example: /opt/pbsnodes, /opt/sinfo
  -v, --verbose         Verbose Output

License

tracknodes is released under the GPLv3 License.