
A tool for tracking TORQUE PBS job status on HPC systems.


rwalle/pbstracker


PBS job status tracker

This tool helps you track PBS jobs submitted to an HPC cluster. After you submit a job with the psub command, its start, exit, and finish status is automatically reported to a website. Most importantly, the tool reports each job's stdout and stderr output and marks jobs whose stderr files are non-empty, so you no longer have to check the .e* files manually after jobs finish.
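As a rough illustration of the stderr check described above, the sketch below reads a job's two output files and flags the job when stderr has content. The function name `summarize_output`, the dictionary keys, and the choice to treat whitespace-only stderr as empty are assumptions made for this example, not the tool's actual code.

```python
from pathlib import Path


def summarize_output(stdout_path, stderr_path):
    """Read a job's output files and flag a non-empty stderr.

    Hypothetical helper: the name, return shape, and whitespace rule
    are assumptions for illustration only.
    """

    def read(path):
        p = Path(path)
        # A missing file is treated as empty output.
        return p.read_text(errors="replace") if p.is_file() else ""

    stderr = read(stderr_path)
    return {
        "stdout": read(stdout_path),
        "stderr": stderr,
        # Assumption: whitespace-only stderr does not count as an error.
        "has_stderr": bool(stderr.strip()),
    }
```

A report like this could be sent to the server as-is, with `has_stderr` driving the marker shown in the job list.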

The toolkit provides a Python 3 client for submitting jobs on HPC clusters and a web server for viewing and managing jobs. The web server's frontend uses React. Two backends are provided: one written with Django and Django REST Framework that uses a SQL database, and one written with Express and Mongoose that uses MongoDB.

A read-only demo site using Express deployed on Google Firebase is provided.

The current version supports only the TORQUE system.

Example

$ psub analyze.sub

Screenshot 1 Screenshot 2

Test job files are in the test_subfiles folder.

Set up

You need basic Python and Node.js knowledge to set up the client and server, and you may need to implement your own authentication protocols.

Instructions are in the README.md files in the corresponding folders.

  • Python client setup
  • Express backend setup
  • Django backend setup

How it works

When you submit a job, the tool (1) attaches prologue and epilogue scripts to the job, (2) reports the job ID to the server, and (3) automatically submits a separate "checking" script that depends on the original job, so it executes only after the original job finishes. When the prologue script runs, it sends a "started" signal to the server. The epilogue script sends an "exited" signal along with the job's exit code. The "checking" script reads the stdout and stderr files and sends them to the server.
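The submission flow above could be sketched roughly as follows. This is a minimal illustration, not the client's actual code: the function names and script paths are hypothetical, and it assumes the client relies on TORQUE's `qsub -l prologue=...,epilogue=...` resource options and the `-W depend=afterany:<jobid>` dependency attribute. The functions only build the argument lists; nothing is submitted.

```python
import shlex


def main_job_command(sub_file, prologue, epilogue):
    """Build a qsub command that attaches user prologue/epilogue scripts.

    Assumption: TORQUE's per-job prologue/epilogue resources are used.
    """
    return ["qsub", "-l", f"prologue={prologue},epilogue={epilogue}", sub_file]


def checking_job_command(job_id, check_script):
    """Build a qsub command for a "checking" job that depends on job_id.

    'afterany' lets the job start once the original job has ended,
    whatever its exit status.
    """
    return ["qsub", "-W", f"depend=afterany:{job_id}", check_script]


if __name__ == "__main__":
    # Hypothetical paths and job ID, for illustration only.
    main_cmd = main_job_command("analyze.sub", "/path/to/prologue.sh", "/path/to/epilogue.sh")
    check_cmd = checking_job_command("12345.server", "check.sub")
    print(shlex.join(main_cmd))
    print(shlex.join(check_cmd))
```

In a real client, the job ID printed by the first `qsub` call would be captured, reported to the server, and then passed to the second command.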

To do

  • UI improvement: add color to exit code, fonts, alignment
  • improve Django consistency with Express server

PBS Pro

PBS Pro is not supported because it does not allow users to define their own prologue and epilogue scripts. In principle, the client could be modified to submit a series of dependent jobs that send the "start" and "finish" signals instead. However, the additional jobs can complicate scheduling and lead to fewer of a user's jobs being run, especially on a busy queue.
