According to the HEAppE documentation:

> The number of functional accounts depends on the number of isolated jobs you plan to apply. For instance, if you want to run five jobs in parallel, you need to have one "master functional account" and five "functional accounts".
In practice, if more than one job is submitted while only one HPC "functional account" is available, the HEAppE API appears to submit them one after the other.
For example, after submitting two jobs (say with IDs 10 and 11) via `/heappe/JobManagement/SubmitJob`, we observe that only one is running on the cluster, while the HEAppE API logs repeatedly state:
HEAppE.BusinessLogicTier.Logic.JobManagement.JobManagementLogic - User <API user> is submitting the job with info Id 11
While in this state, trying to cancel job 11 via `/heappe/JobManagement/CancelJob` returns an internal server error (500) with the message "Problem occured! Contact the administrators."
The API logs suggest that HEAppE tries to cancel the job on the cluster. Because the job has not yet been queued there, it has no Slurm job ID, so the `scancel` command is issued without an argument and fails:
INFO 2024-05-20 12:53:27 HEAppE.BusinessLogicTier.Logic.JobManagement.JobManagementLogic - User <API user> is canceling the job with info Id 11
INFO 2024-05-20 12:53:27 HEAppE.HpcConnectionFramework.SchedulerAdapters.Slurm.Generic.SlurmSchedulerAdapter - Cancel jobs "", command "bash -c 'scancel ';", message "Job cancelled manually by the client."
ERROR 2024-05-20 12:53:28 HEAppE.RestApi.ExceptionMiddleware - SSH command error: 'scancel: error: No job identification provided
' Error code: '1' SSH command: 'bash -c 'scancel ';'.
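
The failure above comes from running `scancel` with an empty job list. As an illustration only, here is a minimal Python sketch of a guard that would avoid it. This is not HEAppE's actual code (HEAppE is not written in Python), and the function name `build_scancel_command` and its parameter are hypothetical. The idea is simply to skip the cluster-side cancel when no Slurm job ID has been assigned yet, so the job can be marked as cancelled on the API side instead.

```python
import shlex


def build_scancel_command(scheduler_job_ids):
    """Return a scancel command line, or None if there is nothing to cancel.

    Hypothetical helper: `scheduler_job_ids` holds the Slurm job IDs known
    for the job being cancelled. Entries may be None or empty when the job
    has not been queued on the cluster yet.
    """
    # Keep only IDs that are actually set.
    ids = [str(j).strip() for j in scheduler_job_ids
           if j is not None and str(j).strip()]
    if not ids:
        # No Slurm job ID assigned yet: running `scancel` with no argument
        # would fail with "No job identification provided".
        return None
    # Quote the inner command so it is passed to bash as a single argument.
    return "bash -c " + shlex.quote("scancel " + " ".join(ids))


if __name__ == "__main__":
    print(build_scancel_command([]))            # job not yet queued
    print(build_scancel_command([123, "124"]))  # job with Slurm IDs
```

With such a check, a caller could treat a `None` result as "cancel locally, nothing to do on the cluster" rather than sending an invalid command over SSH.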