adding an automatic log file tail under slurm executor #246

Open
stas00 opened this issue Jul 10, 2024 · 1 comment

stas00 commented Jul 10, 2024

When using the local executor, the running logs appear right away in the console it was launched from. But when using Slurm, one has to fish for the log files.

This can be made easier by automatically printing:

print(f"tail -F {logging_dir}/slurm_logs/{first_slurm_job_id}_0.out")

with first_slurm_job_id coming from the launcher log:

2024-07-10 01:38:05.605 | INFO     | datatrove.executor.slurm:launch_job:280 - Slurm job launched successfully with (last) id=109019.

Though here we want the first job id, not the last one.
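
A minimal sketch of a helper that builds this command, assuming the log layout shown above (`{logging_dir}/slurm_logs/{job_id}_0.out`). The function name `tail_command` and the `rank` parameter are hypothetical and not part of datatrove:

```python
import shlex


def tail_command(logging_dir, first_job_id, rank=0):
    """Return a copy-pasteable `tail -F` command for one Slurm log file.

    Assumes logs are written to {logging_dir}/slurm_logs/{job_id}_{rank}.out,
    as in the example above; `rank` is a hypothetical generalization of the
    hard-coded `_0` suffix.
    """
    log_path = f"{logging_dir}/slurm_logs/{first_job_id}_{rank}.out"
    # Quote the path so the printed command survives spaces in directory names.
    return f"tail -F {shlex.quote(log_path)}"


if __name__ == "__main__":
    print(tail_command("/data/logs", 109019))
```

Quoting the path is a small design choice: the printed command stays safe to paste even if the logging directory contains spaces.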


Even fancier would be to run the tail on behalf of the user in the launcher; this way the local and Slurm launching experiences would be identical.
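
One way the launcher could do this is to shell out to `tail -F` once the job has been submitted. The sketch below is only an illustration under assumptions: `follow_slurm_log` and its `runner` parameter are hypothetical names, the log path layout is the one quoted earlier in this issue, and a `tail` binary that supports `-F` (as GNU tail does, retrying until the file appears) is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def follow_slurm_log(logging_dir, job_id, rank=0, runner=subprocess.run):
    """Stream a Slurm log file to the console until the user presses Ctrl-C.

    `runner` defaults to subprocess.run and exists only so the call can be
    swapped out (e.g. in tests); it is not a datatrove feature.
    """
    log_path = Path(logging_dir) / "slurm_logs" / f"{job_id}_{rank}.out"
    argv = ["tail", "-F", str(log_path)]
    try:
        # Blocks while tail follows the file by name.
        runner(argv, check=False)
    except KeyboardInterrupt:
        # Ctrl-C only stops following the log; it does not cancel the Slurm job.
        pass
    return argv
```

Because the job itself runs under Slurm, interrupting the tail leaves the job untouched, which is slightly different from the local executor, where Ctrl-C stops the run.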

But even printing the command to copy-n-paste would already be faster than manual fishing for the log file.


If this doesn't resonate as a feature, would it be possible to make run() return some attributes, e.g. the first Slurm job id? Then users could easily code this feature themselves.

Thank you!


Reading the code, I see that launch_slurm_job returns a job id, which is then set into run.job_id, but this would only be correct if tasks<1000, correct? Otherwise it would hold the id of the last job array and not the first one (since your log says ... (last) id=)?
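
To illustrate the first-versus-last concern, here is a hypothetical sketch of a launcher that splits tasks into several job arrays and keeps every returned id. It is not datatrove's actual implementation: `launch_job_arrays`, `max_array_size`, and the `submit` callable are made-up names for the purpose of the example.

```python
def launch_job_arrays(n_tasks, max_array_size, submit):
    """Submit tasks in chunks of at most `max_array_size` and collect all job ids.

    `submit(start, end)` is a hypothetical callable that submits one job array
    covering tasks [start, end) and returns its Slurm job id.
    """
    job_ids = []
    for start in range(0, n_tasks, max_array_size):
        end = min(start + max_array_size, n_tasks)
        job_ids.append(submit(start, end))
    return job_ids


if __name__ == "__main__":
    fake_ids = iter([109017, 109018, 109019])
    ids = launch_job_arrays(2500, 1000, lambda start, end: next(fake_ids))
    # The first id is the one whose `_0.out` log starts filling up first;
    # keeping only the last id would point the user at the wrong file.
    print("first:", ids[0], "last:", ids[-1])
```

With all ids retained, exposing the first one (or the whole list) to the caller becomes straightforward.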

stas00 commented Jul 10, 2024

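This seems to do the trick:

    dist_executor.run()

    # Directory where the Slurm executor writes its log files in my setup
    slurm_logs_dir = f"{root_dir}/logs/slurm_processing/slurm_logs"
    print(f"*** Find the slurm logs under: {slurm_logs_dir}/")
    # Only print the tail command when a real job id was set
    if dist_executor.job_id != -1:
        print(f"tail -F {slurm_logs_dir}/{dist_executor.job_id}_0.out")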
