Use MPIPreferences to automatically initialise MPI? #627

Open

giordano opened this issue Sep 11, 2022 · 18 comments

@giordano
Member

While playing with mpi4py earlier this week I realised it automatically initialises MPI at loading time. This can be controlled with the mpi4py.rc object, including the threading setup. I think we can do something similar with an option in MPIPreferences.jl. How does that sound?
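
To make it concrete, here is roughly what I have in mind from the user's side. None of this API exists, the names are made up just to illustrate the idea:

# record the preference once per project (hypothetical function and options)
using MPIPreferences
MPIPreferences.set_auto_init(true; threadlevel = :funneled)

# ...so that later sessions only need
using MPI
MPI.Comm_rank(MPI.COMM_WORLD)   # no explicit MPI.Init() required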

@sloede
Member

sloede commented Sep 12, 2022

I like the idea. However, how would this work/what would happen if multiple dependencies have different threading requests - would it be that the first one to load MPI.jl just "wins" and determines the setting for everyone?

@giordano
Member Author

Wouldn't that be a problem also without auto-initialisation? The idea is that you could choose what to do with the Preferences.toml (including specifying the threading setup for that environment), so you kinda need to know what to do anyway.

I'm just floating the idea of this feature since I found it in mpi4py (and on many occasions I'd have preferred MPI to auto-initialise, instead of killing the session with the first MPI.Comm_rank(MPI.COMM_WORLD) because I had forgotten to run MPI.Init()), but I haven't thought through all the possibilities 🙂

@sloede
Member

sloede commented Sep 12, 2022

Wouldn't that be a problem also without auto-initialisation?

Absolutely. But right now it is kind of accepted that it is somewhat undefined behavior, depending on which packages (and dependencies) you are loading and in which order. However, once we introduce something like a preferences-based approach, users would (rightfully, imho) assume that everything now happens deterministically - or at least that there are no silent errors anymore. I was just wondering what would happen in case multiple conflicting Preferences.toml exist for dependencies, and/or if a central Preferences.toml would (silently) overrule that.

Again, if there were a central and well-documented mechanism for auto-initing MPI, I would appreciate it!

@simonbyrne
Member

A couple of considerations:

  • Are there cases where you would load the package without intending to initialize?
    • The only case I can think of at the moment is to call mpiexec: we could move that to a subpackage (or just make it part of MPIPreferences?)
  • If you build MPI.jl into a system image, then __init__() is called at Julia load time: this could be an issue if other packages have MPI.jl as a dependency?
  • Preferences.toml is a build-time configuration, not a runtime one: i.e. modifying it will trigger recompilation (or necessitate a rebuild of the system image if it is used)
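
To expand on the last point, whether recompilation is triggered depends on how the preference is read. A rough sketch, with "auto_init" as a made-up preference name (this would live inside MPI.jl's own sources, since @load_preference resolves the calling package's UUID):

using Preferences

# Read at precompile time: baked into the cache, so changing the
# preference invalidates the precompile image (or the system image).
const AUTO_INIT = @load_preference("auto_init", true)

# Read at runtime instead: no recompilation, but no constant folding either.
auto_init() = load_preference(@__MODULE__, "auto_init", true)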

@giordano
Member Author

  1. good question. As far as I'm concerned, basically every time I load MPI.jl I want to initialise MPI, I'm a bit sick of killing off REPL sessions because I forgot to do that 😄
  2. that's a good point, I'm honestly not sure how that's going to work
  3. yes, that was kinda on purpose. But if we don't want to force recompilation we could have constant Ref variables in MPIPreferences.jl that users change before loading MPI.jl to tweak the behaviour, which would more closely match the mpi4py behaviour (a rough sketch follows below). The point is that the option (whether a Preferences.jl one, a variable in MPIPreferences.jl, or whatever) has to live in a package other than MPI.jl, since in Julia a package is loaded and initialised in one step, so there is no window to set an option inside MPI.jl itself before its __init__ runs.
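
Something like this is what I mean by the Ref-based variant (all names hypothetical, none of this exists):

# In MPIPreferences.jl:
const AUTO_INIT   = Ref(true)
const THREADLEVEL = Ref(:serialized)

# In MPI.jl's __init__:
function __init__()
    # ... existing setup ...
    if MPIPreferences.AUTO_INIT[]
        Init(; threadlevel = MPIPreferences.THREADLEVEL[])
    end
end

The user would flip the Ref after using MPIPreferences but before using MPI, which is exactly why the switch has to live outside MPI.jl.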

@simonbyrne
Member

  1. good question. As far as I'm concerned, basically every time I load MPI.jl I want to initialise MPI, I'm a bit sick of killing off REPL sessions because I forgot to do that 😄

Could add a Requires hook to your .julia/config/startup.jl?
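
Something like this, maybe (untested sketch; it assumes Requires is installed in your default environment, and the UUID is MPI.jl's):

# ~/.julia/config/startup.jl
using Requires
@require MPI="da04e1cc-30fd-572f-bb4f-1f8673147195" begin
    MPI.Initialized() || MPI.Init()
end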

@simonbyrne
Member

simonbyrne commented Sep 28, 2022

Okay, I've thought a bit more about this: how about the following:

  • we move mpiexec (and mpiexec_wrapper.jl) into MPIPreferences.jl:
    • if you just want access to the MPI launcher, you can just `using MPIPreferences`.
  • we have an option for disabling the automatic MPI.Init() via an environment variable (MPI_JULIA_INIT=false?). We can also have an option for selecting the threadlevel (MPI_JULIA_THREADLEVEL).
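
Roughly, MPI.jl's __init__ could then do something like this (variable names as proposed above; just a sketch, nothing is implemented):

function __init__()
    # ... existing library setup ...
    if get(ENV, "MPI_JULIA_INIT", "true") != "false"
        threadlevel = Symbol(get(ENV, "MPI_JULIA_THREADLEVEL", "serialized"))
        Init(; threadlevel)
    end
end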

Thoughts?

@simonbyrne
Member

Oh, I'm not sure that we can load packages until after MPIPreferences?

@simonbyrne
Member

Wow, after trying to debug an issue with this, I have to say I am much more in favor of this.

The fact that it just kills the job without giving a stacktrace is painful.

@simonbyrne
Member

One example where this will be a problem: on our Slurm cluster, if I get an interactive session via srun --pty bash -l, then call MPI.Init() from within an interactive MPI session, I get an error about my OpenMPI not being built with PMIx support.

@simonbyrne
Member

Using the default JLL binaries gives even worse errors:

julia> using MPI
[ Info: Precompiling MPI [da04e1cc-30fd-572f-bb4f-1f8673147195]

julia> MPI.Init()
[cli_0]: write_line error; fd=11 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_0]: Unable to write to PMI_fd
[cli_0]: write_line error; fd=11 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Fatal error in internal_Init_thread: Other MPI error, error stack:
internal_Init_thread(60): MPI_Init_thread(argc=(nil), argv=(nil), required=2, provided=0x7ffd7abe14e0) failed
MPII_Init_thread(209)...:
MPID_Init(72)...........:
init_local(135).........: channel initialization failed
init_pg(332)............:
MPIR_pmi_init(111)......: PMI_Get_appnum returned -1
[cli_0]: write_line error; fd=11 buf=:cmd=abort exitcode=740937487
:

@giordano
Member Author

giordano commented Oct 9, 2022

One example where this will be a problem: on our Slurm cluster, if I get an interactive session via srun --pty bash -l, then call MPI.Init() from within an interactive MPI session, I get an error about my OpenMPI not being built with PMIx support.

I don't understand this: what's the difference between a Julia process started inside an interactive shell session launched by srun and a normal Julia process on a login node? srun is driving only the bash process, right?

@luraess
Contributor

luraess commented Oct 9, 2022

One example where this will be a problem: on our Slurm cluster, if I get an interactive session via srun --pty bash -l, then call MPI.Init() from within an interactive MPI session, I get an error about my OpenMPI not being built with PMIx support.

I don't understand why the PMIx issue should depend on MPI.jl. On one of our test clusters, I ran into these PMIx issues (with MPI.jl v0.19.2). The workaround was to srun --mpi=pmix ... and to export SLURM_MPI_TYPE=pmix.

As a general thought after playing with MPI.jl v0.20 on various machines: it would be nice not to over-engineer the setup machinery, as things usually get much more complicated on clusters and supercomputers than they are on local machines (no internet on compute nodes, missing libs/env on login nodes, etc.). So, in general, the more basic and robust workflows should be preferred. Ideally, workflows would need as few "interactive setup" steps as possible.

@giordano
Member Author

giordano commented Oct 9, 2022

Ideally, workflows would need as few "interactive setup" steps as possible.

What "interactive setup" steps you're referring to? I don't think there is anything strictly interactive?

@luraess
Contributor

luraess commented Oct 9, 2022

Apologies, I realise this is slightly OT. Previously, only an ENV var and a Pkg.build() (most often on a compute node) were needed. Now we need more steps: set up MPIPreferences, select the correct ABI, have the LocalPreferences file created, and have MPI.jl pick it up. My concern is just that these setup steps should be as few as possible and shouldn't need fancy machinery that may cause trouble on large clusters.
EDIT: "Interactive" was not accurate wording → Julia config tasks/jobs

@giordano
Member Author

giordano commented Oct 9, 2022

I must have a different concept of "interactive". Nothing of what you described is interactive. Until v0.19, to use system MPI you had to do

julia --project -e 'ENV["JULIA_MPI_BINARY"]="system"; using Pkg; Pkg.build("MPI"; verbose=true)'

with v0.20 you have to do

julia --project -e 'using Pkg; Pkg.add("MPIPreferences"); using MPIPreferences; MPIPreferences.use_system_binary()'

which doesn't look much different to me, and it requires the same number of non-interactive commands to be run.

@simonbyrne
Member

simonbyrne commented Oct 9, 2022

I don't understand this: what's the difference between a Julia process started inside an interactive shell session launched by srun and a normal Julia process on a login node? srun is driving only the bash process, right?

I guess it sets some environment variables that change the behaviour of MPI? Honestly I have no idea. I don't see it with sbatch, so I guess it is something to do with how srun works?

I don't understand why the PMIx issue should depend on MPI.jl. On one of our test clusters, I ran into these PMIx issues (with MPI.jl v0.19.2). The workaround was to srun --mpi=pmix ... and to export SLURM_MPI_TYPE=pmix.

That does help, at least once:

┌─[27]──[Sun Oct 09]──[14:38:05]────────────────────────────────────────
│ spjbyrne@login1:~/src/MPI.jl
├ srun --reservation=clima --mpi=pmix -t 01:10:00 --mem 2G --pty bash -l

┌─[1]──[Sun Oct 09]──[14:38:16]────────────────────────────────────────
│ spjbyrne@hpc-92-37:~/src/MPI.jl
├ julia --project -e 'using MPI; MPI.Init(); @show MPI.Comm_rank(MPI.COMM_WORLD)'
MPI.Comm_rank(MPI.COMM_WORLD) = 0
┌─[2]──[Sun Oct 09]──[14:38:24]────────────────────────────────────────
│ spjbyrne@hpc-92-37:~/src/MPI.jl
├ julia --project -e 'using MPI; MPI.Init(); @show MPI.Comm_rank(MPI.COMM_WORLD)'
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[hpc-92-37.cm.cluster:11855] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

Now we need more steps: set up MPIPreferences, select the correct ABI, have the LocalPreferences file created, and have MPI.jl pick it up.

If you already have the LocalPreferences.toml file (or you have it in your global env), it shouldn't be required at all.
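
For example, you can check what is already configured without re-running the setup (if I remember the MPIPreferences constants right):

julia> using MPIPreferences

julia> MPIPreferences.binary   # "system" when a LocalPreferences.toml selecting the system MPI is on the load path
"system"

julia> MPIPreferences.abi
"OpenMPI"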

@luraess
Contributor

luraess commented Oct 9, 2022

That does help, at least once:

I had a similar issue on that machine, which has srun built with PMIx support. The MPI failure appeared when running julia several times and initialising MPI within the same srun call.

The solution was to always start julia in a separate srun, instead of starting a single srun --pty bash process and opening julia multiple times there:

salloc ...
srun --pty julia
srun --pty julia

instead of

srun --pty bash
julia
julia

Maybe it's the same for you?
