Add a way to specify compute accelerators in the configuration #31760

Closed
makortel opened this issue Oct 12, 2020 · 27 comments · Fixed by #36699

Comments

@makortel
Contributor

Currently the CUDA tooling relies on auto-discovery of resources (honoring $CUDA_VISIBLE_DEVICES), but we should find a way to specify in configuration

  • forcing the SwitchProducer choice (e.g. with SwitchProducerCUDA either cpu or cuda), and
  • specifying the compute device(s) to be used

The information should propagate to both SwitchProducer(s), that dictate which case(s) of module chains will be run (at the configuration level), and to CUDAService (and similar) that provide finer-grained resource information to the C++ code (list of actual devices).

With $CUDA_VISIBLE_DEVICES alone it is not possible to force a configuration to use cuda on a machine without GPU (the forcing itself would be useful for testing, and running such configuration on a machine without GPU would be an error that should get reported somehow).
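
To make the limitation concrete, here is a minimal sketch in plain Python (not CMSSW code) of a choice driven purely by auto-discovery. The names `visible_devices` and `discover_case`, and the `physical_gpu_count` argument standing in for what the CUDA runtime would report, are hypothetical. The parsing is simplified to integer ordinals only. It shows that the environment variable can only narrow the set of devices, never force the cuda case.

```python
def visible_devices(physical_gpu_count, env):
    """Return the device ordinals left visible by CUDA_VISIBLE_DEVICES.

    Simplified model: only integer ordinals are handled, and parsing stops
    at the first invalid entry, so later entries are ignored.
    """
    all_devices = list(range(physical_gpu_count))
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return all_devices  # variable unset: everything is visible
    visible = []
    for token in value.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) not in all_devices:
            break
        visible.append(int(token))
    return visible


def discover_case(physical_gpu_count, env):
    """Auto-discovery only: pick 'cuda' if any device is visible, else 'cpu'."""
    return "cuda" if visible_devices(physical_gpu_count, env) else "cpu"
```

On a machine without a GPU, `discover_case(0, {"CUDA_VISIBLE_DEVICES": "0"})` still yields `"cpu"`, which is why a separate configuration knob is needed to force the cuda case and get an error instead.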

@makortel
Contributor Author

assign core,heterogeneous

@cmsbuild
Contributor

New categories assigned: heterogeneous,core

@Dr15Jones,@smuzaffar,@makortel,@makortel,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

cms-patatrack#542 demonstrates a hacky way to force the SwitchProducer choice.

@makortel
Contributor Author

Thinking out loud (names are bad and long etc):

The SwitchProducer choice(s) could be specified along

process.options.SwitchProducers.SwitchProducerCUDA.choose = cms.untracked.string("cuda")
# or forward-thinking the possibility of event-by-event decisions
process.options.SwitchProducers.SwitchProducerCUDA.choices = cms.untracked.vstring("cuda")

The compute devices could be specified along

process.CUDAService.devices = cms.untracked.vint32(0, 1, 2)

# to disable
process.CUDAService.devices = cms.untracked.vint32()
# to allow everything available, default?
process.CUDAService.devices = cms.untracked.vint32(-1)

This option would supersede the current CUDAService.enable.

It could be handy if the SwitchProducerCUDA could make use of this parameter as well, along

  • if empty, or process.CUDAService does not exist, choice will be cpu
  • if non-empty, choice will be cuda (or "cpu and cuda" if we get to event-by-event choice)

Currently the SwitchProducer does not have access to the Process object at the point where the choice is made (and I'm not sure if giving access to the full Process there would be a good idea). Also the code in FWCore/ParameterSet should stay generic, so it would be only SwitchProducerCUDA that knows to look for process.CUDAService. I also gave a thought to using process.options to specify the devices (along process.options.offload.cuda.devices = cms.untracked.vint32(0, 1)), but I'm not sure if it would bring any value over using process.CUDAService.
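
The rules proposed in this comment can be sketched in plain Python as follows. This is not CMSSW code: `choose_case` and `resolve_devices` are hypothetical helpers, a missing `process.CUDAService` is modelled as `None`, and raising an error for a requested but absent device is an assumption based on the wish that such a configuration "would be an error that should get reported somehow".

```python
def choose_case(cuda_service_devices):
    """Pick the SwitchProducerCUDA case from the proposed `devices` parameter.

    cuda_service_devices is None when process.CUDAService does not exist,
    otherwise the list of integers given in the configuration.
    """
    if cuda_service_devices is None or len(cuda_service_devices) == 0:
        return "cpu"
    return "cuda"


def resolve_devices(requested, available):
    """Turn the configured list into the concrete devices to use.

    [-1] means "everything available", [] means "disabled"; any explicitly
    requested ordinal that is not available is reported as an error
    (assumed behaviour, not specified in the thread).
    """
    if list(requested) == [-1]:
        return list(available)
    missing = [d for d in requested if d not in available]
    if missing:
        raise RuntimeError(f"requested CUDA devices not available: {missing}")
    return list(requested)
```

With this reading, `choose_case([-1])` gives `"cuda"` even on a machine with no GPU, and the failure surfaces later when `resolve_devices([0], [])` raises.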

@fwyzard
Contributor

fwyzard commented Oct 12, 2020

hi Matti,
why is it better to do something like

process.options.SwitchProducers.SwitchProducerCUDA.choose = cms.untracked.string("cuda")

rather than

SwitchProducerCUDA.choose = cms.untracked.string("cuda")

?

@fwyzard
Contributor

fwyzard commented Oct 12, 2020

About:

The compute devices could be specified along

process.CUDAService.devices = cms.untracked.vint32(0, 1, 2)

# to disable
process.CUDAService.devices = cms.untracked.vint32()
# to allow everything available, default?
process.CUDAService.devices = cms.untracked.vint32(-1)

This option would supersede the current CUDAService.enable.

I would prefer to keep CUDAService.enable as it is, and use an empty devices list to specify the default behaviour, that is using all available devices.

@fwyzard
Contributor

fwyzard commented Oct 12, 2020

By the way, is this strictly about CUDA, or a more general approach ?

Something like SYCL/oneAPI does not really enumerate the devices as ordinal numbers; rather, it uses a combination of backend (e.g. OpenCL vs CUDA), device type (CPU vs GPU), vendor and device name.

Which makes it more powerful, and a lot more complicated to implement in our software.

@fwyzard
Contributor

fwyzard commented Oct 12, 2020

Currently the SwitchProducer does not have access to the Process object at the point where the choice is made (and I'm not sure if giving access to the full Process there would be a good idea).

Would it work to put the whole configuration in the SwitchProducerCUDA, and make the CUDAService pull it from there ?

@makortel
Contributor Author

why is it better to do something like

process.options.SwitchProducers.SwitchProducerCUDA.choose = cms.untracked.string("cuda")

rather than

SwitchProducerCUDA.choose = cms.untracked.string("cuda")

?

If by SwitchProducerCUDA.choose you mean something like a class variable (since there is no process.SwitchProducerCUDA), that would not be visible e.g. in edmConfigDump. If you mean an instance variable, then every SwitchProducerCUDA instance in the Process would have to be configured in the same way.
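
The distinction between the two readings can be illustrated with a plain-Python sketch. `SwitchProducerSketch` and its `dump()` method are hypothetical stand-ins; `dump()` only mimics a tool that serializes per-instance state and does not reproduce how edmConfigDump actually works.

```python
class SwitchProducerSketch:
    # Class variable: one value shared by every instance of the class.
    choose = None

    def __init__(self, **cases):
        self.cases = dict(cases)

    def dump(self):
        """Serialize only the per-instance state, as a config dump might."""
        return dict(vars(self))


a = SwitchProducerSketch(cpu="A_cpu", cuda="A_cuda")
b = SwitchProducerSketch(cpu="B_cpu", cuda="B_cuda")

# Reading 1: class variable. Both instances see it, but it is stored on
# the class, so a per-instance dump does not contain it.
SwitchProducerSketch.choose = "cuda"
seen_by_both = (a.choose, b.choose)
in_dump_of_a = "choose" in a.dump()

# Reading 2: instance variable. It shows up in the dump, but has to be
# set on every instance separately; here only b is changed.
b.choose = "cpu"
```

After this runs, `a` still follows the class-level `"cuda"` while `b` carries its own `"cpu"`, and only `b.dump()` records the choice.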

@makortel
Contributor Author

By the way, is this strictly about CUDA, or a more general approach ?

I'd like to end up in a solution we think could be later extended to SYCL as well.

Something like SYCL/oneAPI does not really enumerate the devices as ordinal numbers; rather, it uses a combination of backend (e.g. OpenCL vs CUDA), device type (CPU vs GPU), vendor and device name.

Which makes it more powerful, and a lot more complicated to implement in our software.

At the lowest level I think that's fine (just replace cms.vint32 with cms.vstring), but I can imagine e.g. at higher level figuring out the proper strings to be challenging.

@makortel
Contributor Author

makortel commented Oct 12, 2020

Currently the SwitchProducer does not have access to the Process object at the point where the choice is made (and I'm not sure if giving access to the full Process there would be a good idea).

Would it work to put the whole configuration in the SwitchProducerCUDA, and make the CUDAService pull it from there ?

The natural dependence goes the other way around. One can use CUDAService without SwitchProducerCUDA, e.g. for a configuration that always requires CUDA. But use of SwitchProducerCUDA in practice implies the use of CUDAService (even if they don't strictly depend on each other, which also causes some duplication of the discovery mechanism).

In the long term we could also end up not using SwitchProducer, e.g. if with SYCL a "one module for all backends" would work better than the current "each backend has its own module" approach (last bullet of #28576 (comment)).

@fwyzard
Contributor

fwyzard commented Oct 12, 2020

If by SwitchProducerCUDA.choose you mean something like a class variable (since there is no process.SwitchProducerCUDA), that would not be visible e.g. in edmConfigDump.

If this is the only concern it should be easy to fix: just like the process knows to add

from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA

it could check if SwitchProducerCUDA.choose is not None, and add instead

from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
SwitchProducerCUDA.choose = "cuda"

@fwyzard
Contributor

fwyzard commented Oct 12, 2020

Would it work to put the whole configuration in the SwitchProducerCUDA, and make the CUDAService pull it from there ?

The natural dependence goes the other way around. One can use CUDAService without SwitchProducerCUDA, e.g. for a configuration that always requires CUDA.

Indeed... at the moment some modules need the CUDAService (disabled) even when we don't use CUDA.

But use of SwitchProducerCUDA in practice implies the use of CUDAService (even if they don't strictly depend on each other, which also causes some duplication of the discovery mechanism).

The reason I suggested it is because it seems difficult for the SwitchProducerCUDA to extract information from the CUDAService, while the CUDAService might have ways of querying the SwitchProducerCUDA configuration.

Otherwise we could add a new process.options.CUDA = cms.untracked.PSet(...) with all the information, and make both the CUDAService and the SwitchProducerCUDA query it ?

@makortel
Contributor Author

If by SwitchProducerCUDA.choose you mean something like a class variable (since there is no process.SwitchProducerCUDA), that would not be visible e.g. in edmConfigDump.

If this is the only concern it should be easy to fix: just like the process knows to add

from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA

it could check if SwitchProducerCUDA.choose is not None, and add instead

from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
SwitchProducerCUDA.choose = "cuda"

For edmConfigDump specifically yes. But it would imply that customizations would not be possible with the Process object alone, but in addition one would have to import a specific object (strictly speaking not true, any SwitchProducerCUDA object in the Process could be used to set the class variable, but I'm a bit afraid that would be confusing; also generally one would have to find one such object instead of knowing directly which knob to tune).

@makortel
Contributor Author

Otherwise we could add a new process.options.CUDA = cms.untracked.PSet(...) with all the information, and make both the CUDAService and the SwitchProducerCUDA query it ?

Something like that is indeed one option. On the other hand SwitchProducer would have to be extended in some way to be able to read configuration options outside of itself, technically it would not matter much if it reads process.options or process.CUDAService (but there may be other reasons to favor process.options).

@fwyzard
Contributor

fwyzard commented Oct 13, 2020

But it would imply that customizations would not be possible with the Process object alone, but in addition one would have to import a specific object

That's not very different from customisations that take the process as input, and have to import the modules / sequences / tasks they add to it:

def customiseLoadCUDAService(process):
    from HeterogeneousCore.CUDAServices.CUDAService_cfi import CUDAService
    process.CUDAService = cms.Service("CUDAService", ...)

    return process

vs

def customiseForceCUDA(process):
    from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
    SwitchProducerCUDA.choose = 'cuda'

    return process

The latter does not really need the process parameter, but it can be added for consistency with all other customisation functions.

@fwyzard
Contributor

fwyzard commented Oct 13, 2020

Anyway, whatever the options are, let's just not pick something that ends up with a syntax too cumbersome to use.

@fwyzard
Contributor

fwyzard commented Apr 30, 2021

Coming back to this, maybe we should keep these two things separate:

  • a way to restrict a SwitchProducer to only one (or more) branch(es)
  • a way to limit the devices available to CMSSW

For example, a SwitchProducerCUDA could be configured to follow only the cpu branch (and thus ignore any GPU), or only the cuda branch (and thus require a GPU to be present) or be left free to choose either.

A hypothetical SwitchProducerAlpaka with the serial, tbb and cuda options could be configured to allow only the tbb or cuda ones, etc.


Independently, a CMSSW job can have access to any number of CPU cores, any number of CUDA GPUs, any number of SYCL devices, etc.

IMHO this is best handled outside of the job (e.g. via cgroups, taskset, or environment variables ¹ ²), because the actual list of available devices is likely to change from machine to machine.

If we do decide to implement some kind of device selection in CMSSW, I'd prefer to make it orthogonal to the SwitchProducer choice. If their combination results in an unrunnable configuration (e.g. by disabling all GPUs while requiring the cuda branch) the jobs can fail, hopefully with a descriptive error.
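
A minimal plain-Python sketch of keeping the two mechanisms orthogonal, under stated assumptions: the `CASE_REQUIRES` table, the `select_case` function and the preference ordering of cases are all hypothetical, and the device inventory is passed in as a simple dictionary rather than discovered. The branch restriction and the device inventory are independent inputs, and an unrunnable combination fails with a descriptive error.

```python
# Hypothetical mapping from a SwitchProducer case to the device kind it
# needs; None means the case runs on the host and needs no accelerator.
CASE_REQUIRES = {"cpu": None, "serial": None, "tbb": None, "cuda": "cuda"}


def select_case(cases, allowed, devices):
    """Pick the first runnable case.

    cases:   case names ordered by preference, most preferred first
    allowed: set of case names the configuration permits, or None for
             no restriction
    devices: dict mapping a device kind to the number of usable devices
    """
    permitted = [c for c in cases if allowed is None or c in allowed]
    for case in permitted:
        need = CASE_REQUIRES[case]
        if need is None or devices.get(need, 0) > 0:
            return case
    raise RuntimeError(
        f"no runnable case: permitted cases {permitted}, devices {devices}"
    )
```

For example, `select_case(["cuda", "cpu"], {"cuda"}, {"cuda": 0})` raises, matching the "disable all GPUs while requiring the cuda branch" scenario, while `select_case(["cuda", "tbb", "serial"], {"tbb", "cuda"}, {})` falls back to `"tbb"`.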


¹ CUDA_VISIBLE_DEVICES can be used to limit the CUDA devices available to the runtime: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
² SYCL_DEVICE_FILTER can be used to limit the SYCL devices available to the runtime: https://intel.github.io/llvm-docs/EnvironmentVariables.html#sycl_device_filter

@makortel
Contributor Author

makortel commented May 4, 2021

Good point that there can be cases where the SwitchProducer case choice can not be derived from the set of available devices (like Alpaka serial vs tbb).

And perhaps indeed it is best (certainly easiest from the application perspective) to not create a configuration mechanism for arbitrary devices (at least until a real motivation for one comes up).

@makortel
Contributor Author

makortel commented Oct 1, 2021

I see I had forgotten to add one idea for "forcing SwitchProducer choice" (from a chat with @Dr15Jones some time ago). We could make each SwitchProducer instance configurable on choice, e.g. setCase_("cuda") or forceCase_("cuda") (in general in some cases being able to force a case instance-by-instance can make sense, e.g. different ways to run on CPU). To make it easy to set it for all SwitchProducers of a given type, we could add a function to Process along process.setSwitchProducerCaseForAll("SwitchProducerCUDA", "cuda").

@fwyzard
Contributor

fwyzard commented Oct 1, 2021

Would this be persisted across edmConfigDump or pickling/unpickling ?

@makortel
Contributor Author

makortel commented Oct 1, 2021

Good question. My first thought is that it should be persistent in those ways, because it is set explicitly.

(maybe it should also be possible to unset it, e.g. passing None to those functions)
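
The per-instance forcing, the process-wide helper, the persistence of an explicitly set case and the None-to-unset idea can be sketched together in plain Python. `SwitchProducerSketch` and `ProcessSketch` are hypothetical mock classes, not the FWCore/ParameterSet implementation; only the method names `setCase_` and `setSwitchProducerCaseForAll` are taken from the comments above, and `dump()` merely stands in for whatever persistence mechanism is used.

```python
class SwitchProducerSketch:
    def __init__(self, type_name, **cases):
        self.type_name = type_name
        self.cases = dict(cases)
        self.forced_case = None  # None means "choose automatically"

    def setCase_(self, case):
        """Force a case for this instance; pass None to unset."""
        if case is not None and case not in self.cases:
            raise ValueError(f"unknown case {case!r} for {self.type_name}")
        self.forced_case = case

    def chosen_case(self, auto_choice):
        """Return the forced case if set, else the automatic choice."""
        return auto_choice if self.forced_case is None else self.forced_case

    def dump(self):
        """Persist the forced case only when it was set explicitly."""
        state = {"type": self.type_name, "cases": sorted(self.cases)}
        if self.forced_case is not None:
            state["forced_case"] = self.forced_case
        return state


class ProcessSketch:
    def __init__(self):
        self.modules = {}

    def setSwitchProducerCaseForAll(self, type_name, case):
        """Apply setCase_ to every SwitchProducer of the given type."""
        for module in self.modules.values():
            if getattr(module, "type_name", None) == type_name:
                module.setCase_(case)
```

A customisation would then call `process.setSwitchProducerCaseForAll("SwitchProducerCUDA", "cuda")` once, and could still override a single instance afterwards with `setCase_`.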

@makortel
Contributor Author

makortel commented Oct 2, 2021

Based on #31760 (comment) and #31760 (comment) I crafted #35510.

@makortel
Contributor Author

#36699 takes another attempt, this time adding process.options.accelerators = cms.untracked.vstring(), and adding a new concept of ProcessAccelerator.
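
As a rough illustration of the idea only, the sketch below shows how a list of accelerator names in the options could be matched against what accelerator objects report as usable. Everything here is an assumption made for the example and does not reproduce the API of #36699: the `AcceleratorSketch` class, the `select_accelerators` function, the label `"gpu-nvidia"`, the always-present `"cpu"` label, the wildcard matching, and the choice to treat an unknown name as an error.

```python
from fnmatch import fnmatchcase


class AcceleratorSketch:
    """Hypothetical stand-in for an object describing one accelerator type."""

    def __init__(self, labels, enabled):
        self._labels = list(labels)    # labels this type can provide
        self._enabled = list(enabled)  # labels usable on this machine

    def labels(self):
        return list(self._labels)

    def enabled_labels(self):
        return list(self._enabled)


def select_accelerators(patterns, accelerators):
    """Return the enabled labels matched by the requested patterns."""
    known, enabled = ["cpu"], ["cpu"]  # assume the host CPU always exists
    for acc in accelerators:
        known.extend(acc.labels())
        enabled.extend(acc.enabled_labels())
    selected = []
    for pattern in patterns:
        if not any(fnmatchcase(label, pattern) for label in known):
            raise ValueError(f"unknown accelerator pattern {pattern!r}")
        for label in enabled:
            if fnmatchcase(label, pattern) and label not in selected:
                selected.append(label)
    return selected
```

In this sketch a single list drives everything: a SwitchProducer-like module could pick its case from the selected labels, and a service could use the same list to decide whether to initialize any devices.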

@makortel
Contributor Author

+1

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.
