
Improve work-to-device scheduling #277

Open
makortel opened this issue Mar 11, 2019 · 2 comments


@makortel

This issue is about operating multiple GPUs.

In HeterogeneousEDProducer the work in EDM streams is assigned to devices statically, i.e. each producer on each EDM stream always uses the same device.
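
For reference, a minimal sketch of what such a static assignment amounts to, assuming a plain round-robin over stream indices. The function name `staticDeviceForStream` is hypothetical and this is not the actual framework code:

```cpp
#include <cassert>

// Hypothetical helper: map an EDM stream index to a device index with a
// fixed round-robin. Every producer running on a given stream always gets
// the same device, regardless of how busy that device currently is.
int staticDeviceForStream(unsigned int streamIndex, int numberOfDevices) {
  assert(numberOfDevices > 0);
  return static_cast<int>(streamIndex % static_cast<unsigned int>(numberOfDevices));
}
```

With 2 devices, streams 0 and 2 land on device 0 and streams 1 and 3 on device 1, no matter how uneven the actual work per stream is.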

In #100 this logic is kept (for backwards compatibility), but once all producers have been migrated from HeterogeneousEDProducer to the model of #100, we could try smarter load balancing between the devices.
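
One possible direction, as a sketch only: for each new piece of work, pick the device with the fewest work items in flight. The class `LeastLoadedDeviceChooser` and its `acquire()`/`release()` interface are hypothetical, and counting in-flight items is just the simplest possible load metric; a real scheduler would probably want something closer to actual GPU occupancy.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch: dynamic device choice based on in-flight work,
// instead of a fixed stream-to-device mapping.
class LeastLoadedDeviceChooser {
public:
  explicit LeastLoadedDeviceChooser(std::size_t numberOfDevices) : inFlight_(numberOfDevices, 0) {
    assert(numberOfDevices > 0);
  }

  // Pick the least-loaded device and count one more work item on it.
  // Ties are broken in favour of the lowest device index.
  int acquire() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t best = 0;
    for (std::size_t i = 1; i < inFlight_.size(); ++i) {
      if (inFlight_[i] < inFlight_[best]) {
        best = i;
      }
    }
    ++inFlight_[best];
    return static_cast<int>(best);
  }

  // Mark one work item on `device` as finished.
  void release(int device) {
    std::lock_guard<std::mutex> lock(mutex_);
    int& n = inFlight_.at(static_cast<std::size_t>(device));
    assert(n > 0);
    if (n > 0) {
      --n;
    }
  }

  // Current number of in-flight work items on `device`.
  int load(int device) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return inFlight_.at(static_cast<std::size_t>(device));
  }

private:
  mutable std::mutex mutex_;
  std::vector<int> inFlight_;
};
```

A producer would call `acquire()` when it starts its asynchronous work and `release()` when that work completes, so the choice follows the actual load rather than the stream index.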

@makortel

One question is whether we schedule each event entirely on one device, or allow the producers of a single event to use (or put their output products in the memory of) different devices.

The former is certainly simpler to start with.
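
A hypothetical sketch of that simpler option: the first producer of an event that asks for a device fixes the choice (round-robin here), and all later producers of the same event get the same device. Keying on a single integer event ID and the name `PerEventDeviceAssignment` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical sketch: pin every producer of an event to one device.
class PerEventDeviceAssignment {
public:
  explicit PerEventDeviceAssignment(int numberOfDevices) : numberOfDevices_(numberOfDevices) {
    assert(numberOfDevices > 0);
  }

  // Return the device for `eventId`, choosing one on first request.
  int deviceFor(std::uint64_t eventId) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = assigned_.find(eventId);
    if (it != assigned_.end()) {
      return it->second;  // already pinned: every producer gets the same device
    }
    int device = next_;
    next_ = (next_ + 1) % numberOfDevices_;
    assigned_.emplace(eventId, device);
    return device;
  }

  // Forget the assignment once the event is done, so the map does not grow.
  void eventDone(std::uint64_t eventId) {
    std::lock_guard<std::mutex> lock(mutex_);
    assigned_.erase(eventId);
  }

private:
  std::mutex mutex_;
  std::unordered_map<std::uint64_t, int> assigned_;
  int numberOfDevices_;
  int next_ = 0;
};
```

The round-robin choice could be swapped for any load-aware policy; the point is only that the decision is made once per event, so no product ever has to be read across devices.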

The latter needs a model for reading a data product from the memory of another device. This can be achieved trivially with unified memory (#85). In #100 (comment) @fwyzard also commented:

Often, we could let the currentDevice_ read the memory of the dataDevice over the PCI-e or NVLINK bus.
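
To make the trade-off concrete, a hypothetical sketch of how a consumer on `currentDevice` could decide how to reach a product living on `dataDevice`. The `canAccessPeer` matrix stands in for the results of `cudaDeviceCanAccessPeer()` queried once at startup; direct reads additionally need `cudaDeviceEnablePeerAccess()`, and the fallback copy could use `cudaMemcpyPeerAsync()`. The enum and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// How a producer on `currentDevice` gets at a product stored on `dataDevice`.
enum class Access { Local, PeerRead, Copy };

// Hypothetical sketch. canAccessPeer[i][j] mirrors what
// cudaDeviceCanAccessPeer(&flag, i, j) would report for devices i and j.
Access chooseAccess(int currentDevice, int dataDevice,
                    const std::vector<std::vector<bool>>& canAccessPeer) {
  if (currentDevice == dataDevice) {
    return Access::Local;  // product already in this device's memory
  }
  if (canAccessPeer.at(static_cast<std::size_t>(currentDevice)).at(static_cast<std::size_t>(dataDevice))) {
    // Read in place over PCIe/NVLink (after cudaDeviceEnablePeerAccess).
    return Access::PeerRead;
  }
  // No direct path: copy the product first, e.g. with cudaMemcpyPeerAsync.
  return Access::Copy;
}
```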

@fwyzard

fwyzard commented Mar 11, 2019

Just for the record, I'm trying to think of what benefits we could get from running parts of an event on different devices (of the same type).

A simple use case (which we are very far from) is when a single event is enough to saturate, or even exceed the capacity of, a single device.

Another is that, if we end up being memory-limited, we could keep some conditions (e.g. pixel calibrations) on one GPU and others (e.g. calorimeter calibrations) on a second one, and run the relevant algorithms on the corresponding device.
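
A hypothetical sketch of that second case: each set of conditions lives on exactly one GPU, and an algorithm is routed to the GPU that holds the conditions it needs. The condition names and the `ConditionsPlacement` class are illustrative only.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: route algorithms to the device holding their conditions.
class ConditionsPlacement {
public:
  // Record that a set of conditions has been uploaded to `device`.
  void place(const std::string& conditions, int device) { deviceOf_[conditions] = device; }

  // Device on which an algorithm needing `requiredConditions` should run.
  int deviceForAlgorithm(const std::string& requiredConditions) const {
    auto it = deviceOf_.find(requiredConditions);
    if (it == deviceOf_.end()) {
      throw std::out_of_range("conditions not placed on any device: " + requiredConditions);
    }
    return it->second;
  }

private:
  std::map<std::string, int> deviceOf_;
};
```

Each GPU then only needs memory for its own share of the conditions, at the cost of moving event data to wherever the conditions are.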

Both seem far off.
