Prototype for EventSetup data on GPUs #77
…der to use it outside of RecoLocalTracker/SiPixelClusterizer" This reverts commit 766e967.
A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_10_2_X_Patatrack.
It involves the following packages: CalibTracker/Records
The following packages do not have a category, yet: HeterogeneousCore/CUDACore
@cmsbot, @fwyzard can you please review it and eventually sign? Thanks.
cms-bot commands are listed here
@@ -56,7 +56,7 @@
 #include <vector>

-class PixelThresholdClusterizer final : public PixelClusterizerBase {
+class dso_hidden PixelThresholdClusterizer final : public PixelClusterizerBase {
Why `dso_hidden`?
Validation summary: reference release CMSSW_10_2_0_pre4 at 926a81b
That is correct.
My understanding is that on an IOV transition the ESProduct of the old IOV is deleted and the ESProduct of the new IOV is constructed (and with multiple lumis in flight these two may coexist for some time). With the approach of this PR, the GPU memory of the old IOV is deallocated, and for the new IOV the GPU memory is allocated (again) and the CPU->GPU transfer is made, so it continues to mirror how the system works on the CPU. One could of course argue that the deallocation+allocation cycle is unnecessary. In a sense that is true, because even with multiple lumis in flight one could just multiply the buffers by the maximum number of concurrent lumis and add bookkeeping logic. Alternatively we could use a custom allocator avoiding
Actually, AFAIK the framework does support concurrent runs/lumis, but does not support concurrent IOVs. I assume the framework team would like to implement support for that as well in the future, so I am fine with an approach that does not prevent it.
Right, I mixed things up a bit. My understanding is also that concurrent IOVs are somewhere in the future plans, so I aimed for a solution that would automatically work with that.
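To make the lifecycle discussed above concrete, here is a minimal sketch of how tying the device buffer to the lifetime of the ES product gives one deallocation and one allocation per IOV transition, including the period in which two IOVs coexist. Everything in it is a hypothetical stand-in: `GainsForIov`, `fakeDeviceAlloc`, `fakeDeviceFree` and `simulateIovTransition` are not CMSSW or CUDA names, host `malloc`/`free` replace the real device calls, and `std::shared_ptr` is only an assumed model of how long the framework keeps each product alive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>
#include <vector>

// Number of "device" buffers currently allocated (illustration only).
static int g_liveDeviceBuffers = 0;

// Hypothetical stand-in for a device allocation; uses host memory.
void* fakeDeviceAlloc(std::size_t bytes) {
  assert(bytes > 0);
  void* ptr = std::malloc(bytes);
  if (ptr != nullptr) ++g_liveDeviceBuffers;
  return ptr;
}

// Hypothetical stand-in for a device free; a null pointer is a no-op.
void fakeDeviceFree(void* ptr) {
  if (ptr == nullptr) return;
  --g_liveDeviceBuffers;
  std::free(ptr);
}

// Hypothetical ES product: owns the device copy of its payload for one IOV.
class GainsForIov {
public:
  explicit GainsForIov(std::vector<float> hostData)
      : host_(std::move(hostData)),
        device_(fakeDeviceAlloc(host_.size() * sizeof(float))) {}
  ~GainsForIov() { fakeDeviceFree(device_); }
  GainsForIov(const GainsForIov&) = delete;
  GainsForIov& operator=(const GainsForIov&) = delete;

private:
  std::vector<float> host_;
  void* device_;
};

struct TransitionCounts {
  int duringOverlap;    // buffers alive while old and new IOV coexist
  int afterOldDropped;  // buffers alive once the old IOV has no users left
  int atEnd;            // buffers alive after the new IOV is gone as well
};

TransitionCounts simulateIovTransition() {
  TransitionCounts counts{};
  {
    auto oldIov = std::make_shared<GainsForIov>(std::vector<float>{1.f, 2.f});
    std::shared_ptr<GainsForIov> lumiInFlight = oldIov;  // still uses the old IOV
    auto newIov = std::make_shared<GainsForIov>(std::vector<float>{3.f, 4.f});
    oldIov.reset();  // the old IOV is no longer current
    counts.duringOverlap = g_liveDeviceBuffers;
    lumiInFlight.reset();  // last user of the old IOV finishes
    counts.afterOldDropped = g_liveDeviceBuffers;
  }
  counts.atEnd = g_liveDeviceBuffers;
  return counts;
}
```

Under these assumptions, `simulateIovTransition()` reports two live buffers while a lumi still uses the old IOV, one after that lumi finishes, and none at the end.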
Could we try to review+merge this one soonish?
}

SiPixelGainCalibrationForHLTGPU::GPUData::~GPUData() {
  if(gainForHLTonGPU != nullptr) {
You can avoid the check, as `cudaFree` will not do anything if the argument is a null pointer.
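The simplification suggested above can be sketched as follows. Since a real `cudaFree` needs a GPU, the sketch uses a hypothetical stand-in, `fakeCudaFree`, that copies the one documented property that matters here: a null argument is a no-op that reports success. `GPUDataSketch` and `freeCallsForTwoObjects` are illustrative names, and the `float*` type of `gainForHLTonGPU` is an assumption.

```cpp
#include <cassert>
#include <cstdlib>

// Counts how often the stand-in free function was called (illustration only).
static int g_freeCalls = 0;

// Hypothetical stand-in for cudaFree, backed by host memory.
// Like cudaFree, it performs no operation for a null pointer and returns 0,
// which stands for success here.
int fakeCudaFree(void* devPtr) {
  ++g_freeCalls;
  if (devPtr == nullptr) return 0;
  std::free(devPtr);
  return 0;
}

// Sketch of a data holder whose destructor frees unconditionally.
struct GPUDataSketch {
  GPUDataSketch() = default;
  GPUDataSketch(const GPUDataSketch&) = delete;
  GPUDataSketch& operator=(const GPUDataSketch&) = delete;

  // No nullptr check needed: freeing a null pointer is a no-op.
  ~GPUDataSketch() { fakeCudaFree(gainForHLTonGPU); }

  float* gainForHLTonGPU = nullptr;
};

// Destroys one object that never received data and one that did,
// and returns how many times the free function was called.
int freeCallsForTwoObjects() {
  g_freeCalls = 0;
  {
    GPUDataSketch neverFilled;  // pointer stays nullptr
    GPUDataSketch filled;
    filled.gainForHLTonGPU = static_cast<float*>(std::malloc(4 * sizeof(float)));
    assert(filled.gainForHLTonGPU != nullptr);
  }
  return g_freeCalls;
}
```

Both destructors call the free function, and the call with the null pointer is harmless, so the explicit check only adds a branch.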
I don't have other comments; if nobody else has anything to add, I will merge this tomorrow morning.
@fwyzard Let me address your comment #77 (comment) first (doing it right now).
Validation summary: reference release CMSSW_10_2_0_pre5 at 30c7b03
…oMiniAODReview Cmssw 10 1 x tau reco mini aod review
Adds a prototype for dealing with EventSetup data on GPUs. The prototype is applied to the ES data used by Raw2Cluster (cabling map etc., gains) and RecHits (CPE).

Now it is the `ESProduct` that owns the GPU memory. Currently each of the affected `ESProducts` has a method `getGPUProductAsync(cuda::stream_t<>&)` that allocates the memory on the current GPU device and transfers the data there asynchronously, if the data is not there yet. The bookkeeping of which devices already have the data, and the necessary synchronization between multiple threads (only one thread may do the transfer per device), are abstracted into a helper template in `HeterogeneousCore/CUDACore/interface/CUDAESProduct.h`.

Technical changes:
- `EventSetup`-based implementation for GPU cabling map, gains, etc.
- add support for multiple devices to `PixelCPEFast`
- abstract the `EventSetup` GPU transfer
- move `malloc` and transfer to the lambda
- move `cudaFree` outside of the `nullptr` check
- move files (back) to the plugins directory
- rename `siPixelDigisHeterogeneous` to `siPixelClustersHeterogeneous`
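As an illustration of the bookkeeping described above, here is a minimal sketch of a per-device "transfer once" helper. It is not the contents of `CUDAESProduct.h`: `PerDeviceProduct`, `getOrTransfer` and `countTransfers` are hypothetical names, devices are plain indices, and the asynchronous part (the CUDA stream and waiting for the copy to finish) is left out entirely. It only shows the two properties named in the text: remembering which devices already hold the data, and letting a single caller per device perform the transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical helper: one slot per device, filled lazily and at most once.
template <typename T>
class PerDeviceProduct {
public:
  explicit PerDeviceProduct(std::size_t numDevices) : slots_(numDevices) {}

  // Returns the data for `device`. The first caller for a given device runs
  // `transfer` to fill the slot; later callers get the already filled slot.
  template <typename F>
  const T& getOrTransfer(std::size_t device, F&& transfer) const {
    assert(device < slots_.size());
    Slot& slot = slots_[device];
    // The per-device mutex ensures only one thread does the transfer.
    std::lock_guard<std::mutex> lock(slot.mutex);
    if (!slot.filled) {
      transfer(slot.data);
      slot.filled = true;
    }
    return slot.data;
  }

private:
  struct Slot {
    std::mutex mutex;
    bool filled = false;
    T data{};
  };
  // mutable: filling the device copy does not change the logical product.
  mutable std::vector<Slot> slots_;
};

// Requests the product `callsPerDevice` times on every device and returns
// how many transfers were actually performed. The calls are sequential here
// to keep the example simple.
int countTransfers(std::size_t numDevices, int callsPerDevice) {
  PerDeviceProduct<std::vector<float>> product(numDevices);
  int transfers = 0;
  for (std::size_t device = 0; device < numDevices; ++device) {
    for (int call = 0; call < callsPerDevice; ++call) {
      product.getOrTransfer(device, [&transfers](std::vector<float>& onDevice) {
        onDevice.assign(3, 1.0f);  // pretend to copy the payload to the device
        ++transfers;
      });
    }
  }
  return transfers;
}
```

Holding a plain mutex during the transfer is the simplest choice for a sketch; a real asynchronous implementation would also need to ensure that later callers do not use the data before the copy has completed.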
This PR adds a prototype for dealing with EventSetup data on GPUs. Resolves #65. The prototype is applied to the ES data used by Raw2Cluster (cabling map etc., gains) and RecHits (CPE).

About the system

As outlined in the issue, it is now the ESProduct that owns the GPU memory. Currently each of the affected ESProducts has a method `getGPUProductAsync(cuda::stream_t<>&)` (suggestions for better names are welcome), which allocates the memory on the current GPU device and transfers the data there asynchronously, if the data is not there yet. The bookkeeping of which devices already have the data, and the necessary synchronization between multiple threads (only one thread may do the transfer per device), are abstracted into a helper template in `HeterogeneousCore/CUDACore/interface/CUDAESProduct.h`.

CPE

Adding support to `PixelCPEFast` was easy, as it was already produced by an ESProducer and dealt with copying the data to GPU.

Cabling map

Cabling map etc. required a bit more work, as
- `SiPixelFedCablingMap` comes from the DB (so better not modify it)
- … `SiPixelQuality`)

I created a new ESProduct `SiPixelFedCablingMapGPUWrapper` (for the time being on the `CkfComponentsRecord` record; maybe we could create a "smaller" record in some more "local reco" package than `RecoTracker/Record`) to gather all the necessary ES data. The modules-to-unpack is implemented as a nested struct there to benefit from the constants and loop logic. If regionality is not used, the modules-to-unpack are transferred only once.

Gains

Gains also required a bit more work, as
- `SiPixelGainCalibrationForHLT` comes from the DB (so better not modify it)
- … `SiPixelGainCalibrationForHLTService` class structure to access the gain values in CPU

Fortunately for the GPU we already just transfer the internal data of `SiPixelGainCalibrationForHLT` to the GPU, so it is enough to create a new ESProduct (`SiPixelGainCalibrationForHLTGPU`, on a newly created `SiPixelGainCalibrationForHLTGPURcd` record which also depends on `TrackerDigiGeometryRecord`), and let it transfer the necessary data to GPU "as usual".

Other stuff

Following @VinInn's suggestion #65 (comment), the Raw2Cluster is moved to `RecoLocalTracker/SiPixelClusterizer`. This also allows reverting 766e967 of #62 (but apparently this fact cannot be used later to clean the history, as I see the PRs are squashed to single commits at the moment).

@fwyzard @felicepantaleo @VinInn @rovere