
Speed up CPU side of GPU rechits #125

Merged

Conversation

@makortel commented Aug 9, 2018

HitsOnCPU allocates the (pinned) host memory on each event. The profiler shows that this is rather costly (more than half of the wall-clock time spent in the kernels...). The first commit moves the allocations to be the first action in the function, because I first saw a peculiar profile where one of the cudaMallocHost calls took a very long time and delayed the queueing of the asynchronous memory copies, thus making acquire() occupy the CPU.

Of course this alone didn't significantly improve the situation, so the second commit changes the pattern to the same one as in raw2cluster, i.e. allocate the buffers (also on the host side) once per module (i.e. per job per EDM stream), and have the event product just hold pointers to them. Not ideal, but faster (in a single event the wall-clock time of acquire() drops from ~240 us to ~170 us).

We can of course discuss whether this approach is what we really want (probably not), but at the moment it seems to be the fastest.
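
For reference, a minimal sketch of the reuse pattern described above (the class and member names are illustrative, not the actual CMSSW code): the pinned host buffers are allocated once in the module constructor, and acquire() only queues the asynchronous copies into them.

```cpp
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical, simplified event product: it only holds non-owning
// pointers into the module-owned pinned buffers.
struct HitsOnCPU {
  float* xl = nullptr;
  float* yl = nullptr;
  uint32_t nhits = 0;
};

// Hypothetical module skeleton; buffer names and contents are illustrative.
class RecHitProducer {
public:
  explicit RecHitProducer(std::size_t maxHits) {
    // one-time pinned allocations, amortised over all events of the job
    cudaMallocHost(reinterpret_cast<void**>(&host_xl_), maxHits * sizeof(float));
    cudaMallocHost(reinterpret_cast<void**>(&host_yl_), maxHits * sizeof(float));
  }
  ~RecHitProducer() {
    cudaFreeHost(host_xl_);
    cudaFreeHost(host_yl_);
  }

  HitsOnCPU acquire(const float* d_xl, const float* d_yl,
                    uint32_t nhits, cudaStream_t stream) {
    // the per-event work is now just queueing the asynchronous copies
    cudaMemcpyAsync(host_xl_, d_xl, nhits * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    cudaMemcpyAsync(host_yl_, d_yl, nhits * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    return {host_xl_, host_yl_, nhits};  // product holds pointers only
  }

private:
  float* host_xl_ = nullptr;
  float* host_yl_ = nullptr;
};
```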

@fwyzard @VinInn @felicepantaleo

@fwyzard commented Aug 9, 2018

Successfully tested at #127 (comment).

@fwyzard commented Aug 9, 2018

> ... the second commit changes the pattern to the same one as in raw2cluster, i.e. allocate the buffers (also on the host side) once per module (i.e. per job per EDM stream), and have the event product just hold pointers to them. Not ideal, but faster (in a single event the wall-clock time of acquire() drops from ~240 us to ~170 us).

I think reusing some per-module, per-stream buffers does make sense.
Eventually we can look into wrapping it in some construct that guarantees the correctness of the memory reuse, like some kind of vector or stack with an element per stream - but we already know the present usage pattern is fine, by construction.
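
For illustration, such a guard construct could look roughly like the pool below (purely a sketch of the idea; no such class exists in the PR): acquire() pops a pinned host buffer, or allocates a new one on demand, and release() returns it to the pool, so correct reuse no longer relies on the usage pattern alone. Deallocation is omitted for brevity.

```cpp
#include <cstddef>
#include <mutex>
#include <stack>
#include <cuda_runtime.h>

class PinnedBufferPool {
public:
  explicit PinnedBufferPool(std::size_t bytes) : bytes_(bytes) {}

  void* acquire() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (free_.empty()) {
      void* p = nullptr;
      cudaMallocHost(&p, bytes_);  // grow the pool on demand
      return p;
    }
    void* p = free_.top();
    free_.pop();
    return p;
  }

  void release(void* p) {
    std::lock_guard<std::mutex> lock(mutex_);
    free_.push(p);
  }

private:
  std::size_t bytes_;
  std::mutex mutex_;
  std::stack<void*> free_;
};
```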

@fwyzard commented Aug 9, 2018

Still, we should eventually coalesce all the allocations into fewer ones (here and everywhere else).

@fwyzard commented Aug 10, 2018

Validation summary

Reference release CMSSW_10_2_1 at d00b7b4
Development branch CMSSW_10_2_X_Patatrack at 907e17c
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_1-PU25ns_102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_1-102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_1-PU25ns_102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_1-102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_1-PU25ns_102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_1-102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/f8f28fa55a265ae3fcedaec343749ea5f33b1129/log .

@fwyzard commented Aug 10, 2018

Running the tests again to see if we can get a fair comparison of the performance...

@makortel (Author)

> Still, we should eventually coalesce all the allocations into fewer ones (here and everywhere else).

I started to test something I've been thinking about for a while; could you hold off merging for a few hours?

… use cudaMemcpy2DAsync to transfer

Trying to reduce memory copy overheads
@makortel (Author)

Ok, the last commit contains an experiment replacing 9 32-bit buffers with one buffer, and 5 16-bit buffers with one buffer. The number of memory copies is reduced 5->1 and 3->1, respectively. The allocation is done with cudaMallocPitch and the copy with cudaMemcpy2DAsync, to keep the ability to transfer only nhits elements (a sketch of the pattern follows the numbers below).

Performance comparison (on a single event)

  • rechit kernel time stays the same (~180 us, as expected)
  • rechit acquire time decreases further from ~170 us to ~110 us
  • device->host transfer time decreases from ~39 us to ~27 us

So yes, it does improve, even if the numbers are rather small in the absolute scale.
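
For reference, a minimal sketch of the coalescing pattern (buffer names and sizes are illustrative, not the PR code): the nine 32-bit arrays become rows of a single pitched allocation, and one cudaMemcpy2DAsync transfers just the first nhits elements of every row.

```cpp
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

constexpr int kRows32 = 9;               // nine 32-bit buffers become nine rows
constexpr std::size_t kMaxHits = 65536;  // illustrative capacity

void* d_owner32 = nullptr;      // device-side coalesced buffer
std::size_t d_pitch = 0;        // row pitch (bytes) chosen by the runtime
uint32_t* h_owner32 = nullptr;  // host-side buffer, rows packed contiguously

void allocateOnce() {
  // each row holds kMaxHits 32-bit words; the runtime may pad each row
  cudaMallocPitch(&d_owner32, &d_pitch, kMaxHits * sizeof(uint32_t), kRows32);
  cudaMallocHost(reinterpret_cast<void**>(&h_owner32),
                 kRows32 * kMaxHits * sizeof(uint32_t));
}

void transferToHost(uint32_t nhits, cudaStream_t stream) {
  // one 2D copy replaces several 1D copies, and still moves only the first
  // nhits elements of each row (the 16-bit buffers work the same way)
  cudaMemcpy2DAsync(h_owner32, kMaxHits * sizeof(uint32_t),  // dst, dst pitch
                    d_owner32, d_pitch,                      // src, src pitch
                    nhits * sizeof(uint32_t), kRows32,       // width, height
                    cudaMemcpyDeviceToHost, stream);
}
```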

I'd really like not to spread this pattern as-is, but to build some abstraction on top of it (e.g. along the lines of FWCore/SOA or FWCore/Utilities/interface/SoATuple.h). Also, other places do not seem to suffer as much from "many small transfers" relative to the shortness of their kernel times.

@makortel (Author) commented Aug 10, 2018

> Eventually we can look into wrapping it in some construct that guarantees the correctness of the memory reuse, like some kind of vector or stack with an element per stream - but we already know the present usage pattern is fine, by construction.

On the other hand, it seems to me (with #129) that CAHitNtupletHeterogeneousEDProducer currently allocates ~2 GB in its beginStream(), which starts to severely limit the number of EDM streams we can run in parallel (even on a V100).

Ok, looking at the code

```cpp
cudaCheck(cudaMalloc(&device_theCells_,
                     maxNumberOfLayerPairs_ * maxNumberOfDoublets_ * sizeof(GPUCACell)));
cudaCheck(cudaMalloc(&device_nCells_, sizeof(uint32_t)));
cudaCheck(cudaMemset(device_nCells_, 0, sizeof(uint32_t)));
cudaCheck(cudaMalloc(&device_isOuterHitOfCell_,
                     maxNumberOfLayers_ * maxNumberOfHits_ * sizeof(GPU::VecArray<unsigned int, maxCellsPerHit_>)));
```

with the constants

```cpp
static constexpr int maxNumberOfQuadruplets_ = 10000;
static constexpr int maxCellsPerHit_ = 2048; // 512;
static constexpr int maxNumberOfLayerPairs_ = 13;
static constexpr int maxNumberOfLayers_ = 10;
static constexpr int maxNumberOfDoublets_ = 262144;
static constexpr int maxNumberOfHits_ = 20000;
static constexpr int maxNumberOfRegions_ = 2;
```

and the GPUCACell data members

```cpp
GPU::VecArray< unsigned int, 40> theOuterNeighbors;
int theDoubletId;
int theLayerPairId;
private:
unsigned int theInnerHitId;
unsigned int theOuterHitId;
float theInnerX;
float theOuterX;
float theInnerY;
float theOuterY;
float theInnerZ;
float theOuterZ;
float theInnerR;
float theOuterR;
```

the allocated memory (per EDM stream) comes out as (if I got the numbers right)

  • device_theCells_: 689 MB
  • device_isOuterHitOfCell_: 1563 MB
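
A back-of-the-envelope check of those figures, assuming GPU::VecArray<T, N> holds its N elements plus a 32-bit size member (a layout assumption on my part); this reproduces both numbers:

```cpp
#include <cstdio>

int main() {
  // assumption: GPU::VecArray<T, N> stores N elements plus a 32-bit size
  constexpr long vecArray40 = 40L * 4 + 4;                        // theOuterNeighbors: 164 B
  constexpr long gpuCACell = vecArray40 + 2 * 4 + 2 * 4 + 8 * 4;  // + 2 ints, 2 uints, 8 floats = 212 B
  constexpr long vecArray2048 = 2048L * 4 + 4;                    // per-hit cell list: 8196 B

  constexpr long cells = 13L * 262144 * gpuCACell;    // layer pairs x doublets
  constexpr long outer = 10L * 20000 * vecArray2048;  // layers x hits

  std::printf("device_theCells_:         %ld MB\n", cells / (1024 * 1024));  // 689
  std::printf("device_isOuterHitOfCell_: %ld MB\n", outer / (1024 * 1024));  // 1563
  return 0;
}
```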

@felicepantaleo

@fwyzard commented Aug 10, 2018

I'd be interested to know how much of this memory we actually use...

@felicepantaleo
The number of doublets and the number of doublets per hit depend on the layer. I set this value to be safe for bpix1 and it's the same for all layers, but one could think of setting a different maximum per layer.
I also thought we would have an arena allocator to solve this problem, but the developer hired to build it defected.

@fwyzard commented Aug 10, 2018

Validation summary

Reference release CMSSW_10_2_1 at d00b7b4
Development branch CMSSW_10_2_X_Patatrack at 0301b0d
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_1-PU25ns_102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_1-102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_1-PU25ns_102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_1-102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_1-PU25ns_102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_1-102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/d977ae4b9f60ddd1a243d974977bc257732033de/log .

@fwyzard commented Aug 10, 2018

@makortel would you rather merge it as it is now, or see if we can build a more general approach first?

@makortel (Author)

I would merge it now, and leave the better abstraction to a later exercise.

@makortel (Author)

I haven't looked closely at the validations of recent PRs; are the fluctuations shown above for 10824.8 of the expected magnitude?

@fwyzard commented Aug 10, 2018

Yes, I think we regularly see fluctuations at the percent level in the summary.

On the other hand, initcheck now complains about a few thousand places.
Can you have a look?

@makortel (Author)

> On the other hand, initcheck now complains about a few thousand places.
> Can you have a look?

That should do the trick.

@fwyzard commented Aug 13, 2018

Thanks.

More generally, I am wondering what gets flagged by initcheck: I assumed it would flag a memory area if it is read before having been written to.

As there were no cudaMemset calls in the original code, does it mean we were using uninitialised memory?
Or are the messages from initcheck false positives?

```diff
@@ -48,6 +48,7 @@ namespace pixelgpudetails {
   // Order such that the first ones are the ones transferred to CPU
   static_assert(sizeof(uint32_t) == sizeof(float)); // just stating the obvious
   cudaCheck(cudaMallocPitch(&gpu_.owner_32bit_, &gpu_.owner_32bit_pitch_, MAX_HITS*sizeof(uint32_t), 9));
+  cudaCheck(cudaMemsetAsync(gpu_.owner_32bit_, 0x0, gpu_.owner_32bit_pitch_*9, cudaStream.id()));
```
Why pitch * 9?
Shouldn't it be pitch * 9 * MAX_HITS * sizeof(uint32_t)?

Or am I just being confused?


Ah, ok, I was being confused: pitch is the rounded-up value of MAX_HITS*sizeof(uint32_t), so pitch * 9 it is.
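
In other words (paraphrasing with illustrative code, not the PR's): cudaMallocPitch rounds the requested row width up to an aligned pitch and allocates pitch x height bytes in total, so pitch * 9 is exactly the size of the whole allocation, padding included.

```cpp
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

constexpr std::size_t MAX_HITS = 65536;  // stand-in for the real constant

void allocateAndClear(cudaStream_t stream) {
  void* buf = nullptr;
  std::size_t pitch = 0;  // MAX_HITS*sizeof(uint32_t) rounded up by the runtime
  cudaMallocPitch(&buf, &pitch, MAX_HITS * sizeof(uint32_t), 9);
  // the allocation spans pitch * 9 bytes in total, padding included,
  // so pitch * 9 (not width * 9) is the full size to initialise
  cudaMemsetAsync(buf, 0x0, pitch * 9, stream);
}
```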


@makortel (Author)

Since initcheck (presumably) started to complain after the switch to cudaMallocPitch+cudaMemcpy2DAsync, I suppose it flags the padding. I don't know why anything would be accessing that, though. Maybe cudaMemcpy2DAsync? But even then I'd find the access a bit strange, as the target host memory does not have any padding.

@makortel (Author)

So am I just sweeping something under the carpet with the cudaMemsetAsync?

@fwyzard commented Aug 13, 2018

I have no idea...

@makortel (Author)

Taking a closer look, the first complaint is about a 32-byte area into which nhits_*4 points (in the first row), i.e. it is the first 32-byte area containing uninitialized memory. (I don't know why 32 bytes, but initcheck seems to complain at each 32-byte interval.) So doing the cudaMemsetAsync is probably the correct fix.

(And to my earlier worry about the padding: there is no padding in this case, at least on felk40.)

@fwyzard commented Aug 13, 2018

Thanks for looking into it.

I am still confused: does it mean that we were reading from that memory before having written anything into it?
I understand that with this fix the memory is now initialised to zero - but is it OK to read those zeros back? Are we missing some boundary checks? Or are we writing/reading to/from the wrong place?

P.S. I would run the tests again, but EOS seems to be down, so I wouldn't be able to upload the test results anyway...

@makortel (Author)

Good questions; I don't know. It could be some weird interplay between cudaMemcpy2D() and initcheck (like the latter not knowing exactly what the former does, or the former doing something different from what one would naively expect).

I made a little test program (makortel@b5f47af) which works fine by default. When one changes WIDTH to e.g. 257 (to mimic the case where we allocate more than we use in the memcpy), initcheck starts to complain.
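
The test presumably looks roughly like the sketch below (a reconstruction from the description, not the actual code at makortel@b5f47af): only the first USED columns of each row are written and copied back, yet with WIDTH > USED initcheck flags the copy.

```cpp
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// With WIDTH == USED initcheck is quiet; with WIDTH = 257 (allocating more
// columns than the copy uses) it starts to complain, even though everything
// that is read back was written first.
constexpr std::size_t WIDTH  = 257;  // elements allocated per row
constexpr std::size_t USED   = 256;  // elements written and transferred per row
constexpr std::size_t HEIGHT = 4;

int main() {
  float* d_buf = nullptr;
  std::size_t pitch = 0;
  cudaMallocPitch(reinterpret_cast<void**>(&d_buf), &pitch,
                  WIDTH * sizeof(float), HEIGHT);

  // initialise exactly the elements that will be copied back
  cudaMemset2D(d_buf, pitch, 0x0, USED * sizeof(float), HEIGHT);

  float* h_buf = nullptr;
  cudaMallocHost(reinterpret_cast<void**>(&h_buf),
                 USED * sizeof(float) * HEIGHT);

  // run under: cuda-memcheck --tool initcheck ./a.out
  cudaMemcpy2D(h_buf, USED * sizeof(float), d_buf, pitch,
               USED * sizeof(float), HEIGHT, cudaMemcpyDeviceToHost);
  cudaDeviceSynchronize();
  std::printf("done\n");

  cudaFreeHost(h_buf);
  cudaFree(d_buf);
  return 0;
}
```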

@fwyzard fwyzard merged commit df4d7cd into cms-patatrack:CMSSW_10_2_X_Patatrack Aug 13, 2018
@fwyzard fwyzard added this to the CMSSW_10_2_2_Patatrack milestone Aug 14, 2018
@fwyzard fwyzard modified the milestone: CMSSW_10_2_2_Patatrack Sep 2, 2018
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
  - allocate HitsOnCPU buffers once per job per edm stream
  - coalesce multiple 32 bit and multiple 16 bit rechit buffers to two larger buffers; the allocation is done with cudaMallocPitch, the transfer with cudaMemcpy2DAsync
  - initialise the full memory buffer to keep cuda-memcheck happy