Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various fixes and cleanup #87

Merged
merged 5 commits into from
Jun 29, 2018

Conversation

makortel
Copy link

This PR contains various stuff I encountered while trying to debug #84.

  • Bugs
    • Added the missing cudaStreamSynchronize() to raw2cluster
      • Even if we eventually want to remove the synchronizations from there, but I wanted a simple fix in case it would have allowed to make progress with debugging (it didn't help though)
    • cuda-memcheck showed an invalid read from the exclusive_scan in rechits. The read was introduced in Work to support GCC 7 with CUDA #41 with
- hitsModuleStart[0] =0;
- cudaCheck(cudaMemcpyAsync(&hitsModuleStart[1], c.clusInModule_d, gpuClustering::MaxNumModules*sizeof(uint32_t), cudaMemcpyDefault, c.stream));
- std::partial_sum(std::begin(hitsModuleStart),std::end(hitsModuleStart),std::begin(hitsModuleStart));
- cudaCheck(cudaMemcpyAsync(hh.hitsModuleStart_d, &hitsModuleStart, (gpuClustering::MaxNumModules+1)*sizeof(uint32_t), cudaMemcpyDefault, c.stream));
+ thrust::exclusive_scan(thrust::cuda::par,
+    c.clusInModule_d,
+    c.clusInModule_d + gpuClustering::MaxNumModules + 1,
+    hh.hitsModuleStart_d);

The use of gpuClustering::MaxNumModules + 1 makes the algorithm to read one element beyond c.clusInModule. I fixed it here with going to similar pattern as was before #41 in CPU with inclusive_scan.

  • Optimizations

    • Raw2cluster was not setting recordWatcherUpdatedsSinceLastTransfer_ = false, so the "modules to unpack" was transferred on each event
    • Raw2cluster was apparently allocating 4 times the needed memory for certain buffers by having an additional sizeof(unsigned int) in the size expression
  • Cleanup

    • Use exactly the same type for memcpy's size expressions in rechits as the buffers are declared with. All of these were already 32 bits, so there is no effect, but I think it is a bit more clear to read.

@fwyzard @felicepantaleo @VinInn @rovere

@cmsbot
Copy link

cmsbot commented Jun 28, 2018

A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_10_2_X_Patatrack.

It involves the following packages:

RecoLocalTracker/SiPixelClusterizer
RecoLocalTracker/SiPixelRecHits

@cmsbot, @fwyzard can you please review it and eventually sign? Thanks.

cms-bot commands are listed here

@fwyzard
Copy link

fwyzard commented Jun 29, 2018

Validation summary

Reference release CMSSW_10_2_0_pre5 at 30c7b03
Development branch CMSSW_10_2_X_Patatrack at 10d59f2
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_0_pre5-PU25ns_102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre5-102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_0_pre5-PU25ns_102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre5-102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

  • reference DQM plots for reference release, workflow 10824.5
  • DQM plots for development release, workflow 10824.5
  • DQM plots for development release, workflow 10824.8 are missing
  • DQM plots for development release, workflow 10824.7
  • DQM plots for development release, workflow 10824.9
  • DQM plots for testing release, workflow 10824.5
  • DQM plots for testing release, workflow 10824.8 are missing
  • DQM plots for testing release, workflow 10824.7
  • DQM plots for testing release, workflow 10824.9
  • DQM comparison for reference workflow 10824.5
  • DQM comparison for workflow 10824.8
  • DQM comparison for workflow 10824.7
  • DQM comparison for workflow 10824.9

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_0_pre5-PU25ns_102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre5-102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/30d58a0ffa5a1b392b368944fb559841b4fd149c/log .

@cmsbot
Copy link

cmsbot commented Jun 29, 2018

Pull request #87 was updated. @cmsbot, @fwyzard can you please check and sign again.

@makortel
Copy link
Author

Rebased to fix conflicts with the recently-merged #81.

@fwyzard
Copy link

fwyzard commented Jun 29, 2018

Thanks Matti.

@fwyzard fwyzard merged commit bc0bc29 into cms-patatrack:CMSSW_10_2_X_Patatrack Jun 29, 2018
@fwyzard fwyzard added this to the CMSSW_10_2_0_pre6_Patatrack milestone Jul 5, 2018
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
  - replace `exclusive_scan` with `memset` + `inclusive_scan` to avoid an invalid read
  - fix memory sizes in allocations and copies
  - add a missing stream synchronize
  - set `recordWatcherUpdatedSinceLastTransfer_` to avoid spurious copies
fwyzard pushed a commit that referenced this pull request Oct 19, 2020
  - replace `exclusive_scan` with `memset` + `inclusive_scan` to avoid an invalid read
  - fix memory sizes in allocations and copies
  - add a missing stream synchronize
  - set `recordWatcherUpdatedSinceLastTransfer_` to avoid spurious copies
fwyzard pushed a commit that referenced this pull request Oct 20, 2020
  - replace `exclusive_scan` with `memset` + `inclusive_scan` to avoid an invalid read
  - fix memory sizes in allocations and copies
  - add a missing stream synchronize
  - set `recordWatcherUpdatedSinceLastTransfer_` to avoid spurious copies
fwyzard pushed a commit that referenced this pull request Oct 23, 2020
  - replace `exclusive_scan` with `memset` + `inclusive_scan` to avoid an invalid read
  - fix memory sizes in allocations and copies
  - add a missing stream synchronize
  - set `recordWatcherUpdatedSinceLastTransfer_` to avoid spurious copies
fwyzard pushed a commit that referenced this pull request Nov 6, 2020
  - replace `exclusive_scan` with `memset` + `inclusive_scan` to avoid an invalid read
  - fix memory sizes in allocations and copies
  - add a missing stream synchronize
  - set `recordWatcherUpdatedSinceLastTransfer_` to avoid spurious copies
fwyzard pushed a commit that referenced this pull request Nov 16, 2020
  - replace `exclusive_scan` with `memset` + `inclusive_scan` to avoid an invalid read
  - fix memory sizes in allocations and copies
  - add a missing stream synchronize
  - set `recordWatcherUpdatedSinceLastTransfer_` to avoid spurious copies
fwyzard pushed a commit that referenced this pull request Dec 25, 2020
  - replace `exclusive_scan` with `memset` + `inclusive_scan` to avoid an invalid read
  - fix memory sizes in allocations and copies
  - add a missing stream synchronize
  - set `recordWatcherUpdatedSinceLastTransfer_` to avoid spurious copies
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
  - replace `exclusive_scan` with `memset` + `inclusive_scan` to avoid an invalid read
  - fix memory sizes in allocations and copies
  - add a missing stream synchronize
  - set `recordWatcherUpdatedSinceLastTransfer_` to avoid spurious copies
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
  - replace `exclusive_scan` with `memset` + `inclusive_scan` to avoid an invalid read
  - fix memory sizes in allocations and copies
  - add a missing stream synchronize
  - set `recordWatcherUpdatedSinceLastTransfer_` to avoid spurious copies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants