Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix setting the data pointer of error SimpleVector #236

Merged

Conversation

makortel
Copy link

While working on Raw2Cluster migration to #100, I realized that #109 removed re-setting the data pointer of the erro SimpleVector to point to the host memory after the cudaMemcpyAsync() in

cudaCheck(cudaMemcpyAsync(digis_clusters_h.error.get(), error_d.get(), vsize, cudaMemcpyDefault, stream.id()));
cudaCheck(cudaMemcpyAsync(digis_clusters_h.data.get(), data_d.get(), MAX_ERROR_SIZE, cudaMemcpyDefault, stream.id()));

We haven't hit on the problem because on the consumer side

auto size = digis_clusters_h.error->size();
for (auto i = 0; i < size; i++) {
pixelgpudetails::error_obj err = (*digis_clusters_h.error)[i];

the data is accessed only if size() > 0, and for MC the size appears to be 0.

This PR sets the data pointer in a proper place (after the async copies have completed).

@fwyzard
Copy link

fwyzard commented Dec 20, 2018

Hi @makortel,
you mean the fact that we dropped error_h->set_data(data_h) because we did not need to get error_h->size() any more - but we should have it back to copy the errors from GPU to CPU ?

@makortel
Copy link
Author

@fwyzard Something like that (even though error_h->size() is independent of error_h->set_data(data_h)).

@VinInn
Copy link

VinInn commented Dec 20, 2018

OK! I was just debugging that section of code...

@makortel
Copy link
Author

And yeah, maybe we should add a data workflow to the tests (to at least ensure that it runs for starters).

@VinInn
Copy link

VinInn commented Dec 20, 2018

In any case I think we should drop GPUSimpleVector in favor of the GPUVecArray that requires much less gymnastics.
GPUSimpleVector is useful only if the capacity needs to be set at runtime

@makortel
Copy link
Author

In any case I think we should drop GPUSimpleVector in favor of the GPUVecArray that requires much less gymnastics.
GPUSimpleVector is useful only if the capacity needs to be set at runtime

I fully agree (and even for runtime-known capacity we should make a class with a safer interface).

@VinInn
Copy link

VinInn commented Dec 20, 2018

at the moment git cms-merge-topic (git in general for CMSSW I suppose) does not work @SimonS so cannot test on V100

@fwyzard
Copy link

fwyzard commented Jan 7, 2019

Validation summary

Reference release CMSSW_10_4_0_pre4 at d74dd18
Development branch CMSSW_10_4_X_Patatrack at ebc280c
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/3295a57110647d5fdb8f39b7dae675752ab0dcd6/log .

@fwyzard fwyzard added the bug label Jan 7, 2019
@fwyzard
Copy link

fwyzard commented Jan 7, 2019

No impact on physics performance (as expected):

RelValTTbar_13  reference-10824.5 development-10824.5 development-10824.8 testing-10824.8
Efficiency 0.4818 0.4824 0.4842 0.4842
Number of TrackingParticles (after cuts) 5556 5556 5556 5556
Number of matched TrackingParticles 2677 2680 2690 2690
Fake rate 0.0519 0.0517 0.0337 0.0337
Duplicate rate 0.0168 0.0175 0.0176 0.0179
Number of tracks 32452 32480 31796 31800
Number of true tracks 30769 30801 30724 30727
Number of fake tracks 1683 1679 1072 1073
Number of pileup tracks 27093 27118 27000 27002
Number of duplicate tracks 546 567 559 569

@fwyzard
Copy link

fwyzard commented Jan 7, 2019

No impact on the throughput:

before

Warming up
Running 4 times over 4200 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs
  1725.6 ±   2.1 ev/s (4000 events)
  1728.5 ±   2.1 ev/s (4000 events)
  1731.0 ±   2.0 ev/s (4000 events)
  1736.1 ±   1.4 ev/s (4000 events)

after

Warming up
Running 4 times over 4200 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs
  1736.0 ±   2.3 ev/s (4000 events)
  1735.5 ±   2.4 ev/s (4000 events)
  1730.9 ±   2.0 ev/s (4000 events)
  1723.8 ±   2.3 ev/s (4000 events)

@fwyzard fwyzard merged commit eae2fbc into cms-patatrack:CMSSW_10_4_X_Patatrack Jan 7, 2019
@fwyzard fwyzard added this to the CMSSW_10_4_X_Patatrack milestone Jan 8, 2019
@fwyzard fwyzard added the fixed label Jan 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants