
Use 10 time samples for the HBHE digis in Run 3 #531

Merged
merged 12 commits into cms-patatrack:CMSSW_11_1_X_Patatrack
Aug 8, 2020

Conversation


@vkhristenko vkhristenko commented Aug 5, 2020

PR description:

Fixing issue #520.
I attach a cfg that I used to validate. In particular, confs should be added for the gpu modifier (apologies, not sure where to add this):
cfg.txt

process.hcalRawToDigiGPU.nsamplesF01HE = 10
process.hcalRawToDigiGPU.nsamplesF5HB = 10
process.hcalRawToDigiGPU.nsamplesF3HB = 10

PR validation:

cuda-memcheck no longer reports invalid accesses (memcheck/initcheck).


fwyzard commented Aug 5, 2020

@mariadalfonso @vkhristenko in which cases should we use 8 or 10 time samples ?

  • Run 2 data ?
  • Run 2 MC ? Up to / from which release ?
  • Run 3 MC ? Up to / from which release ?

@fwyzard fwyzard added bug-fix HCAL HCAL-related developments labels Aug 5, 2020

vkhristenko commented Aug 5, 2020 via email

@mariadalfonso

  • Run 3 MC - 10 samples
  • Data - 8 samples, for Run 3 as well
  • Run 2 MC - 8 samples
  • Releases not sure...

The above is correct.

Note that Mahi (RECO) always runs on 8 samples with 8 pulses, but in Run 3, in view of the 2024 ageing, we need to simulate electronic effects earlier (the real detector has all the effects from -inf).

Maria


fwyzard commented Aug 5, 2020

Thanks for all the details.

Maria, I think the Run 3 MC samples from CMSSW 11.1.0(-pre8) already have the simulation with 10 samples, correct ?

@mariadalfonso

Yes.
The two relevant PRs, 28948 and 29035, entered
https://github.com/cms-sw/cmssw/releases/tag/CMSSW_11_1_0_pre4


fwyzard commented Aug 6, 2020

Thanks !


fwyzard commented Aug 6, 2020

I tested that adding

process.hcalDigisGPU.nsamplesF01HE = 10
process.hcalDigisGPU.nsamplesF3HB = 10
process.hcalDigisGPU.nsamplesF5HB = 10

to the Run 3 MC workflow solves all the cuda-memcheck reports.

@mariadalfonso do you know how this is handled for data vs MC in the cpu workflow ?

@vkhristenko

Overall, they do not differentiate based on MC vs data (in Mahi's producer); it is based on what sits in the data (how many time samples, and where the sample of interest is). However, the RawToDigi producer does have this conf!

On the CPU it is much easier, as you do not need to deal with offsets (for a particular thread...).

Technically speaking, this conf could be lifted further or removed, I believe; it would then be taken from the data... I decided to leave it like that because, at the very least, it shows what the expectation is for the GPU code...


fwyzard commented Aug 6, 2020

However, the Raw to Digi producer does have this conf!

You mean the original CPU RawToDigi?

@vkhristenko

yep yep, the cpu unpacker has this in its configuration...


fwyzard commented Aug 6, 2020

In CMSSW 11_2_0_pre3, I get

process.hcalDigis = cms.EDProducer("HcalRawToDigi",
    ComplainEmptyData = cms.untracked.bool(False),
    ElectronicsMap = cms.string(''),
    ExpectedOrbitMessageTime = cms.untracked.int32(-1),
    FEDs = cms.untracked.vint32(),
    FilterDataQuality = cms.bool(True),
    HcalFirstFED = cms.untracked.int32(700),
    InputLabel = cms.InputTag("rawDataCollector"),
    UnpackCalib = cms.untracked.bool(True),
    UnpackTTP = cms.untracked.bool(True),
    UnpackUMNio = cms.untracked.bool(True),
    UnpackZDC = cms.untracked.bool(True),
    UnpackerMode = cms.untracked.int32(0),
    firstSample = cms.int32(0),
    lastSample = cms.int32(9),
    mightGet = cms.optional.untracked.vstring,
    saveQIE10DataNSamples = cms.untracked.vint32(),
    saveQIE10DataTags = cms.untracked.vstring(),
    saveQIE11DataNSamples = cms.untracked.vint32(),
    saveQIE11DataTags = cms.untracked.vstring(),
    silent = cms.untracked.bool(True)
)

for

  • Run 2 MC (10824.0_TTbar_13)
  • Run 3 MC (11634.0_TTbar)
  • Run2 data (136.885_RunHLTPhy2018D)

If the relevant part is

    firstSample = cms.int32(0),
    lastSample = cms.int32(9),

does it mean that we should use 10 samples for all data and MC, also for Run 2 ?

@vkhristenko

this must be configured somewhere else... what matters above is actually this:

    saveQIE10DataNSamples = cms.untracked.vint32(),
    saveQIE10DataTags = cms.untracked.vstring(),
    saveQIE11DataNSamples = cms.untracked.vint32(),
    saveQIE11DataTags = cms.untracked.vstring(),

@mariadalfonso

Let's forget about anything before 2018.

--> The MC extended the presamples only for Run 3
https://github.com/cms-sw/cmssw/pull/29035/files#diff-dc0f44be94ecf8e8b308650de30eeeb9R167
because of physics reasons.

--> DATA stays at 8, likely because of the bandwidth out of P5.

All (SOI, nFrame) should be transparent in RECO, because the Frame contains the SOI/NSamples (no matter whether it is MC or DATA).
Only the parameters of the templates should be input parameters.
I will have to go through the code for a major cleanup of all the things duplicated and/or hardcoded.


fwyzard commented Aug 6, 2020

this must be configured somewhere else...

those are already the fully expanded dumps for the cmsDriver.py configurations 🤔

I will have to go through the code for a major cleanup of all the things duplicated and/or hardcoded.

OK, for the time being I'll check if this works:

# Only for Run 3 MC; Run 2 MC and Run 2/3 data use 8 time samples
from Configuration.Eras.Modifier_run3_HB_cff import run3_HB
run3_HB.toModify(hcalDigisGPU,
    nsamplesF01HE = 10,
    nsamplesF3HB = 10,
    nsamplesF5HB = 10
)

(I think run3_HB is not the correct modifier, but it's the closest one I could find...)

@vkhristenko
Author

https://github.com/cms-sw/cmssw/blob/master/EventFilter/HcalRawToDigi/plugins/HcalRawToDigi.cc#L117
https://github.com/cms-sw/cmssw/blob/master/EventFilter/HcalRawToDigi/src/HcalUnpacker.cc#L780

This is used here and then during decoding... I'm not sure about the confs... but when debugging all that stuff, this is what I found... in other words, you have to match what sits in the data with your expectations...


fwyzard commented Aug 6, 2020

@vkhristenko in your GPU code, do you use the number of time samples only to allocate enough memory ?
would it work to set 10 samples also for running over data, which has only 8 ?

@vkhristenko

Nope, this actually sets the stride for the access, and it should match what is expected. As I said above, in principle this can be lifted... but the easiest for me was to add these configurable things...
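
As an aside, the stride logic can be illustrated with a small sketch. This is not the CMSSW code: the helper names and the flavor 0/1 header size are my assumptions; only the flavor-5 layout (one header word, two 7-bit ADC samples per 16-bit word) is taken from the diff further down in this thread.

```cpp
#include <cstdint>

// Illustrative layout constants; kF5HeaderWords matches the
// static_assert in the diff below, kF01HeaderWords is a guess.
constexpr uint32_t kF5HeaderWords = 1;
constexpr uint32_t kF01HeaderWords = 2;

// Flavor 5 packs two ADC samples into each 16-bit data word,
// so the per-channel stride is header + ceil(nsamples / 2).
constexpr uint32_t strideFlavor5(uint32_t nsamples) {
  return kF5HeaderWords + (nsamples + 1) / 2;
}

// A flavor with one 16-bit word per sample after the header.
constexpr uint32_t strideFlavor01(uint32_t nsamples) {
  return kF01HeaderWords + nsamples;
}
```

With this layout, going from 8 to 10 samples changes the flavor-5 stride from 5 to 6 words; a kernel that indexes `channel * stride` with a stride that does not match the data then reads or writes out of bounds, which is exactly what cuda-memcheck flags.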


@fwyzard fwyzard changed the title fixing issue #520 Use 10 time samples for the HBHE digis in Run 3 Aug 6, 2020

fwyzard commented Aug 6, 2020

The test should be fine, since it should use Run 2 settings for running on data and Run 3 settings for MC... still, I'll ask HCAL how this is handled in the current code to figure out the best way to set the parameters.

@vkhristenko of course if (later on) you can make the code detect the correct number from the data itself, even better :-)


mariadalfonso commented Aug 6, 2020

This is what should be supported at DIGI:
// maxTS = 10, SOI = 5 (new, for better modeling of SiPM noise in MC): Run 3 MC
// maxTS = 8, SOI = 3 (typical 8-TS situation in data): Run 3 data
// maxTS = 8, SOI = 3: Run 2018 data/MC

In the above cases, for RECO we do the fit on a redefined 8-TS window, and the SOI is always at TS3.
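
As a sketch of that window re-definition (my own illustration, not CMSSW code; `windowStart` and the constants are hypothetical names): given a frame that carries its own number of samples and SOI, the reconstruction window can be chosen so the SOI always lands at TS3.

```cpp
constexpr int kWindowSize = 8;  // the fit always uses 8 TS
constexpr int kTargetSOI = 3;   // with the SOI at TS3 of the window

// First sample of the 8-TS window, clamped to the frame boundaries.
constexpr int windowStart(int nsamples, int soi) {
  int start = soi - kTargetSOI;
  if (start < 0) start = 0;
  if (start + kWindowSize > nsamples) start = nsamples - kWindowSize;
  return start;
}
```

A 10-TS Run 3 MC frame with SOI = 5 then yields a window starting at TS2 (samples 2..9), while an 8-TS frame with SOI = 3 uses the whole frame, matching the cases listed above.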


vkhristenko commented Aug 6, 2020 via email


fwyzard commented Aug 7, 2020

@vkhristenko this PR as it is fixes the issue on Run 3 MC (with 10 time samples), and looks fine on data (which still uses 8 time samples).
I would merge it, and let you work in parallel on a more automatic implementation, if that's ok for you ?

@vkhristenko

Ok, I never thought that would make such a huge difference... let me fix that now, apologies...


makortel commented Aug 7, 2020

IIRC cudaFreeHost() introduces a global synchronization point, so when performance depends on concurrent processing, a significant impact is not that surprising.


fwyzard commented Aug 7, 2020

diff --git a/EventFilter/HcalRawToDigi/plugins/HcalDigisProducerGPU.cc b/EventFilter/HcalRawToDigi/plugins/HcalDigisProducerGPU.cc
index 2f79bbd7867e..f64200a40495 100644
--- a/EventFilter/HcalRawToDigi/plugins/HcalDigisProducerGPU.cc
+++ b/EventFilter/HcalRawToDigi/plugins/HcalDigisProducerGPU.cc
@@ -15,7 +15,7 @@
 class HcalDigisProducerGPU : public edm::stream::EDProducer<edm::ExternalWork> {
 public:
   explicit HcalDigisProducerGPU(edm::ParameterSet const& ps);
-  ~HcalDigisProducerGPU() override;
+  ~HcalDigisProducerGPU() override = default;
   static void fillDescriptions(edm::ConfigurationDescriptions&);
 
 private:
@@ -53,6 +53,11 @@ private:
   };
   ConfigParameters config_;
 
+  // temporary host buffers
+  HostCollectionf01 hf01_;
+  HostCollectionf5 hf5_;
+  HostCollectionf3 hf3_;
+
   // device products: product owns memory (i.e. not the module)
   DeviceCollectionf01 df01_;
   DeviceCollectionf5 df5_;
@@ -84,9 +89,12 @@ HcalDigisProducerGPU::HcalDigisProducerGPU(const edm::ParameterSet& ps)
   config_.maxChannelsF01HE = ps.getParameter<uint32_t>("maxChannelsF01HE");
   config_.maxChannelsF5HB = ps.getParameter<uint32_t>("maxChannelsF5HB");
   config_.maxChannelsF3HB = ps.getParameter<uint32_t>("maxChannelsF3HB");
-}
 
-HcalDigisProducerGPU::~HcalDigisProducerGPU() {}
+  // preallocate the host buffers
+  hf01_.reserve(config_.maxChannelsF01HE);
+  hf5_.reserve(config_.maxChannelsF5HB);
+  hf3_.reserve(config_.maxChannelsF3HB);
+}
 
 void HcalDigisProducerGPU::acquire(edm::Event const& event,
                                    edm::EventSetup const& setup,
@@ -94,6 +102,10 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
   // raii
   cms::cuda::ScopedContextAcquire ctx{event.streamID(), std::move(holder), cudaState_};
 
+  // clear the host buffers
+  hf01_.clear();
+  hf5_.clear();
+  hf3_.clear();
 
   // event data
   edm::Handle<HBHEDigiCollection> hbheDigis;
@@ -101,17 +113,11 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
   event.getByToken(hbheDigiToken_, hbheDigis);
   event.getByToken(qie11DigiToken_, qie11Digis);
   
-  // tmp host data
-  HostCollectionf01 hf01;
-  HostCollectionf5 hf5;
-  HostCollectionf3 hf3;
-
   // init f5 collection
   if (hbheDigis->size() > 0) {
     auto const nsamples = (*hbheDigis)[0].size();
     auto const stride = hcal::compute_stride<hcal::Flavor5>(nsamples);
-    hf5.stride = stride;
-    hf5.reserve(config_.maxChannelsF5HB);
+    hf5_.stride = stride;
   
     // flavor5 get device blobs
     df5_.stride = stride;
@@ -126,10 +132,8 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
     auto const stride01 = hcal::compute_stride<hcal::Flavor01>(nsamples);
     auto const stride3 = hcal::compute_stride<hcal::Flavor3>(nsamples);
 
-    hf01.stride = stride01;
-    hf3.stride = stride3;
-    hf01.reserve(config_.maxChannelsF01HE);
-    hf3.reserve(config_.maxChannelsF3HB);
+    hf01_.stride = stride01;
+    hf3_.stride = stride3;
   
     // flavor 0/1 get devie blobs
     df01_.stride = stride01;
@@ -147,19 +151,19 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
   for (auto const& hbhe : *hbheDigis) {
     auto const id = hbhe.id().rawId();
     auto const presamples = hbhe.presamples();
-    hf5.ids.push_back(id);
-    hf5.npresamples.push_back(presamples);
+    hf5_.ids.push_back(id);
+    hf5_.npresamples.push_back(presamples);
     auto const stride = hcal::compute_stride<hcal::Flavor5>(hbhe.size());
-    assert(stride == hf5.stride && "strides must be the same for every single digi of the collection");
+    assert(stride == hf5_.stride && "strides must be the same for every single digi of the collection");
     // simple for now...
     static_assert(hcal::Flavor5::HEADER_WORDS == 1);
     uint16_t header_word = (1 << 15) | (0x5 << 12) | (0 << 10) | ((hbhe.sample(0).capid() & 0x3) << 8);
-    hf5.data.push_back(header_word);
+    hf5_.data.push_back(header_word);
     for (unsigned int i = 0; i < stride - hcal::Flavor5::HEADER_WORDS; i++) {
       uint16_t s0 = (0 << 7) | (static_cast<uint8_t>(hbhe.sample(2 * i).adc()) & 0x7f);
       uint16_t s1 = (0 << 7) | (static_cast<uint8_t>(hbhe.sample(2 * i + 1).adc()) & 0x7f);
       uint16_t sample = (s1 << 8) | s0;
-      hf5.data.push_back(sample);
+      hf5_.data.push_back(sample);
     }
   }
 
@@ -170,21 +174,21 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
       if (digi.detid().subdetId() != HcalEndcap)
         continue;
       auto const id = digi.detid().rawId();
-      hf01.ids.push_back(id);
+      hf01_.ids.push_back(id);
       for (int hw = 0; hw < hcal::Flavor01::HEADER_WORDS; hw++)
-        hf01.data.push_back((*qie11Digis)[i][hw]);
+        hf01_.data.push_back((*qie11Digis)[i][hw]);
       for (int sample = 0; sample < digi.samples(); sample++) {
-        hf01.data.push_back((*qie11Digis)[i][hcal::Flavor01::HEADER_WORDS + sample]);
+        hf01_.data.push_back((*qie11Digis)[i][hcal::Flavor01::HEADER_WORDS + sample]);
       }
     } else if (digi.flavor() == 3) {
       if (digi.detid().subdetId() != HcalBarrel)
         continue;
       auto const id = digi.detid().rawId();
-      hf3.ids.push_back(id);
+      hf3_.ids.push_back(id);
       for (int hw = 0; hw < hcal::Flavor3::HEADER_WORDS; hw++)
-        hf3.data.push_back((*qie11Digis)[i][hw]);
+        hf3_.data.push_back((*qie11Digis)[i][hw]);
       for (int sample = 0; sample < digi.samples(); sample++) {
-        hf3.data.push_back((*qie11Digis)[i][hcal::Flavor3::HEADER_WORDS + sample]);
+        hf3_.data.push_back((*qie11Digis)[i][hcal::Flavor3::HEADER_WORDS + sample]);
       }
     }
   }
@@ -198,19 +202,19 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
     cudaCheck(cudaMemcpyAsync(dest, src.data(), src.size() * sizeof(type), cudaMemcpyHostToDevice, ctx.stream()));
   };
 
-  lambdaToTransfer(df01_.data.get(), hf01.data);
-  lambdaToTransfer(df01_.ids.get(), hf01.ids);
+  lambdaToTransfer(df01_.data.get(), hf01_.data);
+  lambdaToTransfer(df01_.ids.get(), hf01_.ids);
 
-  lambdaToTransfer(df5_.data.get(), hf5.data);
-  lambdaToTransfer(df5_.ids.get(), hf5.ids);
-  lambdaToTransfer(df5_.npresamples.get(), hf5.npresamples);
+  lambdaToTransfer(df5_.data.get(), hf5_.data);
+  lambdaToTransfer(df5_.ids.get(), hf5_.ids);
+  lambdaToTransfer(df5_.npresamples.get(), hf5_.npresamples);
 
-  lambdaToTransfer(df3_.data.get(), hf3.data);
-  lambdaToTransfer(df3_.ids.get(), hf3.ids);
+  lambdaToTransfer(df3_.data.get(), hf3_.data);
+  lambdaToTransfer(df3_.ids.get(), hf3_.ids);
 
-  df01_.size = hf01.ids.size();
-  df5_.size = hf5.ids.size();
-  df3_.size = hf3.ids.size();
+  df01_.size = hf01_.ids.size();
+  df5_.size = hf5_.ids.size();
+  df3_.size = hf3_.ids.size();
 }
 
 void HcalDigisProducerGPU::produce(edm::Event& event, edm::EventSetup const& setup) {

@vkhristenko

Added; the only main difference wrt @fwyzard's diff is that the stride is set before the reserve call... for the statically known maximum... let me know!
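
To make the flavor-5 packing in the diff above concrete, here is a hedged round-trip sketch (the pack/unpack helpers are my own names, not CMSSW functions): each 16-bit data word carries two 7-bit ADC values, with bit 7 of each byte left as a flag bit (0 in the diff).

```cpp
#include <cstdint>
#include <utility>

// Pack two 7-bit ADC values into one 16-bit flavor-5 data word,
// mirroring the (s1 << 8) | s0 expression in the diff above.
constexpr uint16_t packFlavor5(uint8_t adc0, uint8_t adc1) {
  const uint16_t s0 = adc0 & 0x7f;
  const uint16_t s1 = adc1 & 0x7f;
  return static_cast<uint16_t>((s1 << 8) | s0);
}

// Recover the two ADC values from a packed word.
constexpr std::pair<uint8_t, uint8_t> unpackFlavor5(uint16_t word) {
  return {static_cast<uint8_t>(word & 0x7f),
          static_cast<uint8_t>((word >> 8) & 0x7f)};
}
```

Round-tripping a pair of values through these two helpers returns the original ADCs, which is a quick way to sanity-check the bit layout when changing the number of samples per word.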


fwyzard commented Aug 7, 2020

Validation summary

Reference release CMSSW_11_1_0 at b7ad279
Development branch cms-patatrack/CMSSW_11_1_X_Patatrack at fcdcb51
Testing PRs:

Validation plots

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.885522.png
zoom-136.885522.png

logs and nvprof/nvvp profiles

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 982096 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885522
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 136.885522

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 48696 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885522
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885522

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 26616 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885522
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) found no CUDA-MEMCHECK results
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885522

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/a0c322c7d08669abebd0c760126f04136d0cd2cd/log .


fwyzard commented Aug 8, 2020

CUDA is still (again ?) not happy :-(


vkhristenko commented Aug 8, 2020 via email


fwyzard commented Aug 8, 2020

The last round of validation spotted problems with cuda-memcheck here and here.

Just check the ❌ under "testing release".

@vkhristenko

I'm not seeing these issues (--tool initcheck); I'm running on TTbar...


fwyzard commented Aug 8, 2020

Running step3.py on TTbar, I can reproduce the crash with the synccheck tool:

cuda-memcheck --tool synccheck --log-file synccheck.log -- cmsRun step3.py
...
08-Aug-2020 17:10:38 CEST  Initiating request to open file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root
08-Aug-2020 17:10:39 CEST  Successfully opened file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root
Begin processing the 1st record. Run 1, Event 6107, LumiSection 62 on stream 6 at 08-Aug-2020 17:10:53.325 CEST
Begin processing the 2nd record. Run 1, Event 6101, LumiSection 62 on stream 1 at 08-Aug-2020 17:10:53.338 CEST
Begin processing the 3rd record. Run 1, Event 6106, LumiSection 62 on stream 0 at 08-Aug-2020 17:10:53.338 CEST
Begin processing the 4th record. Run 1, Event 6104, LumiSection 62 on stream 4 at 08-Aug-2020 17:10:53.340 CEST
Begin processing the 5th record. Run 1, Event 6102, LumiSection 62 on stream 3 at 08-Aug-2020 17:10:53.341 CEST
Begin processing the 6th record. Run 1, Event 6108, LumiSection 62 on stream 7 at 08-Aug-2020 17:10:53.342 CEST
Begin processing the 7th record. Run 1, Event 6103, LumiSection 62 on stream 5 at 08-Aug-2020 17:10:53.342 CEST
Begin processing the 8th record. Run 1, Event 6105, LumiSection 62 on stream 2 at 08-Aug-2020 17:10:53.343 CEST
Begin processing the 9th record. Run 1, Event 6115, LumiSection 62 on stream 7 at 08-Aug-2020 17:10:53.980 CEST
terminate called after throwing an instance of 'cms::cuda::bad_alloc'
  what():  an illegal memory access was encountered


A fatal system signal has occurred: abort signal
The following is the call stack containing the origin of the signal.

However, this might be a bug in the tool rather than a problem in our code.
Running with the more recent Compute Sanitizer does not reveal any problems:

$ /usr/local/cuda-11.0/Sanitizer/compute-sanitizer --tool synccheck --log-file synccheck.log cmsRun step3.py
...
08-Aug-2020 17:14:53 CEST  Initiating request to open file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root
08-Aug-2020 17:14:54 CEST  Successfully opened file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root
Begin processing the 1st record. Run 1, Event 6107, LumiSection 62 on stream 5 at 08-Aug-2020 17:15:08.419 CEST
Begin processing the 2nd record. Run 1, Event 6101, LumiSection 62 on stream 0 at 08-Aug-2020 17:15:08.432 CEST
Begin processing the 3rd record. Run 1, Event 6106, LumiSection 62 on stream 2 at 08-Aug-2020 17:15:08.433 CEST
Begin processing the 4th record. Run 1, Event 6104, LumiSection 62 on stream 3 at 08-Aug-2020 17:15:08.435 CEST
Begin processing the 5th record. Run 1, Event 6102, LumiSection 62 on stream 6 at 08-Aug-2020 17:15:08.435 CEST
Begin processing the 6th record. Run 1, Event 6108, LumiSection 62 on stream 4 at 08-Aug-2020 17:15:08.436 CEST
Begin processing the 7th record. Run 1, Event 6103, LumiSection 62 on stream 1 at 08-Aug-2020 17:15:08.437 CEST
Begin processing the 8th record. Run 1, Event 6105, LumiSection 62 on stream 7 at 08-Aug-2020 17:15:08.438 CEST
Begin processing the 9th record. Run 1, Event 6115, LumiSection 62 on stream 6 at 08-Aug-2020 17:15:09.016 CEST
Begin processing the 10th record. Run 1, Event 6116, LumiSection 62 on stream 0 at 08-Aug-2020 17:15:09.016 CEST
...
Begin processing the 99th record. Run 1, Event 6197, LumiSection 62 on stream 0 at 08-Aug-2020 17:15:21.639 CEST
Begin processing the 100th record. Run 1, Event 6199, LumiSection 62 on stream 6 at 08-Aug-2020 17:15:21.639 CEST
08-Aug-2020 17:15:22 CEST  Closed file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root

and

$ cat synccheck.log
========= COMPUTE-SANITIZER
========= ERROR SUMMARY: 0 errors

Unfortunately the version bundled with CUDA in CMSSW is incomplete; I'll have to fix that before we can switch to it in the validation tests.


fwyzard commented Aug 8, 2020

As for the memcheck and leak check on the Zee sample, the error is not 100% reproducible; running interactively, I got the error only once.

Otherwise, the report is

========= CUDA-MEMCHECK
========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 0 errors

@fwyzard fwyzard merged commit 66fe2d8 into cms-patatrack:CMSSW_11_1_X_Patatrack Aug 8, 2020

fwyzard commented Aug 8, 2020

Merged in CMSSW_11_1_X and CMSSW_11_2_X .
