Set max channels separately EE and EB for ECAL #517

amassiro

Set max channels separately EE and EB for ECAL

Similar to #516, but now for 11_1_X release

fwyzard added the ECAL label (ECAL-related developments) on Jul 18, 2020

fwyzard commented Jul 18, 2020

Validation summary

Reference release CMSSW_11_1_0 at b7ad279
Development branch cms-patatrack/CMSSW_11_1_X_Patatrack at 88d42f7
Testing PRs:

Validation plots

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.885502.png
zoom-136.885502.png
scan-136.885522.png
zoom-136.885522.png

logs and nvprof/nvvp profiles

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • step3.py: log
    • profile.py: log, profile and summary are missing, see the full log for more information
    • ⚠️ cuda-memcheck --tool initcheck did not run
    • ⚠️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all did not run
    • ⚠️ cuda-memcheck --tool synccheck did not run
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 967152 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • step3.py: log
    • profile.py: log, profile and summary are missing, see the full log for more information
    • ⚠️ cuda-memcheck --tool initcheck did not run
    • ⚠️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all did not run
    • ⚠️ cuda-memcheck --tool synccheck did not run
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 879152 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 48264 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 46176 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 25848 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 26872 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/3dd4b1cc050826346d8527b3ec41c9817ef678b4/log .


fwyzard commented Jul 18, 2020

It looks like there are still problems in the ECAL code, as reported in the TTbar step3.log.

To reproduce, just run the TTbar step3.py using CMSSW_11_1_X plus this PR.

Some changes to make the debugging simpler:

# process one event at a time
process.options.numberOfThreads = cms.untracked.uint32( 1 )
process.options.numberOfStreams = cms.untracked.uint32( 1 )

# skip the first 95 events
process.source.skipEvents = cms.untracked.uint32(95)

# silence the EcalDQM messages
process.MessageLogger.categories.append("EcalDQM")
process.MessageLogger.cerr.EcalDQM = cms.untracked.PSet(
  limit = cms.untracked.int32(0)
)

Running under cuda-memcheck with those changes, I got

18-Jul-2020 13:11:07 CEST  Initiating request to open file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root
18-Jul-2020 13:11:09 CEST  Successfully opened file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root
Begin processing the 1st record. Run 1, Event 6200, LumiSection 62 on stream 0 at 18-Jul-2020 13:11:20.606 CEST
ebdigis.size: 1440
eedigis.size: 748
Begin processing the 2nd record. Run 1, Event 6198, LumiSection 62 on stream 0 at 18-Jul-2020 13:11:22.404 CEST
ebdigis.size: 2145
eedigis.size: 654
Begin processing the 3rd record. Run 1, Event 6195, LumiSection 62 on stream 0 at 18-Jul-2020 13:11:22.958 CEST
ebdigis.size: 1804
eedigis.size: 1077
Begin processing the 4th record. Run 1, Event 6197, LumiSection 62 on stream 0 at 18-Jul-2020 13:11:23.542 CEST
ebdigis.size: 2661
eedigis.size: 818
Begin processing the 5th record. Run 1, Event 6199, LumiSection 62 on stream 0 at 18-Jul-2020 13:11:24.142 CEST
========= Invalid __global__ read of size 8
=========     at 0x00000530 in /data/user/fwyzard/patatrack/validation/run_517.7dOQfOmvHU/testing/src/RecoLocalCalo/EcalRecProducers/plugins/EcalRecHitBuilderKernels.cu:215:ecal::rechit::kernel_create_ecal_rehit(int const *, unsigned int, bool, bool, bool, bool, bool, bool, bool, float, float, float, float, int const *, unsigned int const *, unsigned int const *, unsigned int, unsigned int, float const *, float const *, unsigned short const *, float const *, float const *, float const *, float const *, float const *, __int64 const *, __int64 const *, __int64 const *, float const *, float const *, float const *, __int64 const *, __int64 const *, __int64 const *, __int64, unsigned int const *, unsigned int const *, float const *, float const *, float const *, float const *, float const *, float const *, unsigned int const *, unsigned int const *, unsigned int*, unsigned int*, float*, float*, float*, float*, float*, float*, unsigned int*, unsigned int*, unsigned int*, unsigned int*, int, unsigned int, unsigned int)
=========     by thread (12,0,0) in block (80,0,0)
=========     Address 0x7fc2c17fcdf0 is out of bounds
=========     Device Frame:/data/user/fwyzard/patatrack/validation/run_517.7dOQfOmvHU/testing/src/RecoLocalCalo/EcalRecProducers/plugins/EcalRecHitBuilderKernels.cu:215:ecal::rechit::kernel_create_ecal_rehit(int const *, unsigned int, bool, bool, bool, bool, bool, bool, bool, float, float, float, float, int const *, unsigned int const *, unsigned int const *, unsigned int, unsigned int, float const *, float const *, unsigned short const *, float const *, float const *, float const *, float const *, float const *, __int64 const *, __int64 const *, __int64 const *, float const *, float const *, float const *, __int64 const *, __int64 const *, __int64 const *, __int64, unsigned int const *, unsigned int const *, float const *, float const *, float const *, float const *, float const *, float const *, unsigned int const *, unsigned int const *, unsigned int*, unsigned int*, float*, float*, float*, float*, float*, float*, unsigned int*, unsigned int*, unsigned int*, unsigned int*, int, unsigned int, unsigned int) (ecal::rechit::kernel_crea
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/lib64/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2c74be]
=========     Host Frame:/data/user/fwyzard/patatrack/validation/run_517.7dOQfOmvHU/testing/external/slc7_amd64_gcc820/lib/libcudart.so.11.0 [0xf62b]
=========     Host Frame:/data/user/fwyzard/patatrack/validation/run_517.7dOQfOmvHU/testing/external/slc7_amd64_gcc820/lib/libcudart.so.11.0 (cudaLaunchKernel + 0x1c1) [0x4f5b1]
=========     Host Frame:/data/user/fwyzard/patatrack/validation/run_517.7dOQfOmvHU/testing/lib/slc7_amd64_gcc820/pluginRecoLocalCaloEcalRecProducersPlugins.so (_Z201__device_stub__ZN4ecal6rechit24kernel_create_ecal_rehitEPKijbbbbbbbffffS2_PKjS4_jjPKfS6_PKtS6_S6_S6_S6_S6_PKySA_SA_S6_S6_S6_SA_SA_SA_yS4_S4_S6_S6_S6_S6_S6_S6_S4_S4_PjSB_PfSC_SC_SC_SC_SC_SB_SB_SB_SB_ijjPKijbbbbbbbffffS0_PKjS2_jjPKfS4_PKtS4_S4_S4_S4_S4_PKyS8_S8_S4_S4_S4_S8_S8_S8_yS2_S2_S4_S4_S4_S4_S4_S4_S2_S2_PjS9_PfSA_SA_SA_SA_SA_S9_S9_S9_S9_ijj + 0x582) [0x1e90f2]
...

followed by many more errors, and eventually a segmentation fault.

@amassiro @vkhristenko could you have a look ?


vkhristenko commented Jul 18, 2020 via email

@amassiro
Author

I'm looking at it ... but so far I could not find the error.

One question: what does "process.validation_step" do?

If I remove it, the job runs with no errors; once I put it back, I get this error message:

Module: EcalBarrelRecHitsValidation:ecalBarrelRecHitsValidation (crashed)


vkhristenko commented Jul 18, 2020 via email


fwyzard commented Jul 18, 2020

One question: what does process.validation_step do?

It runs the ECAL-only validation, which is adapted, more or less, from the standard ECAL validation.

If you remove it, does the EcalCPUDigisProducer module still run ?


fwyzard commented Jul 18, 2020

@amassiro I've tried removing the validation_step, but I still get the original error:

18-Jul-2020 19:27:18 CEST  Initiating request to open file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root
18-Jul-2020 19:27:19 CEST  Successfully opened file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root
Begin processing the 1st record. Run 1, Event 6200, LumiSection 62 on stream 0 at 18-Jul-2020 19:27:32.134 CEST
ebdigis.size: 1440
eedigis.size: 748
Begin processing the 2nd record. Run 1, Event 6198, LumiSection 62 on stream 0 at 18-Jul-2020 19:27:33.659 CEST
ebdigis.size: 2145
eedigis.size: 654
Begin processing the 3rd record. Run 1, Event 6195, LumiSection 62 on stream 0 at 18-Jul-2020 19:27:34.141 CEST
ebdigis.size: 1804
eedigis.size: 1077
Begin processing the 4th record. Run 1, Event 6197, LumiSection 62 on stream 0 at 18-Jul-2020 19:27:34.461 CEST
ebdigis.size: 2661
eedigis.size: 818
Begin processing the 5th record. Run 1, Event 6199, LumiSection 62 on stream 0 at 18-Jul-2020 19:27:34.769 CEST
ebdigis.size: 3464
eedigis.size: 2266
----- Begin Fatal Exception 18-Jul-2020 19:27:34 CEST-----------------------
An exception of category 'StdException' occurred while
   [0] Processing  Event run: 1 lumi: 62 event: 6199 stream: 0
   [1] Prefetching for module EcalFEDMonitor/'ecalFEDMonitor'
   [2] Calling method for module EcalCPUDigisProducer/'ecalDigis'
Exception Message:
A std::exception was thrown.

/data/user/fwyzard/patatrack/validation/run_517.7dOQfOmvHU/testing/src/EventFilter/EcalRawToDigi/plugins/EcalCPUDigisProducer.cc, line 150:
cudaCheck(cudaMemcpyAsync( dataeetmp.data(), eedigis.data.get(), dataeetmp.size() * sizeof(uint16_t), cudaMemcpyDeviceToHost, ctx.stream()));
cudaErrorInvalidValue: invalid argument
----- End Fatal Exception -------------------------------------------------
18-Jul-2020 19:27:34 CEST  Closed file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root

This was with skipEvents = 95 to get there faster, and a print out for ebdigis.size / eedigis.size to check that the module actually runs.
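
The failing call asks CUDA to copy dataeetmp.size() * sizeof(uint16_t) bytes out of eedigis.data. One way to catch a mismatch earlier would be a host-side check that the request fits in what was allocated on the device. The sketch below assumes the caller knows the allocated element count; copyBytes and copyFits are hypothetical helpers, not part of CMSSW, and whether a size mismatch is what triggers cudaErrorInvalidValue here is only a guess at this point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes needed to copy `elems` values of type uint16_t.
inline std::size_t copyBytes(std::size_t elems) {
  std::size_t bytes = elems * sizeof(std::uint16_t);
  // Guard against overflow of the multiplication.
  assert(bytes / sizeof(std::uint16_t) == elems);
  return bytes;
}

// True if a copy of `requestedElems` elements stays within a buffer
// that was allocated with `allocatedElems` elements.
inline bool copyFits(std::size_t requestedElems, std::size_t allocatedElems) {
  return requestedElems <= allocatedElems;
}
```

A caller could test copyFits(dataeetmp.size(), allocatedElems) before issuing the cudaMemcpyAsync and report both sizes when it fails, which would make this kind of error easier to read than a bare "invalid argument".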


fwyzard commented Jul 18, 2020

@amassiro where does the * 10 come from?

// resize tmp buffers
// FIXME remove hardcoded values
idsebtmp.resize(ebdigis.size);
dataebtmp.resize(ebdigis.size * 10);
idseetmp.resize(eedigis.size);
dataeetmp.resize(eedigis.size * 10);

@amassiro
Author

@amassiro I've tried removing the validation_step, but I still get the original error:

sorry, I meant I left everything up to the validation step (excluded)

@amassiro
Author

@amassiro where does the * 10 come from?

// resize tmp buffers
// FIXME remove hardcoded values
idsebtmp.resize(ebdigis.size);
dataebtmp.resize(ebdigis.size * 10);
idseetmp.resize(eedigis.size);
dataeetmp.resize(eedigis.size * 10);

It should be the 10 samples per channel: each channel has 10 integers (10 sampled points from the electronics pulse shape).
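
A minimal sketch of the sizing rule described here, assuming 10 samples of type uint16_t per channel and one 32-bit id per channel. HostDigiBuffers and resizeFor are hypothetical names used for illustration, not the actual EcalCPUDigisProducer code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 10 samples per channel, as stated in the reply above.
constexpr std::size_t kSamplesPerChannel = 10;

// Hypothetical host-side staging buffers for one ECAL partition (EB or EE).
struct HostDigiBuffers {
  std::vector<std::uint32_t> ids;   // one id per channel
  std::vector<std::uint16_t> data;  // kSamplesPerChannel samples per channel
};

// Resize both buffers for nChannels digis.
inline void resizeFor(HostDigiBuffers& buf, std::size_t nChannels) {
  buf.ids.resize(nChannels);
  buf.data.resize(nChannels * kSamplesPerChannel);
  // The two sizes must always differ by exactly the per-channel factor.
  assert(buf.data.size() == buf.ids.size() * kSamplesPerChannel);
}
```

Keeping the factor in one named constant would also address the "FIXME remove hardcoded values" note in the quoted snippet.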


vkhristenko commented Jul 19, 2020 via email

@amassiro
Author

The fix in 56536bc should deal with the error shown before.
I tested it locally on the MC sample, and it runs.

@@ -40,11 +41,10 @@ namespace ecal {
// FIXME: we should separate max channels parameter for eb and ee
// FIXME: replace hardcoded values
void allocate(ConfigurationParameters const &config, cudaStream_t cudaStream) {
digisEB.data = cms::cuda::make_device_unique<uint16_t[]>(config.maxChannels, cudaStream);
digisEE.data = cms::cuda::make_device_unique<uint16_t[]>(config.maxChannels, cudaStream);

Maybe the missing *10 here is also why we were not able to fully validate the MC workflow in the past?
... btw, now it should be fixed.
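
For illustration, here is a sketch of how the device buffer sizes could be derived once the EB and EE limits are separate and the factor of 10 is applied. The field names maxChannelsEB and maxChannelsEE, the DigiCapacity struct and capacityFor are hypothetical, and the actual change in 56536bc may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 10 samples per channel, as discussed in this thread.
constexpr std::size_t kSamplesPerChannel = 10;

// Hypothetical configuration with separate limits for barrel (EB) and endcap (EE).
struct ConfigurationParameters {
  std::uint32_t maxChannelsEB;
  std::uint32_t maxChannelsEE;
};

// Element counts to allocate for one partition.
struct DigiCapacity {
  std::size_t ids;   // one id per channel
  std::size_t data;  // kSamplesPerChannel uint16_t samples per channel
};

inline DigiCapacity capacityFor(std::uint32_t maxChannels) {
  assert(maxChannels > 0);
  DigiCapacity cap;
  cap.ids = static_cast<std::size_t>(maxChannels);
  cap.data = static_cast<std::size_t>(maxChannels) * kSamplesPerChannel;
  return cap;
}
```

With a configuration such as ConfigurationParameters{61200, 14648} (the nominal EB and EE crystal counts, used here only as example values), capacityFor(cfg.maxChannelsEE).data gives the element count to pass to make_device_unique<uint16_t[]> for the EE samples.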


fwyzard commented Jul 21, 2020

Thanks @amassiro, indeed I can now re-run the failing workflows without crashes, and cuda-memcheck is also happy :-)


fwyzard commented Jul 21, 2020

I'll re-run the tests one last time, and merge if they don't show additional failures.


fwyzard commented Jul 21, 2020

Validation summary

Reference release CMSSW_11_1_0 at b7ad279
Development branch cms-patatrack/CMSSW_11_1_X_Patatrack at 9156aad
Testing PRs:

Validation plots

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.885502.png
zoom-136.885502.png
scan-136.885522.png
zoom-136.885522.png

logs and nvprof/nvvp profiles

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • step3.py: log
    • profile.py: log, profile and summary are missing, see the full log for more information
    • ⚠️ cuda-memcheck --tool initcheck did not run
    • ⚠️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all did not run
    • ⚠️ cuda-memcheck --tool synccheck did not run
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 963160 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 930760 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 47408 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 44536 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 26760 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 26760 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

🚧 Validation running at fu-c2a02-35-02:/data/user/fwyzard/patatrack/validation/run_517.RyT4TIWegb ...


fwyzard commented Jul 21, 2020

Validation summary

Reference release CMSSW_11_1_0 at b7ad279
Development branch cms-patatrack/CMSSW_11_1_X_Patatrack at 9156aad
Testing PRs:

Validation plots

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 11634.5
  • tracking validation plots and summary for workflow 11634.501
  • tracking validation plots and summary for workflow 11634.502

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.885502.png
zoom-136.885502.png
scan-136.885522.png
zoom-136.885522.png

logs and nvprof/nvvp profiles

/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • step3.py: log
    • profile.py: log, profile and summary are missing, see the full log for more information
    • ⚠️ cuda-memcheck --tool initcheck did not run
    • ⚠️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all did not run
    • ⚠️ cuda-memcheck --tool synccheck did not run
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 963160 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 930760 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 47408 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 44536 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 11634.5
  • development release, workflow 11634.5
  • development release, workflow 11634.501
  • development release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 11634.511
  • development release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • development release, workflow 11634.521
  • development release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 26760 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.885502
  • development release, workflow 136.885512
  • development release, workflow 136.885522
  • testing release, workflow 11634.5
  • testing release, workflow 11634.501
  • testing release, workflow 11634.502
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 11634.511
  • testing release, workflow 11634.512
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 11634.521
  • testing release, workflow 11634.522
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • cuda-memcheck --tool initcheck (report, log) found 26760 errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • cuda-memcheck --tool synccheck (report, log) found no CUDA-MEMCHECK results
  • testing release, workflow 136.885502
  • testing release, workflow 136.885512
  • testing release, workflow 136.885522

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/24a441b28c8ebc5c2ba6473426d841f664d92680/log .


fwyzard commented Jul 21, 2020

So, making progress...

There is still an issue when running the ECAL workflow under cuda-memcheck:

terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
terminate called recursively
  what():  
/data/user/fwyzard/patatrack/validation/run_517.RyT4TIWegb/development/src/HeterogeneousCore/CUDAUtilities/src/CachingDeviceAllocator.h, line 603:
cudaCheck(error = cudaEventRecord(search_key.ready_event, search_key.associated_stream));
cudaErrorLaunchFailure: unspecified launch failure

However, I think it makes sense to merge this PR and look into this issue separately.
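
For readers unfamiliar with the shape of the message above: it looks like the output of a check helper that records the file, line and failing expression, then throws `std::runtime_error`. CUDA reports kernel failures asynchronously, so the call that throws (here `cudaEventRecord`) is often not the call at fault. The following is a minimal host-only sketch of such a helper, not the CMSSW `cudaCheck` implementation. `FakeError`, `fakeCheck`, `FAKE_CHECK`, `recordEvent` and `describeFailure` are hypothetical names, and the enum stands in for `cudaError_t` so that the example builds without the CUDA toolkit.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for cudaError_t (assumption), so the sketch builds without CUDA.
enum class FakeError { Success, LaunchFailure };

inline const char* errorName(FakeError e) {
  return e == FakeError::Success ? "fakeSuccess" : "fakeErrorLaunchFailure";
}

inline const char* errorString(FakeError e) {
  return e == FakeError::Success ? "no error" : "unspecified launch failure";
}

// Hypothetical helper: on failure, throw with file, line, expression and error text.
inline void fakeCheck(FakeError result, const char* file, int line, const char* expr) {
  if (result == FakeError::Success)
    return;
  std::ostringstream out;
  out << "\n" << file << ", line " << line << ":\n"
      << "fakeCheck(" << expr << ");\n"
      << errorName(result) << ": " << errorString(result) << "\n";
  throw std::runtime_error(out.str());
}

// The macro captures the call site and the text of the checked expression.
#define FAKE_CHECK(expr) fakeCheck((expr), __FILE__, __LINE__, #expr)

// Simulates an API call that reports a failure left behind by an earlier kernel.
inline FakeError recordEvent(bool earlierKernelFailed) {
  return earlierKernelFailed ? FakeError::LaunchFailure : FakeError::Success;
}

// Returns the message produced for a failing call, or an empty string on success.
inline std::string describeFailure(bool earlierKernelFailed) {
  try {
    FAKE_CHECK(recordEvent(earlierKernelFailed));
  } catch (const std::runtime_error& e) {
    return e.what();
  }
  return {};
}
```

Calling `describeFailure(true)` yields a message naming the checked expression and the error, while `describeFailure(false)` returns an empty string. In a real CUDA program, synchronizing right after the suspect kernel launch is one common way to make the failure surface closer to its cause.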

@fwyzard fwyzard merged commit 935c1bb into cms-patatrack:CMSSW_11_1_X_Patatrack Jul 21, 2020
fwyzard pushed a commit that referenced this pull request Jul 21, 2020
)

Fix memory allocation issues.

Apply some code clean-up:
  - remove outdated comments;
  - replace MYMALLOC macro with a lambda;
  - reuse named values from EcalDataFrame.
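
As an illustration of the "replace MYMALLOC macro with a lambda" item, here is a minimal host-only sketch of that kind of change. It is not the code from this PR. It assumes the macro wrapped an allocation call, and `std::malloc` stands in for the device allocation so that the example builds without CUDA. `ScratchBuffers`, `FreeDeleter`, `allocate` and all member names are hypothetical, and the EB and EE sizes are passed separately only to echo the PR title.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Before (hypothetical macro, shown only for contrast):
//   #define MY_ALLOC(ptr, count) ptr = (decltype(ptr))std::malloc((count) * sizeof(*ptr))

struct FreeDeleter {
  void operator()(void* p) const noexcept { std::free(p); }
};

// Hypothetical scratch buffers, sized separately for barrel (EB) and endcap (EE).
struct ScratchBuffers {
  float* amplitudesEB = nullptr;
  float* amplitudesEE = nullptr;
  unsigned int* idsEB = nullptr;
  unsigned int* idsEE = nullptr;
  // Owns every block, so all of them are released when the object is destroyed.
  std::vector<std::unique_ptr<void, FreeDeleter>> owned;

  ScratchBuffers(std::size_t maxChannelsEB, std::size_t maxChannelsEE) {
    // After: a generic lambda deduces the element type from the pointer argument,
    // obeys normal scoping rules and can report a failed allocation.
    auto allocate = [this](auto*& ptr, std::size_t count) {
      using T = std::remove_reference_t<decltype(*ptr)>;
      std::unique_ptr<void, FreeDeleter> block(std::malloc(count * sizeof(T)));
      if (!block)
        throw std::bad_alloc{};
      ptr = static_cast<T*>(block.get());
      owned.push_back(std::move(block));
    };
    allocate(amplitudesEB, maxChannelsEB);
    allocate(amplitudesEE, maxChannelsEE);
    allocate(idsEB, maxChannelsEB);
    allocate(idsEE, maxChannelsEE);
  }

  // The raw pointers alias the owned blocks, so copying (and moving) is disabled.
  ScratchBuffers(const ScratchBuffers&) = delete;
  ScratchBuffers& operator=(const ScratchBuffers&) = delete;
};
```

Constructing `ScratchBuffers buffers(8, 4);` gives four independent blocks that are freed together. Unlike the macro, the lambda is type-checked, visible to the debugger, and cannot leak its name outside the constructor.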
fwyzard pushed a commit that referenced this pull request Oct 7, 2020
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
fwyzard pushed a commit that referenced this pull request Oct 19, 2020
fwyzard pushed a commit that referenced this pull request Nov 9, 2020
fwyzard pushed a commit that referenced this pull request Nov 12, 2020
fwyzard pushed a commit that referenced this pull request Nov 16, 2020
fwyzard added a commit that referenced this pull request Nov 26, 2020
fwyzard added a commit that referenced this pull request Nov 28, 2020
fwyzard pushed a commit that referenced this pull request Dec 25, 2020
fwyzard added a commit that referenced this pull request Dec 26, 2020
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
Labels
bug-fix ECAL ECAL-related developments