Use 10 time samples for the HBHE digis in Run 3 #531
Conversation
@mariadalfonso @vkhristenko in which cases should we use 8 or 10 time samples ?
|
Run 3 MC - 10 samples
Data - 8 samples, for Run 3 as well
Run 2 MC - 8 samples
Releases, not sure...
…On Thu, 6 Aug 2020 at 00:19, Andrea Bocci wrote:
@mariadalfonso @vkhristenko in which cases should we use 8 or 10 time samples ?
- Run 2 data ?
- Run 2 MC ? Up to / from which release ?
- Run 3 MC ? Up to / from which release ?
|
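The sample counts discussed above can be summarised in a small lookup table (a hypothetical helper for illustration only, not a CMSSW API):

```python
# Hypothetical summary of the HBHE time-sample counts discussed above;
# illustration only, not an actual CMSSW API.
HBHE_NSAMPLES = {
    ("run2", "data"): 8,
    ("run2", "mc"):   8,
    ("run3", "data"): 8,   # data stays at 8 time samples
    ("run3", "mc"):   10,  # extended to 10 in view of the 2024 ageing
}

def hbhe_nsamples(run: str, kind: str) -> int:
    """Expected number of HBHE time samples for a given scenario."""
    return HBHE_NSAMPLES[(run.lower(), kind.lower())]
```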
The above is correct. Note that Mahi (RECO) always runs on 8 samples with 8 pulses, but in Run 3, in view of the 2024 ageing, we need to simulate electronics effects earlier (the real detector has all the effects from -inf). Maria |
Thanks for all the details. Maria, I think the Run 3 MC samples from CMSSW 11.1.0(-pre8) already have the simulation with 10 samples, correct ? |
yes. |
Thanks ! |
I tested that adding
to the Run 3 MC workflow solves all the
@mariadalfonso do you know how this is handled for data vs MC in the cpu workflow ? |
Overall, they do not define a difference based on MC vs data (in Mahi's producer); it is based on what sits in the data (how many time samples, and where the sample of interest is). However, the RawToDigi producer does have this configuration! In CPU code it is much easier, as you do not need to deal with offsets (for a particular thread...). Technically speaking, this configuration could be lifted further, or removed, I believe; it would then be taken from the data... I decided to leave it like that because, at the very least, it shows what the expectation is for the GPU code... |
you mean the original cpu rawtodigi ? |
yep yep, the cpu unpacker has this in its configuration... |
In CMSSW 11_2_0_pre3, I get:

process.hcalDigis = cms.EDProducer("HcalRawToDigi",
ComplainEmptyData = cms.untracked.bool(False),
ElectronicsMap = cms.string(''),
ExpectedOrbitMessageTime = cms.untracked.int32(-1),
FEDs = cms.untracked.vint32(),
FilterDataQuality = cms.bool(True),
HcalFirstFED = cms.untracked.int32(700),
InputLabel = cms.InputTag("rawDataCollector"),
UnpackCalib = cms.untracked.bool(True),
UnpackTTP = cms.untracked.bool(True),
UnpackUMNio = cms.untracked.bool(True),
UnpackZDC = cms.untracked.bool(True),
UnpackerMode = cms.untracked.int32(0),
firstSample = cms.int32(0),
lastSample = cms.int32(9),
mightGet = cms.optional.untracked.vstring,
saveQIE10DataNSamples = cms.untracked.vint32(),
saveQIE10DataTags = cms.untracked.vstring(),
saveQIE11DataNSamples = cms.untracked.vint32(),
saveQIE11DataTags = cms.untracked.vstring(),
silent = cms.untracked.bool(True)
) for
If the relevant part is firstSample = cms.int32(0), lastSample = cms.int32(9), does it mean that we should use 10 samples for all data and MC, also for Run 2 ? |
this must be configured somewhere else... what matters above is actually this:

saveQIE10DataNSamples = cms.untracked.vint32(),
saveQIE10DataTags = cms.untracked.vstring(),
saveQIE11DataNSamples = cms.untracked.vint32(),
saveQIE11DataTags = cms.untracked.vstring(), |
Before 2018, let's forget it. --> The MC extended the presamples only for Run 3. --> DATA stays at 8, likely because of the bandwidth of things out of P5. All (SOI, nFrames) combinations should be transparent in RECO, because the frame contains the SOI/NSamples (no matter whether it is MC or DATA). |
those are already the fully expanded dumps for the
OK, for the time being I'll check if this works:

# Only for Run 3 MC; Run 2 MC and Run 2/3 data use 8 time samples
from Configuration.Eras.Modifier_run3_HB_cff import run3_HB
run3_HB.toModify(hcalDigisGPU,
nsamplesF01HE = 10,
nsamplesF3HB = 10,
nsamplesF5HB = 10
) (I think |
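For illustration, the `run3_HB.toModify(...)` customisation above behaves roughly like applying a set of overrides only when the era modifier is active; a plain-Python sketch (the real `Modifier` machinery in FWCore does considerably more):

```python
# Sketch of what an era modifier's toModify() effectively does to the
# module parameters; a stand-in for FWCore's Modifier, not the real API.
def apply_era_overrides(params: dict, era_active: bool, **overrides) -> dict:
    """Return a copy of the parameters, with overrides applied only
    when the era modifier is active."""
    out = dict(params)
    if era_active:
        out.update(overrides)
    return out

# Baseline: 8 time samples everywhere (assumed default, for illustration).
hcalDigisGPU = {"nsamplesF01HE": 8, "nsamplesF3HB": 8, "nsamplesF5HB": 8}

run2_params = apply_era_overrides(hcalDigisGPU, era_active=False,
                                  nsamplesF01HE=10, nsamplesF3HB=10, nsamplesF5HB=10)
run3_params = apply_era_overrides(hcalDigisGPU, era_active=True,
                                  nsamplesF01HE=10, nsamplesF3HB=10, nsamplesF5HB=10)
```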
https://github.com/cms-sw/cmssw/blob/master/EventFilter/HcalRawToDigi/plugins/HcalRawToDigi.cc#L117 this is used here and then during decoding... I'm not sure about the confs, but when debugging all that stuff this is what I found... In other words, you have to match what sits in the data with your expectations... |
@vkhristenko in your GPU code, do you use the number of time samples only to allocate enough memory ? |
Nope, this actually sets the stride for the access, and it should be as expected. As I said above, in principle this can be lifted... but the easiest for me was to add these configurable things... |
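To make the stride remark concrete, here is a sketch of how the number of time samples drives the per-channel word count, inferred from the packing loops in the producer diff later in this thread (Flavor 5: one header word plus two 7-bit ADCs per 16-bit word; Flavor 0/1: header words plus one word per sample; the Flavor 0/1 header-word count is an assumption here):

```python
FLAVOR5_HEADER_WORDS = 1  # matches the static_assert in the producer code

def stride_flavor5(nsamples: int) -> int:
    # two samples packed into each 16-bit word, after one header word
    return FLAVOR5_HEADER_WORDS + (nsamples + 1) // 2

def stride_flavor01(nsamples: int, header_words: int) -> int:
    # one 16-bit word per sample after the header words
    # (header_words is left as a parameter: its value is an assumption)
    return header_words + nsamples
```

With 8 samples a Flavor 5 channel spans 5 words, with 10 samples 6 words, which is why every digi in a collection must share the same stride (cf. the assert in the diff).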
The test should be fine, since it should use Run 2 settings for running on data and Run 3 settings for MC... still, I'll ask HCAL how this is handled in the current code to figure out the best way to set the parameters. @vkhristenko of course if (later on) you can make the code detect the correct number from the data itself, even better :-) |
This is what should be supported at DIGI:
// maxTS = 10, SOI = 5 (new, for better modeling of SiPM noise in MC) Run3 MC
// maxTS = 8, SOI = 3 (typical 8-TS situation in data). Run3 data
// maxTS = 8, SOI = 3 Run2018 data/MC
In the above cases for RECO, we do the fit on redefined -- 8TS and the SOI is always at TS3 |
Let’s do this, I’ll update this ASAP and close the topic
…On Thu, 6 Aug 2020 at 16:03, mariadalfonso wrote:
This is what should be supported at DIGI
// maxTS = 10, SOI = 5 (new, for better modeling of SiPM noise in MC) Run3 MC
// maxTS = 8, SOI = 3 (typical 8-TS situation in data). Run3 data
// maxTS = 8, SOI = 3 Run2018 data/MC
In the above cases for RECO, we do the fit on redefined -- 8TS and the SOI is always at TS3
|
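As a quick reference, the (maxTS, SOI) combinations quoted above, together with one plausible way to realise the "refit on 8 TS with SOI at TS3" statement (hypothetical helper, for illustration only):

```python
# (maxTS, SOI) combinations quoted above; hypothetical lookup for reference.
DIGI_FORMAT = {
    "run3_mc":      (10, 5),  # better modeling of SiPM noise in MC
    "run3_data":    (8, 3),   # typical 8-TS situation in data
    "run2018_data": (8, 3),
    "run2018_mc":   (8, 3),
}

def reco_window(max_ts: int, soi: int) -> tuple:
    """One plausible redefinition: keep an 8-TS window positioned so
    that the SOI lands on index 3, as stated for RECO above."""
    first = soi - 3
    return (first, first + 8)
```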
@vkhristenko this PR as it is fixes the issue on Run 3 MC (with 10 time samples), and looks fine on data (which still uses 8 time samples). |
ok, never thought that would make such a huge diff... let me fix that now, apologies... |
IIRC |
diff --git a/EventFilter/HcalRawToDigi/plugins/HcalDigisProducerGPU.cc b/EventFilter/HcalRawToDigi/plugins/HcalDigisProducerGPU.cc
index 2f79bbd7867e..f64200a40495 100644
--- a/EventFilter/HcalRawToDigi/plugins/HcalDigisProducerGPU.cc
+++ b/EventFilter/HcalRawToDigi/plugins/HcalDigisProducerGPU.cc
@@ -15,7 +15,7 @@
class HcalDigisProducerGPU : public edm::stream::EDProducer<edm::ExternalWork> {
public:
explicit HcalDigisProducerGPU(edm::ParameterSet const& ps);
- ~HcalDigisProducerGPU() override;
+ ~HcalDigisProducerGPU() override = default;
static void fillDescriptions(edm::ConfigurationDescriptions&);
private:
@@ -53,6 +53,11 @@ private:
};
ConfigParameters config_;
+ // temporary host buffers
+ HostCollectionf01 hf01_;
+ HostCollectionf5 hf5_;
+ HostCollectionf3 hf3_;
+
// device products: product owns memory (i.e. not the module)
DeviceCollectionf01 df01_;
DeviceCollectionf5 df5_;
@@ -84,9 +89,12 @@ HcalDigisProducerGPU::HcalDigisProducerGPU(const edm::ParameterSet& ps)
config_.maxChannelsF01HE = ps.getParameter<uint32_t>("maxChannelsF01HE");
config_.maxChannelsF5HB = ps.getParameter<uint32_t>("maxChannelsF5HB");
config_.maxChannelsF3HB = ps.getParameter<uint32_t>("maxChannelsF3HB");
-}
-HcalDigisProducerGPU::~HcalDigisProducerGPU() {}
+ // preallocate the host buffers
+ hf01_.reserve(config_.maxChannelsF01HE);
+ hf5_.reserve(config_.maxChannelsF5HB);
+ hf3_.reserve(config_.maxChannelsF3HB);
+}
void HcalDigisProducerGPU::acquire(edm::Event const& event,
edm::EventSetup const& setup,
@@ -94,6 +102,10 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
// raii
cms::cuda::ScopedContextAcquire ctx{event.streamID(), std::move(holder), cudaState_};
+ // clear the host buffers
+ hf01_.clear();
+ hf5_.clear();
+ hf3_.clear();
// event data
edm::Handle<HBHEDigiCollection> hbheDigis;
@@ -101,17 +113,11 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
event.getByToken(hbheDigiToken_, hbheDigis);
event.getByToken(qie11DigiToken_, qie11Digis);
- // tmp host data
- HostCollectionf01 hf01;
- HostCollectionf5 hf5;
- HostCollectionf3 hf3;
-
// init f5 collection
if (hbheDigis->size() > 0) {
auto const nsamples = (*hbheDigis)[0].size();
auto const stride = hcal::compute_stride<hcal::Flavor5>(nsamples);
- hf5.stride = stride;
- hf5.reserve(config_.maxChannelsF5HB);
+ hf5_.stride = stride;
// flavor5 get device blobs
df5_.stride = stride;
@@ -126,10 +132,8 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
auto const stride01 = hcal::compute_stride<hcal::Flavor01>(nsamples);
auto const stride3 = hcal::compute_stride<hcal::Flavor3>(nsamples);
- hf01.stride = stride01;
- hf3.stride = stride3;
- hf01.reserve(config_.maxChannelsF01HE);
- hf3.reserve(config_.maxChannelsF3HB);
+ hf01_.stride = stride01;
+ hf3_.stride = stride3;
// flavor 0/1 get devie blobs
df01_.stride = stride01;
@@ -147,19 +151,19 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
for (auto const& hbhe : *hbheDigis) {
auto const id = hbhe.id().rawId();
auto const presamples = hbhe.presamples();
- hf5.ids.push_back(id);
- hf5.npresamples.push_back(presamples);
+ hf5_.ids.push_back(id);
+ hf5_.npresamples.push_back(presamples);
auto const stride = hcal::compute_stride<hcal::Flavor5>(hbhe.size());
- assert(stride == hf5.stride && "strides must be the same for every single digi of the collection");
+ assert(stride == hf5_.stride && "strides must be the same for every single digi of the collection");
// simple for now...
static_assert(hcal::Flavor5::HEADER_WORDS == 1);
uint16_t header_word = (1 << 15) | (0x5 << 12) | (0 << 10) | ((hbhe.sample(0).capid() & 0x3) << 8);
- hf5.data.push_back(header_word);
+ hf5_.data.push_back(header_word);
for (unsigned int i = 0; i < stride - hcal::Flavor5::HEADER_WORDS; i++) {
uint16_t s0 = (0 << 7) | (static_cast<uint8_t>(hbhe.sample(2 * i).adc()) & 0x7f);
uint16_t s1 = (0 << 7) | (static_cast<uint8_t>(hbhe.sample(2 * i + 1).adc()) & 0x7f);
uint16_t sample = (s1 << 8) | s0;
- hf5.data.push_back(sample);
+ hf5_.data.push_back(sample);
}
}
@@ -170,21 +174,21 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
if (digi.detid().subdetId() != HcalEndcap)
continue;
auto const id = digi.detid().rawId();
- hf01.ids.push_back(id);
+ hf01_.ids.push_back(id);
for (int hw = 0; hw < hcal::Flavor01::HEADER_WORDS; hw++)
- hf01.data.push_back((*qie11Digis)[i][hw]);
+ hf01_.data.push_back((*qie11Digis)[i][hw]);
for (int sample = 0; sample < digi.samples(); sample++) {
- hf01.data.push_back((*qie11Digis)[i][hcal::Flavor01::HEADER_WORDS + sample]);
+ hf01_.data.push_back((*qie11Digis)[i][hcal::Flavor01::HEADER_WORDS + sample]);
}
} else if (digi.flavor() == 3) {
if (digi.detid().subdetId() != HcalBarrel)
continue;
auto const id = digi.detid().rawId();
- hf3.ids.push_back(id);
+ hf3_.ids.push_back(id);
for (int hw = 0; hw < hcal::Flavor3::HEADER_WORDS; hw++)
- hf3.data.push_back((*qie11Digis)[i][hw]);
+ hf3_.data.push_back((*qie11Digis)[i][hw]);
for (int sample = 0; sample < digi.samples(); sample++) {
- hf3.data.push_back((*qie11Digis)[i][hcal::Flavor3::HEADER_WORDS + sample]);
+ hf3_.data.push_back((*qie11Digis)[i][hcal::Flavor3::HEADER_WORDS + sample]);
}
}
}
@@ -198,19 +202,19 @@ void HcalDigisProducerGPU::acquire(edm::Event const& event,
cudaCheck(cudaMemcpyAsync(dest, src.data(), src.size() * sizeof(type), cudaMemcpyHostToDevice, ctx.stream()));
};
- lambdaToTransfer(df01_.data.get(), hf01.data);
- lambdaToTransfer(df01_.ids.get(), hf01.ids);
+ lambdaToTransfer(df01_.data.get(), hf01_.data);
+ lambdaToTransfer(df01_.ids.get(), hf01_.ids);
- lambdaToTransfer(df5_.data.get(), hf5.data);
- lambdaToTransfer(df5_.ids.get(), hf5.ids);
- lambdaToTransfer(df5_.npresamples.get(), hf5.npresamples);
+ lambdaToTransfer(df5_.data.get(), hf5_.data);
+ lambdaToTransfer(df5_.ids.get(), hf5_.ids);
+ lambdaToTransfer(df5_.npresamples.get(), hf5_.npresamples);
- lambdaToTransfer(df3_.data.get(), hf3.data);
- lambdaToTransfer(df3_.ids.get(), hf3.ids);
+ lambdaToTransfer(df3_.data.get(), hf3_.data);
+ lambdaToTransfer(df3_.ids.get(), hf3_.ids);
- df01_.size = hf01.ids.size();
- df5_.size = hf5.ids.size();
- df3_.size = hf3.ids.size();
+ df01_.size = hf01_.ids.size();
+ df5_.size = hf5_.ids.size();
+ df3_.size = hf3_.ids.size();
}
void HcalDigisProducerGPU::produce(edm::Event& event, edm::EventSetup const& setup) { |
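For reference, the Flavor 5 packing loop in the diff above can be mirrored in a few lines of Python (a sketch of the same bit layout: one header word carrying the flavor and capid, then two 7-bit ADCs per 16-bit word):

```python
# Python mirror of the Flavor 5 packing seen in the diff above (sketch).
def pack_flavor5(capid0: int, adcs: list) -> list:
    # header word: marker bit, flavor 0x5, capid of the first sample
    words = [(1 << 15) | (0x5 << 12) | (0 << 10) | ((capid0 & 0x3) << 8)]
    # two 7-bit ADC values per 16-bit data word
    for i in range(0, len(adcs), 2):
        s0 = adcs[i] & 0x7F
        s1 = (adcs[i + 1] & 0x7F) if i + 1 < len(adcs) else 0
        words.append((s1 << 8) | s0)
    return words
```

An 8-sample digi thus packs into 5 words (one header plus four data words), matching the stride discussed earlier in the thread.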
Added. The only main difference wrt @fwyzard's diff is that the stride is set before the reserve call... for the statically known max... let me know!
Validation summary
Reference release CMSSW_11_1_0 at b7ad279
Validation plots:
/RelValTTbar_14TeV/CMSSW_11_1_0_pre8-PU_111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW
/RelValZMM_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW
/RelValZEE_14/CMSSW_11_1_0_pre8-111X_mcRun3_2021_realistic_v4-v1/GEN-SIM-DIGI-RAW
Throughput plots: /EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53
logs and
|
CUDA is still (again ?) not happy :-( |
You mean numbers? Or cuda-memcheck?
…On Sat, 8 Aug 2020 at 02:10, Andrea Bocci wrote:
CUDA is still (again ?) not happy :-(
|
The last round of validation spotted a problem with
Just check the ❌ under "testing release". |
I'm not seeing these issues (--tool initcheck), i'm running on ttbar... |
Running step3.py on TTbar, I can reproduce the crash with:

cuda-memcheck --tool synccheck --log-file synccheck.log -- cmsRun step3.py
...
08-Aug-2020 17:10:38 CEST Initiating request to open file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root
08-Aug-2020 17:10:39 CEST Successfully opened file file:/gpu_data/store/relval/CMSSW_11_1_0_pre8/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_111X_mcRun3_2021_realistic_v4-v1/20000/6767846A-04AA-AD40-BDAB-407450210E53.root
Begin processing the 1st record. Run 1, Event 6107, LumiSection 62 on stream 6 at 08-Aug-2020 17:10:53.325 CEST
Begin processing the 2nd record. Run 1, Event 6101, LumiSection 62 on stream 1 at 08-Aug-2020 17:10:53.338 CEST
Begin processing the 3rd record. Run 1, Event 6106, LumiSection 62 on stream 0 at 08-Aug-2020 17:10:53.338 CEST
Begin processing the 4th record. Run 1, Event 6104, LumiSection 62 on stream 4 at 08-Aug-2020 17:10:53.340 CEST
Begin processing the 5th record. Run 1, Event 6102, LumiSection 62 on stream 3 at 08-Aug-2020 17:10:53.341 CEST
Begin processing the 6th record. Run 1, Event 6108, LumiSection 62 on stream 7 at 08-Aug-2020 17:10:53.342 CEST
Begin processing the 7th record. Run 1, Event 6103, LumiSection 62 on stream 5 at 08-Aug-2020 17:10:53.342 CEST
Begin processing the 8th record. Run 1, Event 6105, LumiSection 62 on stream 2 at 08-Aug-2020 17:10:53.343 CEST
Begin processing the 9th record. Run 1, Event 6115, LumiSection 62 on stream 7 at 08-Aug-2020 17:10:53.980 CEST
terminate called after throwing an instance of 'cms::cuda::bad_alloc'
what(): an illegal memory access was encountered
A fatal system signal has occurred: abort signal
The following is the call stack containing the origin of the signal. However, this might be a bug in the tool rather than a problem in our code.
and
Unfortunately, the version bundled with CUDA in CMSSW is incomplete; I'll have to fix that before we can switch to it in the validation tests. |
As for the Otherwise, the report is
|
Merged in CMSSW_11_1_X and CMSSW_11_2_X.
PR description:
Fixes issue #520.
I attach a cfg that I used to validate. In particular, confs should be added for the GPU modifier (apologies, not sure where to add this).
cfg.txt
PR validation:
cuda-memcheck does not show invalids anymore (memcheck/initcheck).