LowPtElectrons: updated BDT model and code base for ID #31080

bainbrid · 2020-08-06T19:07:25Z

This PR concerns changes to the BDT model used for the identification of LowPtElectrons.

This PR requires the following external: cms-data/RecoEgamma-ElectronIdentification#14

The LowPtGsfElectronIDProducer.cc class has been adapted to handle changes to the list of features used by the new model in cms-data/RecoEgamma-ElectronIdentification#14 ("LowPtElectrons: updated BDT model for ID").
We have reverse engineered some recent changes made to this package in the CMSSW_11_0_X cycle. The changes involve moving functionality back to the interface and src directories. Utility methods defined in LowPtGsfElectronFeatures.h guarantee the correct extraction of features (from both AOD and MINIAOD data tiers) for both evaluation (in CMSSW and outside) and training purposes (using workflows defined outside the cms-sw repo).
These changes require the external cms-data/RecoEgamma-ElectronIdentification#14, which contains a "placeholder" model for the time being. The model weights will be updated again in a future PR once MC samples are available (in progress) for a final retraining.

Validation has been performed with the "placeholder" model and the performance is consistent with our expectations.

cmsbuild · 2020-08-06T19:07:46Z

The code-checks are being triggered in jenkins.

cmsbuild · 2020-08-06T19:12:47Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31080/17619

This PR adds an extra 32KB to repository

cmsbuild · 2020-08-06T19:13:06Z

A new Pull Request was created by @bainbrid for master.

It involves the following packages:

RecoEgamma/EgammaElectronProducers

@perrotta, @jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@Sam-Harper, @jainshilpi, @rovere, @lgray, @sobhatta, @lecriste, @afiqaize, @varuns23 this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

bainbrid · 2020-08-06T19:16:07Z

@crovelli

bainbrid · 2020-08-06T20:07:47Z

@cavalari

slava77 · 2020-08-06T23:19:34Z

test parameters:

pull_request = cms-data/RecoEgamma-ElectronIdentification#14

slava77 · 2020-08-06T23:20:12Z

@cmsbuild please test

cmsbuild · 2020-08-06T23:20:38Z

The tests are being triggered in jenkins.
Tested with other pull request(s) cms-data/RecoEgamma-ElectronIdentification#14

CMSSW_11_2_X_2020-08-06-1100/slc7_amd64_gcc820: https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/8642/console Started: 2020/08/07 02:15

cmsbuild · 2020-08-07T11:28:11Z

+1
Tested at: d553c18
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9c4a8a/8642/summary.html
CMSSW: CMSSW_11_2_X_2020-08-06-1100
SCRAM_ARCH: slc7_amd64_gcc820

bainbrid · 2020-08-14T08:16:59Z

@slava77 @afiqaize @SohamBhattacharya

> Looking at this, I do not see where lowPtGsfElectronID is used; a module simply inserted in the task but whose products are not consumed will not run.

Ah, true, my mistake - it is used by a standalone test we have privately, which fooled me.

> My suggestion was to add it to the list of the electron IDs/userFloats in the slimmedElectrons.

I'm not sure how to do this. If we add a userFloat as you suggest, then everything else in this PR can remain the same?

> OTOH, the main reason why I asked about this was because I thought that lowPtGsfElectronID is not already running (#31080 (comment) says "we have yet to schedule the module to run the new ID model in this PR."). From the reco comparisons, e.g. in https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_11_2_X_2020-08-06-1100+9c4a8a/38242/validateJR/all_OldVSNew_RunSinglePh2016Bwf136p731/ I see that this module is already running in default reco.

Yes, this is true, but this is part of the RECO sequence, while we wish to execute the ID as part of the PAT sequence. So the reRECO test you refer to can be used to validate the performance within the RECO sequence (once we replace the placeholder model), but - most importantly - we wish to test the ID as part of the PAT sequence using MINIAOD inputs (i.e. same as for the training).

Am I correct?

jpata · 2020-08-14T08:36:46Z

@bainbrid if you plan to run it in PAT but don't have the final model yet, I think it would be useful to already introduce the relevant python conf as suggested above to the workflows, so everything can be tested end to end.

Perhaps we can get a feeling for the timing as well, even though the model is a placeholder.

bainbrid · 2020-08-14T08:51:53Z

Hi @jpata. I’ve linked to an example PAT config in a post above, found [here](https://github.com/cms-sw/cmssw/compare/14635b1..355e62b). This is my “best guess” for an implementation. The issue is that no module then calls the ID producer (my mistake) so it’s never executed. @slava77 suggests adding to userFloats but I don’t have any experience of this. If you can point me to an example, I can attempt to add it.
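
The scheduling issue discussed here (a producer placed in a task runs only if something consumes its product) can be illustrated with a small toy model. This is a sketch in plain Python, not the CMSSW framework: the `Module` class, `modules_to_run`, and the `slimmedLowPtElectronsWithID` label are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy stand-in for a framework module (hypothetical, not the CMSSW API)."""
    label: str
    consumes: list = field(default_factory=list)  # labels of products this module reads


def modules_to_run(task, scheduled):
    """Return the labels that run: the scheduled modules plus every module
    in the task whose product they consume, directly or indirectly."""
    by_label = {m.label: m for m in task}
    to_visit = list(scheduled)
    needed = set()
    while to_visit:
        label = to_visit.pop()
        if label in needed or label not in by_label:
            continue  # already handled, or an input produced elsewhere
        needed.add(label)
        to_visit.extend(by_label[label].consumes)
    return needed


# Case 1: the ID producer is in the task, but nothing consumes its product.
task_unused = [
    Module("slimmedLowPtElectrons"),
    Module("lowPtGsfElectronID", consumes=["slimmedLowPtElectrons"]),
    Module("output", consumes=["slimmedLowPtElectrons"]),
]
ran_unused = modules_to_run(task_unused, scheduled=["output"])

# Case 2: a (hypothetical) downstream module consumes the ID, so it is pulled in.
task_used = [
    Module("slimmedLowPtElectrons"),
    Module("lowPtGsfElectronID", consumes=["slimmedLowPtElectrons"]),
    Module("slimmedLowPtElectronsWithID",
           consumes=["slimmedLowPtElectrons", "lowPtGsfElectronID"]),
    Module("output", consumes=["slimmedLowPtElectronsWithID"]),
]
ran_used = modules_to_run(task_used, scheduled=["output"])

print("ID runs when unused:", "lowPtGsfElectronID" in ran_unused)
print("ID runs when consumed:", "lowPtGsfElectronID" in ran_used)
```

In this toy picture, the fix is whatever adds a consumer of the ID product to the PAT workflow; how that is done in the real configuration is what the rest of the thread discusses.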

jpata · 2020-08-14T09:08:02Z

How about consuming it here: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/PatAlgos/python/slimming/slimmedLowPtElectrons_cfi.py

slava77 · 2020-08-14T12:30:08Z

>> My suggestion was to add it to the list of the electron IDs/userFloats in the slimmedElectrons.

> I'm not sure how to do this. If we add a userFloat as you suggest, then everything else in this PR can remain the same?

The configuration logic for electron modifiers starts around line 272 of cmssw/PhysicsTools/PatAlgos/python/slimming/miniAOD_tools.py (at ed643cb):

process.slimmedElectrons.modifierConfig.modifications = egamma_modifications

This would include both VID and more direct insertions in RecoEgamma/EgammaTools/python/egammaObjectModificationsInMiniAOD_cff.py.
egamma experts may be more specific with a pointer.
@jainshilpi @lsoffi @afiqaize @SohamBhattacharya

slava77 · 2020-08-14T12:35:01Z

It looks like my pointers are more specific to the standard slimmedElectrons; slimmedLowPtElectrons_cfi.py seems more appropriate in this context.

bainbrid · 2020-08-18T09:13:11Z

@jainshilpi @lsoffi @afiqaize @SohamBhattacharya Is somebody available to take a look at this? I suspect it's straightforward for an expert to figure out how to implement this, but would take me some time to understand the MINIAOD tools. (I'm happy to implement if somebody can give pointers.) In short, our IDProducer (if called by something!) takes the slimmedLowPtElectrons collection as input and produces a ValueMap of floats, keyed off the slimmed collection. See my attempt (so far) here.
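
To make the request concrete, here is a toy sketch of what "embedding a ValueMap of floats as a userFloat" amounts to. It is plain Python, not the PAT or EDM API: `ToyElectron`, `embed_value_map`, and the user-float name `"ID"` are hypothetical, and the value map is modelled simply as a list aligned by index with the electron collection.

```python
class ToyElectron:
    """Hypothetical stand-in for a PAT electron; it only stores user floats."""

    def __init__(self, pt):
        self.pt = pt
        self._user_floats = {}

    def add_user_float(self, name, value):
        self._user_floats[name] = float(value)

    def user_float(self, name):
        return self._user_floats[name]


def embed_value_map(electrons, value_map, name):
    """Copy one float per electron from an index-aligned map onto the objects."""
    if len(value_map) != len(electrons):
        raise ValueError("value map must have exactly one entry per electron")
    for index, electron in enumerate(electrons):
        electron.add_user_float(name, value_map[index])
    return electrons


# Two toy electrons and the ID scores computed for them, in the same order.
electrons = [ToyElectron(pt=1.2), ToyElectron(pt=3.4)]
id_scores = [0.5, -1.0]
embed_value_map(electrons, id_scores, "ID")
print([e.user_float("ID") for e in electrons])
```

The length check reflects the one property the thread relies on: the map is keyed off the same collection it is later attached to, so the two must stay aligned.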

jpata · 2020-08-18T09:37:08Z

@bainbrid the main consideration right now is that the new code is included in the jenkins tests. As has been pointed out by Slava, looks like the LowPtGsfElectronIDProducer is included in the RECO sequences [1] and gives some differences with respect to the baseline [2].
We can move forward with this from RECO if we understand that these differences are acceptable. Can you confirm? If the code for testing it in PAT is not immediately available, this can come later, too.

[1] #31080 (comment)
[2] https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_11_2_X_2020-08-13-1100+9c4a8a/38323/validateJR/all_OldVSNew_TTbarwf25p0/all_OldVSNew_TTbarwf25p0c_floatedmValueMap_lowPtGsfElectronID__RECO_obj_values_.png

bainbrid · 2020-08-18T09:50:50Z

@jpata @slava77 In Slava's comment, I followed the links to this comparison. It seems a little different to the one you point me to - perhaps just higher stats? (I will comment on this distribution.)

I can say the following: the new distribution is shifted to lower values, which would be bad for signal and good for background. It's not clear to me if we are looking at signal or bkgd or both (presumably a mix, most likely mainly bkgd), as it depends on the MC sample used for the test.

However, the new code we are testing here is likely to be incompatible with the model weights stored by default in cms-data, so a shift to lower values is probably my expectation.

If you also included cms-data/RecoEgamma-ElectronIdentification#14 in this test (please can you confirm?), then the weights would be compatible with the new code. For this scenario, I cannot say with certainty what I would expect to see, on balance probably not a shift to lower values.

In summary: a change is highly likely, a shift to lower values is quite possible, and the values don't seem crazy.

jpata · 2020-08-18T10:12:01Z

Thanks @bainbrid.

The test was indeed done with cms-data/RecoEgamma-ElectronIdentification#14 as can be checked in #31080 (comment).

The comparison Slava posted earlier in #31080 (comment) is with RunSinglePh2016B, therefore higher stats for low-pt electrons.

My understanding is then:

- all the new code you added is evaluated in the tests
- it changes the output in an expected, non-final way
- the final model is in development, as are the python sequences to add it to PAT.

Did I get it right?

jainshilpi · 2020-08-18T10:12:59Z

> @jainshilpi @lsoffi @afiqaize @SohamBhattacharya Is somebody available to take a look at this? I suspect it's straightforward for an expert to figure out how to implement this, but would take me some time to understand the MINIAOD tools. (I'm happy to implement if somebody can give pointers.) In short, our IDProducer (if called by something!) takes the slimmedLowPtElectrons collection as input and produces a ValueMap of floats, keyed off the slimmed collection. See my attempt (so far) here.

@bainbrid apologies for missing this. We need to check with our ID and reco experts. will get back to you offline on this.

bainbrid · 2020-08-18T10:17:23Z

> My understanding is then:
>
> all the new code you added is evaluated in the tests

@jpata

Yes.

> it changes the output in an expected, non-final way

Yes, changes are expected.

> the final model is in development, as are the python sequences to add it to PAT.

Yes.

> Did I get it right?

Yes, thanks!

jpata · 2020-08-18T10:20:08Z

Thanks! Can I ask the EGamma contacts who were already pinged @afiqaize @SohamBhattacharya to confirm that this change to the BDT output is fine in the meantime?

SohamBhattacharya · 2020-08-18T10:28:45Z

Hi, sorry for missing this. I'm checking and will get back asap.

bainbrid · 2020-08-18T10:31:00Z

> Thanks! Can I ask the EGamma contacts who were already pinged @afiqaize @SohamBhattacharya to confirm that this change to the BDT output is fine in the meantime?

@jpata - I'm afraid I'm probably best placed to comment on the BDT output ... but I'm afraid I cannot give a definitive answer.

The reason in short is that the reference histogram is using a rather suboptimal model and so is a poor choice for reference (but the only one we have). Naively, I'd expect to do better than this.

However, the model we are testing is only a placeholder and so - without quite some digging - I cannot be sure if it should give better performance than the (suboptimal) default or not.

Further, the distributions are an (unknown) mixture of signal and background, so this complicates the interpretation.

Finally, I'm happy to move forward. The new distribution appears reasonably healthy in that it is smooth and there are no unusual spikes (e.g. at 0 or 1).

So green light from me. We'll know more when we have the PAT sequence and the new model.
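
The qualitative checks described above (a shift of the distribution, and no spikes at the ends of the range) could be scripted roughly as follows. This is a minimal sketch with hypothetical helper names and made-up toy scores; the range endpoints 0 and 1 are taken from the comment above and are left as parameters, since the actual output range of the model is not stated in this thread.

```python
import statistics


def edge_spike_fractions(scores, lo=0.0, hi=1.0, tol=1e-6):
    """Fractions of scores sitting (within tol) at the low and high range ends."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores given")
    at_lo = sum(1 for s in scores if abs(s - lo) <= tol)
    at_hi = sum(1 for s in scores if abs(s - hi) <= tol)
    return at_lo / n, at_hi / n


def mean_shift(reference, new):
    """Difference of means (new minus reference); negative means a shift down."""
    return statistics.fmean(new) - statistics.fmean(reference)


# Toy numbers for illustration only; they are not taken from the validation plots.
reference_scores = [0.40, 0.60, 0.55, 0.45]
new_scores = [0.20, 0.40, 0.35, 0.25]

print("mean shift:", mean_shift(reference_scores, new_scores))
print("edge fractions:", edge_spike_fractions(new_scores))
```

A large fraction at either end would be the kind of "unusual spike" mentioned above; a negative mean shift alone says nothing about performance without knowing the signal/background mix.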

jpata · 2020-08-18T10:33:21Z

Thanks @bainbrid. As you confirm it for EGamma, we can approve this PR from reco. I just want to make sure that nothing else depends on the BDT output in the meantime.

jpata · 2020-08-18T11:48:07Z

+1

- tests pass with changes to the lowPtGsfElectronID output as expected using a placeholder model (confirmed by the author for EGamma)
- updates the LowPtGsfElectronIDProducer and corresponding supporting code
- model with final retraining and PAT sequences will come in a later PR

SohamBhattacharya · 2020-08-18T14:38:25Z

Afiq and I confirm that [1] is fine as simply modifying the standard sequence to produce a new collection would be the easiest way to do this.
Since @bainbrid has already greenlit the BDT output, we're okay with these changes.

[1] https://github.com/cms-sw/cmssw/blob/355e62b7237b9e6d054c7d0fc64e1c01cc062ad4/PhysicsTools/PatAlgos/python/slimming/slimmedLowPtElectrons_cff.py

Again, sorry for the delay.

cmsbuild · 2020-08-18T15:49:55Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

qliphy · 2020-08-19T01:47:30Z

+1

crovelli and others added 8 commits July 16, 2020 08:24

3ce8e61  Extra variables for LowPtElectrons Id added
39c503e  Merged develLowPtEleId from repository crovelli with cms-merge-topic
c8a9fc8  reverse-engineer move of APIs for model features
7f56ba3  Merged from-CMSSW_11_2_X-refactorise-api-for-features from repository… CMSBParking with cms-merge-topic
623d3f6  float-int fixes; e_sc range change
595a6cf  Merged from-CMSSW_11_2_X__updatedLPTid from repository crovelli with …cms-merge-topic
b7fa8e8  use edm::View and edm::Ptr to allow evaluation using patElectrons
d553c18  scram build code-checks, scram build code-format

bainbrid mentioned this pull request Aug 6, 2020

LowPtElectrons: updated BDT model for ID cms-data/RecoEgamma-ElectronIdentification#14

Merged

cmsbuild added this to the CMSSW_11_2_X milestone Aug 6, 2020

cmsbuild added code-checks-pending comparison-pending orp-pending pending-signatures reconstruction-pending tests-pending labels Aug 6, 2020

cmsbuild added code-checks-approved and removed code-checks-pending labels Aug 6, 2020

cmsbuild added requires-external tests-started and removed tests-pending labels Aug 6, 2020

cmsbuild added fully-signed reconstruction-approved and removed pending-signatures reconstruction-pending labels Aug 18, 2020

cmsbuild added orp-approved and removed orp-pending labels Aug 19, 2020

cmsbuild merged commit 6d1ee32 into cms-sw:master Aug 19, 2020

jainshilpi mentioned this pull request Aug 21, 2020

produce lowPtGsfEle slimmed collections for the default miniAOD #25884

Closed

bainbrid mentioned this pull request Aug 24, 2020

Add low-pT electrons to MINIAOD, update ID, improve end user experience #31220

Merged

bainbrid mentioned this pull request Dec 2, 2020

Add low-pT electrons to MINIAOD, update ID, improve end user experience (back port of 31220) #32372

Merged
