Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MixingModule thread-safety issue #21972

Open
fabiocos opened this issue Jan 26, 2018 · 12 comments
Open

MixingModule thread-safety issue #21972

fabiocos opened this issue Jan 26, 2018 · 12 comments

Comments

@fabiocos
Copy link
Contributor

A thread-safety problem has been discovered in the MixingModule code.

Here is the diagnosis and suggested solution by C. Jones (@Dr15Jones from your message):

BMixingModule is a stream module, but it is using a GlobalCache

class BMixingModule : public stream::EDProducer<GlobalCache<MixingCache::Config>>

which allows sharing of the 'const' cache objects between the streams. However, the implementation in this class appears to have falling in the 'shallow const' problem of C++.

The constructor of BMixingModule is
explicit BMixingModule(const edm::ParameterSet& ps, MixingCache::Config const* globalConf);

and as you can see, it only has access to a 'const' MixingCache::Config. However, MixingCache::Config is a struct
https://github.com/cms-sw/cmssw/blob/master/Mixing/Base/interface/BMixingModule.h#L36

which has a std::vector<std::shared_ptr> as member data. Now instances of BMixingModule can not add or remove items from the std::vector<>, since it is const, and can not reassign the std::shared_ptr<> in the vector (since they are const) but they CAN call non-const functions on PileUpConfig since a 'const' std::shared_ptr does not make what it points to const. The struct really should have used a std::vector<std::shared_ptr< const PileUpConfig>> instead. Unfortunately, event that change is insufficient. PileUpConfig
https://github.com/cms-sw/cmssw/blob/master/Mixing/Base/interface/PileUp.h#L31
holds a std::shared_ptr so even given a const PileUpConfig one could still call a non-const function of the TH1F. So here, too, the member data should have been std::shared_ptr.

If both of those changes had been made, the compiler would not have allowed the call to TH1F::ComputeIntegral to happen.

A simple fix would be to change PileUp::PileUp so that instead of doing
histo_(config->histo_),
it did
histo_(std::make_shared_ptr(*config->histo_)),

which would create a copy.

@civanch @mdhildreth we need to implement a fix in the code

@cmsbuild
Copy link
Contributor

A new Issue was created by @fabiocos Fabio Cossutti.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@fabiocos
Copy link
Contributor Author

assign simulation

@cmsbuild
Copy link
Contributor

New categories assigned: simulation

@mdhildreth,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

@fwyzard
Copy link
Contributor

fwyzard commented Jan 30, 2018

glad to see DQM is not the only part of CMSSW being bitten by non-thread-safe ROOT histograms...

@fabiocos
Copy link
Contributor Author

#22000 implements the simplest solution by Chris, but more suggestions are provided there. We also urgently need to back-port this fix to 94X in order to be prepared for the production of a new premixing library.

@civanch
Copy link
Contributor

civanch commented Jan 30, 2018

@fabiocos , #22040 PR is proposed for testing
Concerning proper thread safe upgrade: I would follow @davidlange6 proposal to remove TF1 from the game, because what is needed is basically manipulation with an array nothing more.

@mdhildreth
Copy link
Contributor

I agree!

@dan131riley
Copy link

+1 to removing the TF1, I really should have done that when I fixed the random number generator

@fabiocos
Copy link
Contributor Author

fabiocos commented Feb 1, 2018

the simple fix has been integrated in 9_4_X, 10_0_X and 10_1_X . I keep the issue open as a reminder for the more general fix discussed, until it is proposed.

@fabiocos
Copy link
Contributor Author

@makortel @civanch @silviodonato @qliphy this old issue seems to me still open, please confirm

@makortel
Copy link
Contributor

Looks still relevant to me too.

@dan131riley
Copy link

Yes, still relevant. I put it onto my todo list and will make an entry for the agile board.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants