
Merge from jpata/particleflow master #2

Merged 26 commits into erwulff:master on Jul 23, 2021

Conversation

@erwulff (Owner) commented Jul 23, 2021:

No description provided.

erwulff and others added 26 commits June 24, 2021 22:23
This commit also includes:
 - A custom TensorBoard callback logging learning rate & momentum (see the sketch after this list)
 - A utils.py file collecting utilities used in more than one file
 - A clean-up of how output files are organized
 - Configuration files using the OneCycle scheduler
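As a rough illustration of the callback idea, here is a minimal sketch of a TensorBoard callback that also logs the learning rate and momentum. The class name and details are assumptions for illustration, not the actual mlpf implementation.

```python
import tensorflow as tf


class LRMomentumTensorBoard(tf.keras.callbacks.TensorBoard):
    """Hypothetical sketch: a TensorBoard callback that also logs the
    optimizer's current learning rate and momentum at the end of each
    epoch. Not the actual mlpf callback."""

    def on_epoch_end(self, epoch, logs=None):
        logs = dict(logs or {})
        opt = self.model.optimizer
        lr = opt.learning_rate
        # The learning rate may be a schedule object rather than a variable.
        if isinstance(lr, tf.keras.optimizers.schedules.LearningRateSchedule):
            lr = lr(opt.iterations)
        logs["learning_rate"] = float(tf.keras.backend.get_value(lr))
        if hasattr(opt, "momentum"):
            logs["momentum"] = float(tf.keras.backend.get_value(opt.momentum))
        # The parent class writes everything in `logs` to the event files.
        super().on_epoch_end(epoch, logs)
```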
`mlpf/pipeline.py` is the beginning of a `click`-based alternative to
`mlpf/launcher.py`.
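For context, a `click`-based entry point generally has the following shape. The command and option names below are illustrative assumptions, not the real `mlpf/pipeline.py` interface.

```python
import click


@click.group()
def main():
    """Illustrative pipeline CLI skeleton (not the actual mlpf interface)."""


@main.command()
@click.option("-c", "--config", type=click.Path(exists=True), required=True,
              help="Path to a YAML training configuration file.")
@click.option("-p", "--prefix", default="",
              help="Prefix for the training directory name.")
def train(config, prefix):
    """Train a model from a configuration file."""
    click.echo(f"training with config={config}, prefix={prefix!r}")


if __name__ == "__main__":
    main()
```

Subcommands defined this way are invoked as, e.g., `python mlpf/pipeline.py train -c config.yaml`.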
Also add an option to give a prefix to the name of the training
directory.
Also add an lr_schedule parameter to the configuration files.
The previous commit still scaled the LR; this one fixes it.
- create a get_train_val_datasets() function to get datasets for training
  and validation (see the sketch after this list)
- move targets_multi_output() from model_setup.py to utils.py for more
  flexible access (solving an import-loop issue)
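The function name comes from the commit message; the body below is only a guess at the general shape of such a helper, using `tensorflow_datasets` for illustration. The real function likely takes a config object and applies model-specific transforms.

```python
import tensorflow as tf
import tensorflow_datasets as tfds


def get_train_val_datasets(dataset_name, batch_size):
    """Hypothetical sketch of a dataset helper returning batched,
    prefetched training and validation datasets."""
    ds_train = tfds.load(dataset_name, split="train", shuffle_files=True)
    ds_val = tfds.load(dataset_name, split="test")
    ds_train = ds_train.batch(batch_size).prefetch(tf.data.AUTOTUNE)
    ds_val = ds_val.batch(batch_size).prefetch(tf.data.AUTOTUNE)
    return ds_train, ds_val
```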
The learning rate finder implements a technique to easily estimate a
range of learning rates that should perform well given the current
model setup. When the model architecture or other hyperparameters
change, the learning rate finder can be run again to find a new
suitable LR range.

The learning rate finder starts training the model at a very low LR
and increases it every batch. The batch loss is plotted vs. training
steps, producing a figure from which a suitable LR range can be
determined.

This technique was first introduced by Leslie Smith in
https://arxiv.org/abs/1506.01186.
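A minimal version of such an LR-range test can be written as a Keras callback. The sketch below is an illustration of the technique, not the mlpf code; the class name, default bounds, and divergence check are all assumptions, and it assumes the optimizer's learning rate is a plain variable rather than a schedule.

```python
import math

import matplotlib.pyplot as plt
import tensorflow as tf


class LRFinder(tf.keras.callbacks.Callback):
    """Hypothetical sketch of the LR-range test from Smith
    (https://arxiv.org/abs/1506.01186): start at a tiny learning rate and
    multiply it by a fixed factor after every batch, recording the batch
    loss so it can be plotted afterwards."""

    def __init__(self, start_lr=1e-7, end_lr=1e0, num_steps=1000):
        super().__init__()
        self.start_lr = start_lr
        self.factor = (end_lr / start_lr) ** (1.0 / num_steps)
        self.lrs, self.losses = [], []

    def on_train_begin(self, logs=None):
        tf.keras.backend.set_value(self.model.optimizer.learning_rate, self.start_lr)

    def on_train_batch_end(self, batch, logs=None):
        loss = (logs or {}).get("loss")
        if loss is None:
            return
        lr = float(tf.keras.backend.get_value(self.model.optimizer.learning_rate))
        self.lrs.append(lr)
        self.losses.append(loss)
        # Stop early once the loss diverges.
        if math.isnan(loss) or loss > 4 * min(self.losses):
            self.model.stop_training = True
        tf.keras.backend.set_value(self.model.optimizer.learning_rate, lr * self.factor)

    def plot(self, path="lr_finder.png"):
        plt.semilogx(self.lrs, self.losses)
        plt.xlabel("learning rate")
        plt.ylabel("batch loss")
        plt.savefig(path)
```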
When running `python mlpf/pipeline.py evaluate -t <train_dir>` without
explicitly specifying which weights to use, the pipeline loads the
weights with the smallest loss that it can find in <train_dir>/weights/.
This can be useful when many large checkpoint files take up too much storage space.
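A sketch of how such a selection could work, assuming (hypothetically) that the loss value is encoded in each checkpoint filename; the real naming convention in mlpf may differ.

```python
import re
from pathlib import Path


def best_checkpoint(train_dir):
    """Pick the checkpoint with the smallest loss in <train_dir>/weights/,
    assuming filenames like weights-12-3.456789.hdf5 where the last float
    is the loss (a hypothetical convention for this sketch)."""
    def loss_of(path):
        return float(re.findall(r"\d+\.\d+", path.name)[-1])

    checkpoints = Path(train_dir).glob("weights/*.hdf5")
    return min(checkpoints, key=loss_of)
```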
The default parameters for expdecay added to the config files
in this commit are the same as those used on the
jpata/particleflow master branch at the time of writing.
Also:
- Add missing parameters to config files.
- Move make_weights_function to utils.py.
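For reference, an exponential-decay schedule in Keras looks like the following. The numbers are placeholders for illustration, not the actual defaults copied from jpata/particleflow.

```python
import tensorflow as tf

# Placeholder values for illustration only; the actual expdecay defaults
# are those from the jpata/particleflow master branch config files.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=10_000,
    decay_rate=0.96,
    staircase=True,
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```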
OneCycle LR, LR finder, custom TensorBoard, etc.
erwulff merged commit 12fa88d into erwulff:master on Jul 23, 2021
erwulff added a commit that referenced this pull request Sep 2, 2022
* Initial commit

* add template dataset definitions

* Add initial CMS particle-flow dataset implementation

Also changed to a new TensorFlow dataset template

* add test scripts

* Run black formatting on python files

* Add instructions to cms_pf, use manual_dir for preprocessing

* fix: ability to choose data directory for the tfrecords files

* feat: Add Delphes dataset

* fix: support loading both .pkl.bz2 and .pkl

* fix: remove extra dimension in cms_pf data items

* fix cms

* fixes for delphes

* ensure dir exists

* separate cms datasets

* clarify manual dir

* cleanup print

* added singleele and singlemu

* update 1.1

* cleanup cms datasets

* update datamodel

* added new datasets

* gen/sim 12_3_0_pre6 generation (#1)

* 1.2 format, ztt dataset

* version 1.3.0 with new gensim truth

* new dataset

* add qcd

* add some asserts

* add new features

* keep PS

* add tau as pf target

* 1.3.1 remove ps and brem (#2)

* fix HF labeling (#3)

* add new high-PU QCD dataset, update energy

* up

* fix

* Add gen jet index (#4)

* first attempt at gen jet clustering

* add other reqs

* revert test

* fix mapping to before masking particles

* fix out-of-index bug

* benchmark training for CMS

* move path

* move path

* remove submodule

* remove

* move

* fix import

* format

* format

* remove some dummy files

* up

* try with masking

* use a different dataset for logging the jet/met distributions

* clean

* added clic ttbar

Co-authored-by: Eric Wulff <eric.g.t.wulff@gmail.com>
Co-authored-by: Eric Wulff <eric.wulff@cern.ch>
Co-authored-by: Javier Duarte <jduarte@ucsd.edu>
erwulff added a commit that referenced this pull request Sep 22, 2023