Merge jpata/master into master #3

Merged
merged 169 commits into from
Sep 3, 2021
169 commits
bc1d185
feat: Add hyperparameter optimization
erwulff Jul 16, 2021
961d0f5
feat: Add distributed training capability in hypertune
erwulff Jul 22, 2021
d3355ec
pytorch update (training+LRP)
farakiko Jul 22, 2021
c541940
fixed confusion matrix
farakiko Jul 23, 2021
2ad8217
Merge branch 'hypertune' of github.com:erwulff/particleflow into hype…
erwulff Jul 23, 2021
c5284ef
feat: keras-tuner chief and tuner scripts
erwulff Jul 23, 2021
2806067
feat: Distributed training on Flatiron Institute HPC site
erwulff Jul 23, 2021
de05522
feat: Distributed training on JUWELS Booster
erwulff Jul 23, 2021
f7b6bae
small bug
farakiko Jul 23, 2021
8ec5c36
fix pic scales
farakiko Jul 23, 2021
3eabb82
plotting cosmetics
farakiko Jul 23, 2021
84212c9
typos
farakiko Jul 23, 2021
1a6a909
one function to plot all heatmaps
farakiko Jul 23, 2021
8e9f80d
better path location
farakiko Jul 23, 2021
59375a6
feat: Choose optimizer in config file
erwulff Jul 26, 2021
532228d
feat: Hypertune parameters in config, LR scheduling during hypertuning
erwulff Jul 26, 2021
ee44bad
feat: Add Random Search and Bayesian Optimization to hypertune
erwulff Jul 26, 2021
81c888d
chore: add hypertune settings to config files
erwulff Jul 26, 2021
18845f4
removed numba and removed hooks.py test file
farakiko Jul 27, 2021
7cef92f
better device definitions
farakiko Jul 27, 2021
2bc4506
make it clear the difference between gravnet.py and gravnet_LRP.py
farakiko Jul 27, 2021
c510ba3
fixed typo
farakiko Jul 27, 2021
8ad7ee8
rename training, removed unnecessary files
farakiko Jul 27, 2021
43aa41b
add multi-gpu flag for data_preprocessing
farakiko Jul 27, 2021
ea41fad
feat: Add early stopping to hypertune
erwulff Jul 28, 2021
5cb6002
add single class recall, detailed plots
jpata Aug 1, 2021
cb07129
added learnable kernel
jpata Aug 1, 2021
1645d62
up
jpata Aug 1, 2021
117bc6e
update cls mult
jpata Aug 1, 2021
37fc2b8
apply exp
jpata Aug 1, 2021
ab70ee4
feat: Choose if to draw events during training or not
erwulff Aug 3, 2021
bd964aa
chore: CustomTensorBoard saves history files in a subfolder
erwulff Aug 3, 2021
c10cd67
fix: get history path from tensorboard callback in train scripts
erwulff Aug 3, 2021
c17e91c
fix: convert logs dict values to float
erwulff Aug 3, 2021
ad50859
feat: Add raytune command to pipeline
erwulff Jul 27, 2021
b033d94
feat: Distributed hyperparameter search on Flatiron with Ray Tune
erwulff Aug 3, 2021
d35b2ec
chore: Use prepare_callbacks() in raytune command
erwulff Aug 3, 2021
9391a57
chore: add ray to github python installations
erwulff Aug 3, 2021
54aef72
chore: Add ray[tune] to github tests python deps
erwulff Aug 3, 2021
ba5ca38
feat: Choose between ASHA and Hyperband in raytune config
erwulff Aug 5, 2021
99c8814
feat: Add expdecay_decay_steps and layernorm to raytune params
erwulff Aug 5, 2021
dd2a60d
feat: Add batch_size to raytune hyperparameters
erwulff Aug 5, 2021
0fa17fd
updates
jpata Aug 12, 2021
2c0210d
remove additive momentum
jpata Aug 13, 2021
127d4e1
up
jpata Aug 15, 2021
308aad3
update adv training
jpata Aug 16, 2021
69efa1e
up
jpata Aug 17, 2021
71b80ff
up
jpata Aug 18, 2021
72d8ee4
fix multi gpu training
jpata Aug 18, 2021
3a1af28
remove outdated code
jpata Aug 19, 2021
6199a25
readd previous net
jpata Aug 19, 2021
2f502dc
fix
jpata Aug 19, 2021
ad0a288
readd
jpata Aug 19, 2021
9025f3f
add missing import
jpata Aug 19, 2021
4ef7be3
revert local test
jpata Aug 19, 2021
5c8ae82
fix
jpata Aug 19, 2021
3f2ead6
fix activation
jpata Aug 19, 2021
f8e65f9
revert cms-gnn-dense parameters
jpata Aug 19, 2021
a737ed1
use additive regression
jpata Aug 19, 2021
2c93578
up
jpata Aug 20, 2021
47a50d3
moomentum layers explicit
jpata Aug 20, 2021
b7e9cae
big cleanup
jpata Aug 21, 2021
5cd29d3
fix
jpata Aug 21, 2021
a2223ed
purge PFNet
jpata Aug 21, 2021
e0adb69
update
jpata Aug 21, 2021
be7e2b0
many preds
jpata Aug 21, 2021
59dbb64
cleanup scripts
jpata Aug 21, 2021
6c14d49
remove timing
jpata Aug 21, 2021
a0d2189
run evaluation with a possibly changed path
jpata Aug 21, 2021
afdf6a0
update
jpata Aug 21, 2021
15aa6e4
unify configs
jpata Aug 21, 2021
e3c10a5
update docs
jpata Aug 21, 2021
d9b5dbe
improve encoding
jpata Aug 22, 2021
1abdef5
up
jpata Aug 22, 2021
06cacb0
add residual plots
jpata Aug 22, 2021
f7d0cb4
separate energy graph layer
jpata Aug 23, 2021
d07745c
separate energy graph layer
jpata Aug 23, 2021
c21735a
just use the same cg layers
jpata Aug 23, 2021
96dd647
adding code for the optimized knn
farakiko Aug 23, 2021
38e6f8d
fixing optimized model definition
farakiko Aug 23, 2021
0ac649c
testing average inference time more precisely
farakiko Aug 23, 2021
44c1043
test optimized knn inference time
farakiko Aug 23, 2021
ed54ed4
Merge branch 'master' of https://github.com/jpata/particleflow
farakiko Aug 23, 2021
44a9c64
organized the pipeline
farakiko Aug 23, 2021
56b970e
uncomment args
farakiko Aug 23, 2021
7b64bea
fix github check/build
farakiko Aug 23, 2021
f13718f
feat: log nvidia-smi info to csv file
erwulff Aug 24, 2021
ce899c3
feat: plot GPU util from nvidia-smi csv-file
erwulff Aug 24, 2021
4adbf3a
readd comet
jpata Aug 24, 2021
bea6599
updat epochs
jpata Aug 24, 2021
160db53
comet optional
jpata Aug 24, 2021
6671735
fix model saving
jpata Aug 24, 2021
bedebfa
fix: bug in plotting of GPU util
erwulff Aug 24, 2021
e2a9878
update delphes
jpata Aug 24, 2021
14a0003
up
jpata Aug 24, 2021
2598800
work on improving energy regression
jpata Aug 25, 2021
17b547f
added customization function
jpata Aug 25, 2021
071c1ff
better name
jpata Aug 25, 2021
a77e6ef
fix import
farakiko Aug 25, 2021
7d15a14
commenting the optimized code
farakiko Aug 25, 2021
ab80242
bug in the path
farakiko Aug 25, 2021
315ddd3
fixed module import in graph_data_delphes
farakiko Aug 25, 2021
a84a36d
small edit
farakiko Aug 25, 2021
b1a693b
big cleanup
farakiko Aug 25, 2021
f199701
lower case lrp + fix local_test.sh
farakiko Aug 25, 2021
7bbe245
Rename LRP_clf_gpu.py to lrp_clf_gpu.py
farakiko Aug 25, 2021
73b3e54
Rename LRP_reg_gpu.py to lrp_reg_gpu.py
farakiko Aug 25, 2021
4f5dea7
Rename LRP_pipeline.py to lrp_pipeline.py
farakiko Aug 25, 2021
3f78804
attempt to rename LRP to lrp through a tmp
farakiko Aug 25, 2021
5fd1719
renaming directory from LRP to lrp
farakiko Aug 25, 2021
ea25798
Rename LRP_pipeline.py to lrp_pipeline.py
farakiko Aug 25, 2021
fb29c20
adding lrp to the quick testing bash script
farakiko Aug 25, 2021
768db2f
fix name of model
farakiko Aug 25, 2021
d26f264
oops
farakiko Aug 25, 2021
2d83dfa
feat: Add logging of GPU power
erwulff Aug 26, 2021
c443137
fix energy regression
jpata Aug 26, 2021
e809965
feat: Add plotting of GPU memory usage from nvidia-smi log
erwulff Aug 26, 2021
3b78128
layernorm
jpata Aug 26, 2021
fe6b328
added gen training
jpata Aug 27, 2021
98e4d37
Merge branch 'jpata_dev_aug21' of https://github.com/jpata/particlefl…
jpata Aug 27, 2021
d301c5b
more configurable options, use main pars for pipeline test
jpata Aug 27, 2021
204c651
fix paths for pipeline
jpata Aug 27, 2021
e9d5bc5
fix paths once again
jpata Aug 27, 2021
722b41b
fix
jpata Aug 27, 2021
21270d4
fix
jpata Aug 27, 2021
5be47dd
up
jpata Aug 27, 2021
2b9f218
feat: Produce Ray analysis plots in raytune command
erwulff Aug 27, 2021
0956d54
fix: Add check for empty CUDA_VISIBLE_DEVICES
erwulff Aug 27, 2021
9f22cb6
fix num events
jpata Aug 27, 2021
b15b6d7
change epochs
jpata Aug 27, 2021
88cf858
add microsecond for simultaneous start
jpata Aug 27, 2021
4261cf4
added additional energy graph layer
jpata Aug 27, 2021
1ceb53c
Merge branch 'jpata_dev_aug21' of github.com:jpata/particleflow into …
jpata Aug 27, 2021
7b7cce1
custom train step for regression
jpata Aug 28, 2021
115419a
up
jpata Aug 29, 2021
53cec23
option to split energy regression classwise
jpata Aug 29, 2021
47cd7a2
fix
jpata Aug 29, 2021
d03a64d
fix
jpata Aug 29, 2021
6e25bb2
remove log
jpata Aug 30, 2021
8da1044
final update
jpata Aug 31, 2021
59dcf03
Merge pull request #76 from jpata/jpata_dev_aug21
jpata Aug 31, 2021
dff2f1d
update cms-dev model
jpata Aug 31, 2021
7cb3d49
update cms-dev
jpata Aug 31, 2021
13644f5
fix: raytune-analysis plots include val_loss
erwulff Aug 31, 2021
515c149
remove lrp part from local script
farakiko Aug 31, 2021
9ce68d0
add dist activation as configurable
jpata Aug 31, 2021
a2356f3
more monitoring
jpata Sep 1, 2021
6478716
Merge branch 'raytune' into upstream/master
erwulff Sep 1, 2021
e2256ad
Merge pull request #73 from faroukmokhtar/master
jpata Sep 1, 2021
cb79eea
fix: add missing settings to delphes.yaml
erwulff Sep 1, 2021
7c49878
feat: add option to resume a raytune run
erwulff Sep 1, 2021
bc0c940
lsh configurable
jpata Sep 1, 2021
4a60f46
added LSH scanning
jpata Sep 2, 2021
63446fd
up
jpata Sep 2, 2021
590a100
epoch one-based
jpata Sep 2, 2021
32a32ec
optimization with cls only
jpata Sep 2, 2021
a668114
Merge branch 'jpata_sept21' of https://github.com/jpata/particleflow …
jpata Sep 2, 2021
9a921eb
mpnn optimization
jpata Sep 2, 2021
557ccf6
up
jpata Sep 2, 2021
df8fc6f
up
jpata Sep 2, 2021
7b0f17d
feat: limit raytune pending trials SLURM_NNODES
erwulff Sep 3, 2021
30237f5
Merge pull request #77 from erwulff/raytune
jpata Sep 3, 2021
a7985ea
add charge to clic pf candidate
jpata Sep 3, 2021
f2d75e0
added cms gen config
jpata Sep 3, 2021
80febc5
Merge remote-tracking branch 'origin/master' into jpata_sept21
jpata Sep 3, 2021
1519195
fix bin size
jpata Sep 3, 2021
47ae9a4
fixes for gun sample training
jpata Sep 3, 2021
cf7d974
make sure batch size is propagated
jpata Sep 3, 2021
504cab7
Merge pull request #78 from jpata/jpata_sept21
jpata Sep 3, 2021
30 changes: 2 additions & 28 deletions .github/workflows/test.yml
@@ -9,32 +9,6 @@ on:
workflow_dispatch:

jobs:
delphes-tf:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Install python deps
run: |
sudo apt install python3 python3-pip wget
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install --upgrade setuptools
sudo python3 -m pip install tensorflow==2.4 setGPU sklearn matplotlib mplhep pandas scipy uproot3 uproot3-methods awkward0 keras-tuner networkx tensorflow-probability==0.12.2 tensorflow-addons==0.13.0 tqdm
- name: Run delphes TF model
run: ./scripts/local_test_delphes_tf.sh

cms-tf:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Install python deps
run: |
sudo apt install python3 python3-pip wget
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install --upgrade setuptools
sudo python3 -m pip install tensorflow==2.4 setGPU sklearn matplotlib mplhep pandas scipy uproot3 uproot3-methods awkward0 keras-tuner networkx tensorflow-probability==0.12.2 tensorflow-addons==0.13.0 tqdm
- name: Run CMS TF model
run: ./scripts/local_test_cms_tf.sh

delphes-pipeline:
runs-on: ubuntu-latest
steps:
@@ -44,7 +18,7 @@ jobs:
sudo apt install python3 python3-pip wget
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install --upgrade setuptools
sudo python3 -m pip install tensorflow==2.4 setGPU sklearn matplotlib mplhep pandas scipy uproot3 uproot3-methods awkward0 keras-tuner networkx tensorflow-probability==0.12.2 tensorflow-addons==0.13.0 tqdm click
sudo python3 -m pip install tensorflow==2.4 setGPU sklearn matplotlib mplhep pandas scipy uproot3 uproot3-methods awkward0 keras-tuner networkx tensorflow-probability==0.12.2 tensorflow-addons==0.13.0 tqdm click 'ray[default]' 'ray[tune]'
- name: Run delphes TF model
run: ./scripts/local_test_delphes_pipeline.sh

@@ -57,7 +31,7 @@ jobs:
sudo apt install python3 python3-pip wget
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install --upgrade setuptools
sudo python3 -m pip install tensorflow==2.4 setGPU sklearn matplotlib mplhep pandas scipy uproot3 uproot3-methods awkward0 keras-tuner networkx tensorflow-probability==0.12.2 tensorflow-addons==0.13.0 tqdm click
sudo python3 -m pip install tensorflow==2.4 setGPU sklearn matplotlib mplhep pandas scipy uproot3 uproot3-methods awkward0 keras-tuner networkx tensorflow-probability==0.12.2 tensorflow-addons==0.13.0 tqdm click 'ray[default]' 'ray[tune]'
- name: Run CMS TF model using the pipeline
run: ./scripts/local_test_cms_pipeline.sh

9 changes: 9 additions & 0 deletions .gitignore
@@ -13,3 +13,12 @@ mlpf/pytorch/data
test_tmp/
test_tmp_delphes/
.DS_Store

prp
*.pyc
*.pyo

mlpf/updated/LRP/pid*
mlpf/updated/LRP/class*

*.ipynb_checkpoints
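To make the effect of the new `.gitignore` entries concrete, here is a minimal Python sketch of a simplified matcher. It assumes paths are given relative to the repository root (where this `.gitignore` lives) and deliberately ignores gitignore features not used by these entries (negation with `!`, `**`, trailing `/`). `is_ignored` is a hypothetical helper for illustration, not part of the repository.

```python
from fnmatch import fnmatchcase

# Patterns added to .gitignore in this change (blank lines dropped).
PATTERNS = (
    "prp",
    "*.pyc",
    "*.pyo",
    "mlpf/updated/LRP/pid*",
    "mlpf/updated/LRP/class*",
    "*.ipynb_checkpoints",
)


def _matches(pattern, path):
    """Simplified check of one gitignore pattern against one path."""
    parts = path.split("/")
    if "/" in pattern:
        # A slash inside the pattern anchors it to the repository root, and
        # '*' must not cross a '/', so compare component by component.
        pat_parts = pattern.split("/")
        return len(pat_parts) == len(parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pat_parts)
        )
    # Without a slash, the pattern may match a name at any depth.
    return fnmatchcase(parts[-1], pattern)


def is_ignored(path, patterns=PATTERNS):
    """A path is ignored if it, or any of its parent directories, matches."""
    parts = path.strip("/").split("/")
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        if any(_matches(pat, prefix) for pat in patterns):
            return True
    return False


print(is_ignored("notebooks/.ipynb_checkpoints/demo-checkpoint.ipynb"))
print(is_ignored("other/mlpf/updated/LRP/pid_0.pt"))
```

In a real checkout, `git check-ignore -v <path>` reports which rule, if any, ignores a given path and is the authoritative check.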
2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@
Short instructions with a single test file in [notebooks/delphes-tf-mlpf-quickstart.ipynb](notebooks/delphes-tf-mlpf-quickstart.ipynb).

Long instructions for reproducing the full training from scratch in [README_delphes.md](README_delphes.md).
The plots can be generated using the notebook [delphes/resolution_checks.ipynb](delphes/resolution_checks.ipynb).
The plots can be generated using the notebook [delphes/delphes_model_analysis.ipynb](delphes/delphes_model_analysis.ipynb).

### Delphes dataset
The dataset is available from zenodo: https://doi.org/10.5281/zenodo.4559324.
2 changes: 1 addition & 1 deletion README_cms.md
@@ -33,5 +33,5 @@ git clone https://github.com/jpata/particleflow.git
cd particleflow

#run a small local test including data prep and training
./scripts/local_test_cms_tf.sh
./scripts/local_test_cms_pipeline.sh
```
63 changes: 34 additions & 29 deletions README_delphes.md
@@ -2,28 +2,55 @@

The following instructions use singularity, but you may have a different local setup.

```bash
#Download all pkl.bz2 files from https://zenodo.org/record/4559324

#now move the data into the right place
mv *pythia8_qcd*.pkl.bz2 data/pythia8_qcd/val
mv *pythia8_ttbar*.pkl.bz2 data/pythia8_qcd/raw
mv data/pythia8_qcd/raw/*pythia8_ttbar_9_*.pkl.bz2 data/pythia8_qcd/val

# Generate the TFRecord datasets needed for larger-than-RAM training
python3 mlpf/pipeline.py data -c parameters/delphes.yaml

# Run the training of the base GNN model using e.g. 5 GPUs in a data-parallel mode
CUDA_VISIBLE_DEVICES=0,1,2,3,4 python3 mlpf/pipeline.py train -c parameters/delphes.yaml

#Run the validation to produce the predictions file
python3 mlpf/pipeline.py evaluate -c parameters/delphes.yaml -t experiments/delphes_* -v "data/pythia8_qcd/val/*.pkl.bz2" -e evaluate_qcd
python3 mlpf/pipeline.py evaluate -c parameters/delphes.yaml -t experiments/delphes_* -v "data/pythia8_ttbar/val/*.pkl.bz2" -e evaluate_ttbar
```
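The `CUDA_VISIBLE_DEVICES=0,1,2,3,4` prefix in the training command restricts which GPUs the process can see; a data-parallel run then typically places one model replica on each visible GPU. The commit list above includes "fix: Add check for empty CUDA_VISIBLE_DEVICES" (0956d54). The following Python sketch shows one way such a check could be written; `visible_gpu_ids` and `describe_device_setup` are hypothetical helpers, not the code in `mlpf/pipeline.py`.

```python
import os


def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device ids (hypothetical helper).

    Returns None when the variable is unset (CUDA does not restrict devices)
    and an empty list when it is set but empty (CUDA exposes no GPUs).
    Invalid entries such as "-1" are passed through unchanged; this sketch
    does not reproduce CUDA's own handling of them.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [dev.strip() for dev in value.split(",") if dev.strip()]


def describe_device_setup(env=None):
    """Summarize how a data-parallel run could be configured (illustrative only)."""
    ids = visible_gpu_ids(env)
    if ids is None:
        return "CUDA_VISIBLE_DEVICES unset: let the framework use all GPUs it finds"
    if not ids:
        return "CUDA_VISIBLE_DEVICES empty: no GPUs visible, fall back to CPU"
    return f"{len(ids)} GPU(s) visible: one data-parallel replica per GPU"


print(describe_device_setup({"CUDA_VISIBLE_DEVICES": "0,1,2,3,4"}))
```

Distinguishing "unset" from "set but empty" matters because the two mean opposite things to CUDA: all devices versus none.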

## Recipe for generation
The Delphes AngularSmearing module has been modified to correctly take into account the smearing for tracks, see [delphes/install.sh](delphes/install.sh).

```bash
wget http://atlaswww.hep.anl.gov/hepsim/soft/centos7hepsim.img
sudo singularity build --sandbox centos7hepsim.sandbox centos7hepsim.img
sudo singularity exec -B /home --writable centos7hepsim.sandbox ./install.sh
sudo singularity build centos7hepsim.sif centos7hepsim.sandbox
sudo rm -Rf centos7hepsim.sandbox
```

```bash
cd delphes

# Run the simulation step
# Generate events with pythia, mix them with PU and run a detector simulation using Delphes
singularity exec http://jpata.web.cern.ch/jpata/centos7hepsim.sif ./run_sim.sh
singularity exec centos7hepsim.sif ./run_sim.sh

# Run the ntuplization step
# generate X,y input matrices for NN training in out/pythia8_ttbar/*.pkl.bz2
singularity exec http://jpata.web.cern.ch/jpata/centos7hepsim.sif ./run_ntuple.sh
singularity exec http://jpata.web.cern.ch/jpata/centos7hepsim.sif ./run_ntuple_qcd.sh
singularity exec centos7hepsim.sif ./run_ntuple.sh
singularity exec centos7hepsim.sif ./run_ntuple_qcd.sh

#Alternatively, to skip run_sim.sh and run_ntuple.sh, download everything from https://doi.org/10.5281/zenodo.4452283 and put into out/pythia8_ttbar

#now move the data into the right place
mv out/pythia8_ttbar ../data/
cd ../data/pythia8_ttbar
mkdir raw
mkdir val
mkdir root
mv *.root root/
mb *.promc root/
mv *.promc root/
mv *.pkl.bz2 raw/
cd ../..

@@ -35,26 +62,4 @@ mv *.root root/
mv *.promc root/
mv *.pkl.bz2 val/
cd ../..

# Generate the TFRecord datasets needed for larger-than-RAM training
singularity exec --nv http://jpata.web.cern.ch/jpata/base.simg python3 mlpf/launcher.py --action data --model-spec parameters/delphes-gnn-skipconn.yaml

# Run the training of the base GNN model using e.g. 5 GPUs in a data-parallel mode
CUDA_VISIBLE_DEVICES=0,1,2,3,4 singularity exec --nv http://jpata.web.cern.ch/jpata/base.simg python3 mlpf/launcher.py --action train --model-spec parameters/delphes-gnn-skipconn.yaml

#Run the validation to produce the predictions file
singularity exec --nv http://jpata.web.cern.ch/jpata/base.simg python3 mlpf/launcher.py --action eval --model-spec parameters/delphes-gnn-skipconn.yaml --weights ./experiments/delphes-gnn-skipconn-*/weights-300-*.hdf5

singularity exec --nv http://jpata.web.cern.ch/jpata/base.simg python3 mlpf/launcher.py --action time --model-spec parameters/delphes-gnn-skipconn.yaml --weights ./experiments/delphes-gnn-skipconn-*/weights-300-*.hdf5
```

## Recipe to prepare Delphes singularity image
NB: The Delphes AngularSmearing module has been modified to correctly take into account the smearing for tracks, see [delphes/install.sh](delphes/install.sh)

```bash
wget http://atlaswww.hep.anl.gov/hepsim/soft/centos7hepsim.img
sudo singularity build --sandbox centos7hepsim.sandbox centos7hepsim.img
sudo singularity exec -B /home --writable centos7hepsim.sandbox ./install.sh
sudo singularity build centos7hepsim.sif centos7hepsim.sandbox
sudo rm -Rf centos7hepsim.sandbox
```
6 changes: 4 additions & 2 deletions clic/dumper.py
@@ -61,7 +61,8 @@ def pfParticleToDict(par):
"px": mom[0],
"py": mom[1],
"pz": mom[2],
"energy": par.getEnergy()
"energy": par.getEnergy(),
"charge": par.getCharge()
}
return vec
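This hunk adds a `charge` entry to the dictionary built by `pfParticleToDict` in `clic/dumper.py` (commit a7985ea, "add charge to clic pf candidate"). The dumper itself is Python 2 code run against pyLCIO; the Python 3 sketch below reproduces only the keys visible in this hunk so it runs without LCIO. `FakePFParticle` is a hypothetical stand-in exposing LCIO-style getters, and taking `mom` from a `getMomentum()` method is an assumption, since the line defining `mom` is not shown in the diff.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FakePFParticle:
    """Hypothetical stand-in for an LCIO PF candidate (no LCIO needed)."""
    momentum: Tuple[float, float, float]
    energy: float
    charge: float

    def getMomentum(self):
        return self.momentum

    def getEnergy(self):
        return self.energy

    def getCharge(self):
        return self.charge


def pf_particle_to_dict(par):
    """Subset of the fields written by pfParticleToDict after this change."""
    mom = par.getMomentum()  # assumed: indexable (px, py, pz)
    return {
        "px": mom[0],
        "py": mom[1],
        "pz": mom[2],
        "energy": par.getEnergy(),
        "charge": par.getCharge(),  # field added in this PR
    }


print(pf_particle_to_dict(FakePFParticle((1.0, 2.0, 3.0), 4.0, -1.0)))
```

Because the old `"energy"` entry had no trailing comma, adding `"charge"` also modified that line; ending the last entry with a comma, as in the sketch, keeps later additions to single-line diffs.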

@@ -210,6 +211,7 @@ def caloHitToDict(par, calohit_to_cluster, genparticle_dict, calohit_recotosim):
nPF=colPF.getNumberOfElements()
nCl=colCl.getNumberOfElements()
nTr=colTr.getNumberOfElements()
nHit=simTrackHits.getNumberOfElements()
nHCB=colHCB.getNumberOfElements()
nHCE=colHCE.getNumberOfElements()
nECB=colECB.getNumberOfElements()
@@ -223,7 +225,7 @@ def caloHitToDict(par, calohit_to_cluster, genparticle_dict, calohit_recotosim):
assert(not (recohit in calohit_recotosim))
calohit_recotosim[recohit] = simhit

print "Event %d, nGen=%d, nPF=%d, nClusters=%d, nTracks=%d, nHCAL=%d, nECAL=%d" % (nEvent, nMc, nPF, nCl, nTr, nHCB+nHCE, nECB+nECE)
print "Event %d, nGen=%d, nPF=%d, nClusters=%d, nTracks=%d, nHCAL=%d, nECAL=%d, nHits=%d" % (nEvent, nMc, nPF, nCl, nTr, nHCB+nHCE, nECB+nECE, nHit)

genparticles = []
genparticle_dict = {}