fixes for pytorch, CMS t1tttt dataset, update response plots #232

jpata · 2023-10-11T07:02:22Z

- follow-up fixes for pytorch training
  - multi-GPU training does not yet proceed to the second epoch
- generate new t1tttt SUSY dataset for CMS
- update response plots
  - plot IQR over median, which is less dependent on the absolute jet energy scale prediction (a fairer comparison)
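Below is a minimal sketch of the IQR-over-median metric from the last item. It assumes the response is the per-jet ratio of reconstructed to generated pT. `response_resolution` and the toy data are hypothetical and are not the plotting code in this PR. The sketch shows why the ratio is fairer: a global energy-scale error shifts the median, but IQR/median stays the same.

```python
import numpy as np


def response_resolution(reco_pt, gen_pt):
    """Return (median, IQR / median) of the per-jet response reco_pt / gen_pt."""
    response = np.asarray(reco_pt, dtype=float) / np.asarray(gen_pt, dtype=float)
    q25, q50, q75 = np.percentile(response, [25, 50, 75])
    return q50, (q75 - q25) / q50


# Toy jets (hypothetical data): ~10% Gaussian smearing around a perfect scale.
rng = np.random.default_rng(0)
gen_pt = rng.uniform(20.0, 200.0, size=10_000)
reco_pt = gen_pt * rng.normal(1.0, 0.1, size=gen_pt.size)

# Same jets, but with the overall energy scale over-predicted by 10%.
med_nominal, res_nominal = response_resolution(reco_pt, gen_pt)
med_shifted, res_shifted = response_resolution(1.1 * reco_pt, gen_pt)

# The median (scale) moves by 10% and the raw IQR would grow by 10% too,
# but the relative width IQR/median is unchanged.
print(f"median: {med_nominal:.3f} -> {med_shifted:.3f}")
print(f"IQR/median: {res_nominal:.4f} -> {res_shifted:.4f}")
```

A model that gets the resolution right but the absolute scale slightly off is therefore not penalized twice: the scale shows up in the median, and the width shows up in IQR/median.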

On a single A100 with the CMS dataset, I'm getting the following ETA for one epoch now:

```
$ apptainer exec --nv -B /scratch/persistent/joosep/tensorflow_datasets/ ~/singularity/pytorch.simg python3 mlpf/pyg_pipeline.py --dataset cms --gpus $CUDA_VISIBLE_DEVICES --data_dir /scratch/persistent/joosep/tensorflow_datasets/ --train --conv-type gnn-lsh --overwrite --num-epochs 10 --gpu-batch-multiplier 15 --model-prefix experiments/MLPF_gnnlsh
```

INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (496) bind mounts
INFO:mlpf:Will use single-gpu: NVIDIA A100 80GB PCIe
INFO:mlpf:MLPF(
  (nn0): Sequential(
    (0): Linear(in_features=42, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.0, inplace=False)
    (4): Linear(in_features=512, out_features=512, bias=True)
  )
  (conv_id): ModuleList(
    (0-2): 3 x CombinedGraphLayer(
      (layernorm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
      (ffn_dist): Sequential(
        (0): Linear(in_features=512, out_features=128, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=128, out_features=128, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=128, out_features=128, bias=True)
      )
      (message_building_layer): MessageBuildingLayerLSH(
        (kernel): NodePairGaussianKernel()
      )
      (message_passing_layers): ModuleList(
        (0-1): 2 x GHConvDense()
      )
    )
  )
  (conv_reg): ModuleList(
    (0-2): 3 x CombinedGraphLayer(
      (layernorm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
      (ffn_dist): Sequential(
        (0): Linear(in_features=512, out_features=128, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=128, out_features=128, bias=True)
        (3): ELU(alpha=1.0)
        (4): Linear(in_features=128, out_features=128, bias=True)
      )
      (message_building_layer): MessageBuildingLayerLSH(
        (kernel): NodePairGaussianKernel()
      )
      (message_passing_layers): ModuleList(
        (0-1): 2 x GHConvDense()
      )
    )
  )
  (nn_id): Sequential(
    (0): Linear(in_features=1578, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.0, inplace=False)
    (4): Linear(in_features=512, out_features=9, bias=True)
  )
  (nn_pt): Sequential(
    (0): Linear(in_features=1587, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.0, inplace=False)
    (4): Linear(in_features=512, out_features=1, bias=True)
  )
  (nn_eta): Sequential(
    (0): Linear(in_features=1587, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.0, inplace=False)
    (4): Linear(in_features=512, out_features=1, bias=True)
  )
  (nn_phi): Sequential(
    (0): Linear(in_features=1587, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.0, inplace=False)
    (4): Linear(in_features=512, out_features=2, bias=True)
  )
  (nn_energy): Sequential(
    (0): Linear(in_features=1587, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.0, inplace=False)
    (4): Linear(in_features=512, out_features=1, bias=True)
  )
  (nn_charge): Sequential(
    (0): Linear(in_features=1587, out_features=512, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.0, inplace=False)
    (4): Linear(in_features=512, out_features=3, bias=True)
  )
)
INFO:mlpf:Model directory experiments/MLPF_gnnlsh
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_ttbar/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_ttbar, split='train', decoders=None), 80000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_ttbar/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_ttbar, split='test', decoders=None), 20000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_qcd/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_qcd, split='train', decoders=None), 80000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_qcd/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_qcd, split='test', decoders=None), 20000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_ztt/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_ztt, split='train', decoders=None), 80000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_ztt/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_ztt, split='test', decoders=None), 20000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_qcd_high_pt/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_qcd_high_pt, split='train', decoders=None), 80000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_qcd_high_pt/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_qcd_high_pt, split='test', decoders=None), 20000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_sms_t1tttt/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_sms_t1tttt, split='train', decoders=None), 163600
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_sms_t1tttt/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_sms_t1tttt, split='test', decoders=None), 41000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_electron/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_single_electron, split='train', decoders=None), 800000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_electron/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_single_electron, split='test', decoders=None), 200000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_gamma/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_single_gamma, split='train', decoders=None), 800000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_gamma/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_single_gamma, split='test', decoders=None), 200000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_pi0/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_single_pi0, split='train', decoders=None), 800000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_pi0/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_single_pi0, split='test', decoders=None), 200000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_neutron/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_single_neutron, split='train', decoders=None), 800000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_neutron/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_single_neutron, split='test', decoders=None), 200000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_pi/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_single_pi, split='train', decoders=None), 800000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_pi/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_single_pi, split='test', decoders=None), 200000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_tau/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_single_tau, split='train', decoders=None), 800000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_tau/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_single_tau, split='test', decoders=None), 200000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_mu/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_single_mu, split='train', decoders=None), 800000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_mu/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_single_mu, split='test', decoders=None), 200000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_proton/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_single_proton, split='train', decoders=None), 800000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_proton/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_single_proton, split='test', decoders=None), 200000
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_multi_particle_gun/1.6.0
INFO:mlpf:train_dataset: DataSource(name=cms_pf_multi_particle_gun, split='train', decoders=None), 162600
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_multi_particle_gun/1.6.0
INFO:mlpf:valid_dataset: DataSource(name=cms_pf_multi_particle_gun, split='test', decoders=None), 40700
INFO:mlpf:Initiating a training run on device 0
  4%|▎         | 2008/55747 [1:51:57<36:23:42,  2.44s/it]
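The ETA in the progress bar can be cross-checked with a few lines of arithmetic. The numbers below are copied from the tqdm line above. The variable names are illustrative only. One assumption matters here: by default, tqdm's displayed rate (2.44 s/it) is an exponential moving average of recent iterations, so it can differ from the cumulative average rate.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string, as printed by tqdm, to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


TOTAL_STEPS = 55747  # steps per epoch, from "2008/55747"
DONE_STEPS = 2008
RECENT_RATE = 2.44  # s/it as displayed (rounded) by tqdm

elapsed_s = hms_to_seconds("1:51:57")
mean_rate = elapsed_s / DONE_STEPS  # cumulative average s/it so far

epoch_h_recent = TOTAL_STEPS * RECENT_RATE / 3600
epoch_h_mean = TOTAL_STEPS * mean_rate / 3600
remaining_h = (TOTAL_STEPS - DONE_STEPS) * RECENT_RATE / 3600

print(f"full epoch at recent rate:     {epoch_h_recent:.1f} h")
print(f"full epoch at cumulative rate: {epoch_h_mean:.1f} h")
print(f"remaining at recent rate:      {remaining_h:.1f} h")
```

At the displayed rate, one epoch takes roughly 38 h, and about 36.4 h remain. This is close to tqdm's 36:23:42. The small gap is presumably due to the displayed rate being rounded. If the cumulative average of about 3.35 s/it is used instead, the full epoch would take closer to 52 h. So the 36 h figure assumes the recent, faster rate holds for the rest of the epoch.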

For comparison, here is the log of the TF GNN-LSH model trained on the same GPU type (A100 80GB):

INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (547) bind mounts
2023-10-11 14:06:15.911877: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-10-11 14:06:16.789247: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:numexpr.utils:Note: NumExpr detected 64 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
INFO:root:loaded config file: parameters/cms-gen.yaml
INFO:root:Dynamic batching is enabled, changing batch size multiplier from 1 to 2.0
INFO:root:Using a single GPU with tf.distribute.OneDeviceStrategy()
2023-10-11 14:06:23.722474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 78227 MB memory:  -> device: 0, name: NVIDIA A100 80GB PCIe, pci bus id: 0000:98:00.0, compute capability: 8.0
INFO:root:Creating experiment dir experiments/cms-gen_20231011_140623_726842.gpu1.local
INFO:root:Using comet-ml Experiment, streaming logs to www.comet.ml.
COMET INFO: Experiment is live on comet.com https://www.comet.com/jpata/particleflow-tf/8b5e1ddf922f4ee5a5bc9f990644fad4

INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_multi_particle_gun/1.6.0
INFO:root:Loaded cms_pf_multi_particle_gun:train with 162600 samples
2023-10-11 14:06:29.161646: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
INFO:root:Interleaved joint dataset multiparticlegun:train with 162600 steps, 162600 samples
INFO:root:Batching multiparticlegun:train with bucket_by_sequence_length
INFO:root:bucket_boundaries=[641, 1281, 1921, 2561, 3201, 3841, 4481, 5121, 5761, 6401, 7041, 7681, 8321, 8961, 9601, 10241, 10881, 11521, 12161, 12801, 13441, 14081, 14721, 15361, 16001, 16641, 17281, 17921, 18561, 19201, 19841, 20481, 21121, 21761, 22401, 23041, 23681, 24321, 24961, 25601, 26241, 26881, 27521, 28161, 28801, 29441, 30081, 30721, 31361, 32001, 32641, 33281, 33921, 34561, 35201, 35841, 36481, 37121, 37761]
INFO:root:bucket_batch_sizes=[120, 60, 40, 30, 24, 20, 16, 14, 12, 12, 10, 10, 8, 8, 8, 6, 6, 6, 6, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
INFO:root:loading state from caches/cms_gen/multiparticlegun_train.json
INFO:root:Dataset multiparticlegun after batching, 1400 steps, 162600 samples
INFO:root:saving state to caches/cms_gen/multiparticlegun_train.json
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_ttbar/1.6.0
INFO:root:Loaded cms_pf_ttbar:train with 80000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_ztt/1.6.0
INFO:root:Loaded cms_pf_ztt:train with 80000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_qcd/1.6.0
INFO:root:Loaded cms_pf_qcd:train with 80000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_qcd_high_pt/1.6.0
INFO:root:Loaded cms_pf_qcd_high_pt:train with 80000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_sms_t1tttt/1.6.0
INFO:root:Loaded cms_pf_sms_t1tttt:train with 163600 samples
2023-10-11 14:06:30.024947: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
INFO:root:Interleaved joint dataset physical:train with 483600 steps, 483600 samples
INFO:root:Batching physical:train with bucket_by_sequence_length
INFO:root:bucket_boundaries=[641, 1281, 1921, 2561, 3201, 3841, 4481, 5121, 5761, 6401, 7041, 7681, 8321, 8961, 9601, 10241, 10881, 11521, 12161, 12801, 13441, 14081, 14721, 15361, 16001, 16641, 17281, 17921, 18561, 19201, 19841, 20481, 21121, 21761, 22401, 23041, 23681, 24321, 24961, 25601, 26241, 26881, 27521, 28161, 28801, 29441, 30081, 30721, 31361, 32001, 32641, 33281, 33921, 34561, 35201, 35841, 36481, 37121, 37761]
INFO:root:bucket_batch_sizes=[120, 60, 40, 30, 24, 20, 16, 14, 12, 12, 10, 10, 8, 8, 8, 6, 6, 6, 6, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
INFO:root:loading state from caches/cms_gen/physical_train.json
INFO:root:Dataset physical after batching, 19729 steps, 483600 samples
INFO:root:saving state to caches/cms_gen/physical_train.json
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_electron/1.6.0
INFO:root:Loaded cms_pf_single_electron:train with 800000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_gamma/1.6.0
INFO:root:Loaded cms_pf_single_gamma:train with 800000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_neutron/1.6.0
INFO:root:Loaded cms_pf_single_neutron:train with 800000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_pi0/1.6.0
INFO:root:Loaded cms_pf_single_pi0:train with 800000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_pi/1.6.0
INFO:root:Loaded cms_pf_single_pi:train with 800000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_tau/1.6.0
INFO:root:Loaded cms_pf_single_tau:train with 800000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_mu/1.6.0
INFO:root:Loaded cms_pf_single_mu:train with 800000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_proton/1.6.0
INFO:root:Loaded cms_pf_single_proton:train with 800000 samples
2023-10-11 14:06:31.015973: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
INFO:root:Interleaved joint dataset gun:train with 6400000 steps, 6400000 samples
INFO:root:Batching gun:train with bucket_by_sequence_length
INFO:root:bucket_boundaries=[641, 1281, 1921, 2561, 3201, 3841, 4481, 5121, 5761, 6401, 7041, 7681, 8321, 8961, 9601, 10241, 10881, 11521, 12161, 12801, 13441, 14081, 14721, 15361, 16001, 16641, 17281, 17921, 18561, 19201, 19841, 20481, 21121, 21761, 22401, 23041, 23681, 24321, 24961, 25601, 26241, 26881, 27521, 28161, 28801, 29441, 30081, 30721, 31361, 32001, 32641, 33281, 33921, 34561, 35201, 35841, 36481, 37121, 37761]
INFO:root:bucket_batch_sizes=[120, 60, 40, 30, 24, 20, 16, 14, 12, 12, 10, 10, 8, 8, 8, 6, 6, 6, 6, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
INFO:root:loading state from caches/cms_gen/gun_train.json
INFO:root:Dataset gun after batching, 53333 steps, 6400000 samples
INFO:root:saving state to caches/cms_gen/gun_train.json
2023-10-11 14:06:31.209387: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
INFO:root:Interleaved joint dataset all:train with 74462 steps, 7046200 samples
INFO:root:Final dataset with 74462 steps
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_multi_particle_gun/1.6.0
INFO:root:Loaded cms_pf_multi_particle_gun:test with 40700 samples
2023-10-11 14:06:31.239190: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
INFO:root:Interleaved joint dataset multiparticlegun:test with 40700 steps, 40700 samples
INFO:root:Batching multiparticlegun:test with bucket_by_sequence_length
INFO:root:bucket_boundaries=[641, 1281, 1921, 2561, 3201, 3841, 4481, 5121, 5761, 6401, 7041, 7681, 8321, 8961, 9601, 10241, 10881, 11521, 12161, 12801, 13441, 14081, 14721, 15361, 16001, 16641, 17281, 17921, 18561, 19201, 19841, 20481, 21121, 21761, 22401, 23041, 23681, 24321, 24961, 25601, 26241, 26881, 27521, 28161, 28801, 29441, 30081, 30721, 31361, 32001, 32641, 33281, 33921, 34561, 35201, 35841, 36481, 37121, 37761]
INFO:root:bucket_batch_sizes=[120, 60, 40, 30, 24, 20, 16, 14, 12, 12, 10, 10, 8, 8, 8, 6, 6, 6, 6, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
INFO:root:loading state from caches/cms_gen/multiparticlegun_test.json
INFO:root:Dataset multiparticlegun after batching, 350 steps, 40700 samples
INFO:root:saving state to caches/cms_gen/multiparticlegun_test.json
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_ttbar/1.6.0
INFO:root:Loaded cms_pf_ttbar:test with 20000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_ztt/1.6.0
INFO:root:Loaded cms_pf_ztt:test with 20000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_qcd/1.6.0
INFO:root:Loaded cms_pf_qcd:test with 20000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_qcd_high_pt/1.6.0
INFO:root:Loaded cms_pf_qcd_high_pt:test with 20000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_sms_t1tttt/1.6.0
INFO:root:Loaded cms_pf_sms_t1tttt:test with 41000 samples
2023-10-11 14:06:31.597634: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
INFO:root:Interleaved joint dataset physical:test with 121000 steps, 121000 samples
INFO:root:Batching physical:test with bucket_by_sequence_length
INFO:root:bucket_boundaries=[641, 1281, 1921, 2561, 3201, 3841, 4481, 5121, 5761, 6401, 7041, 7681, 8321, 8961, 9601, 10241, 10881, 11521, 12161, 12801, 13441, 14081, 14721, 15361, 16001, 16641, 17281, 17921, 18561, 19201, 19841, 20481, 21121, 21761, 22401, 23041, 23681, 24321, 24961, 25601, 26241, 26881, 27521, 28161, 28801, 29441, 30081, 30721, 31361, 32001, 32641, 33281, 33921, 34561, 35201, 35841, 36481, 37121, 37761]
INFO:root:bucket_batch_sizes=[120, 60, 40, 30, 24, 20, 16, 14, 12, 12, 10, 10, 8, 8, 8, 6, 6, 6, 6, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
INFO:root:loading state from caches/cms_gen/physical_test.json
INFO:root:Dataset physical after batching, 4936 steps, 121000 samples
INFO:root:saving state to caches/cms_gen/physical_test.json
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_electron/1.6.0
INFO:root:Loaded cms_pf_single_electron:test with 200000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_gamma/1.6.0
INFO:root:Loaded cms_pf_single_gamma:test with 200000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_neutron/1.6.0
INFO:root:Loaded cms_pf_single_neutron:test with 200000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_pi0/1.6.0
INFO:root:Loaded cms_pf_single_pi0:test with 200000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_pi/1.6.0
INFO:root:Loaded cms_pf_single_pi:test with 200000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_tau/1.6.0
INFO:root:Loaded cms_pf_single_tau:test with 200000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_mu/1.6.0
INFO:root:Loaded cms_pf_single_mu:test with 200000 samples
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_single_proton/1.6.0
INFO:root:Loaded cms_pf_single_proton:test with 200000 samples
2023-10-11 14:06:32.036213: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
INFO:root:Interleaved joint dataset gun:test with 1600000 steps, 1600000 samples
INFO:root:Batching gun:test with bucket_by_sequence_length
INFO:root:bucket_boundaries=[641, 1281, 1921, 2561, 3201, 3841, 4481, 5121, 5761, 6401, 7041, 7681, 8321, 8961, 9601, 10241, 10881, 11521, 12161, 12801, 13441, 14081, 14721, 15361, 16001, 16641, 17281, 17921, 18561, 19201, 19841, 20481, 21121, 21761, 22401, 23041, 23681, 24321, 24961, 25601, 26241, 26881, 27521, 28161, 28801, 29441, 30081, 30721, 31361, 32001, 32641, 33281, 33921, 34561, 35201, 35841, 36481, 37121, 37761]
INFO:root:bucket_batch_sizes=[120, 60, 40, 30, 24, 20, 16, 14, 12, 12, 10, 10, 8, 8, 8, 6, 6, 6, 6, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
INFO:root:loading state from caches/cms_gen/gun_test.json
INFO:root:Dataset gun after batching, 13333 steps, 1600000 samples
INFO:root:saving state to caches/cms_gen/gun_test.json
2023-10-11 14:06:32.219489: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
INFO:root:Interleaved joint dataset all:test with 18619 steps, 1761700 samples
INFO:root:Final dataset with 18619 steps
INFO:absl:Load dataset info from /scratch/persistent/joosep/tensorflow_datasets/cms_pf_qcd_high_pt/1.6.0
INFO:root:Loaded cms_pf_qcd_high_pt:test with 500 samples
INFO:root:num_train_steps: 74462
INFO:root:num_test_steps: 18619
INFO:root:epochs: 50, total_train_steps: 3723100
INFO:root:not using LR schedule
INFO:root:setting model weights
setting trainable layers: None
trainable=12594961 non_trainable=76800
INFO:root:model weights follow
INFO:root:layer=node_encoding_dense_0/kernel:0 trainable=True shape=(67, 512) num_weights=34304
INFO:root:layer=node_encoding_dense_0/bias:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=node_encoding_dense_1/kernel:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=node_encoding_dense_1/bias:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_0/cg_id_0_layernorm1/gamma:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_0/cg_id_0_layernorm1/beta:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_0_ffn_dist_dense_0/kernel:0 trainable=True shape=(512, 128) num_weights=65536
INFO:root:layer=cg_id_0_ffn_dist_dense_0/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_id_0_ffn_dist_dense_1/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_id_0_ffn_dist_dense_1/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_id_0_ffn_dist_dense_2/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_id_0_ffn_dist_dense_2/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_id_0/cg_id_0_msg_0/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_0/cg_id_0_msg_0/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_0/cg_id_0_msg_0/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_0/cg_id_0_msg_0/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_0/cg_id_0_msg_1/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_0/cg_id_0_msg_1/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_0/cg_id_0_msg_1/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_0/cg_id_0_msg_1/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_1/cg_id_1_layernorm1/gamma:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_1/cg_id_1_layernorm1/beta:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_1_ffn_dist_dense_0/kernel:0 trainable=True shape=(512, 128) num_weights=65536
INFO:root:layer=cg_id_1_ffn_dist_dense_0/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_id_1_ffn_dist_dense_1/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_id_1_ffn_dist_dense_1/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_id_1_ffn_dist_dense_2/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_id_1_ffn_dist_dense_2/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_id_1/cg_id_1_msg_0/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_1/cg_id_1_msg_0/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_1/cg_id_1_msg_0/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_1/cg_id_1_msg_0/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_1/cg_id_1_msg_1/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_1/cg_id_1_msg_1/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_1/cg_id_1_msg_1/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_1/cg_id_1_msg_1/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_2/cg_id_2_layernorm1/gamma:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_2/cg_id_2_layernorm1/beta:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_2_ffn_dist_dense_0/kernel:0 trainable=True shape=(512, 128) num_weights=65536
INFO:root:layer=cg_id_2_ffn_dist_dense_0/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_id_2_ffn_dist_dense_1/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_id_2_ffn_dist_dense_1/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_id_2_ffn_dist_dense_2/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_id_2_ffn_dist_dense_2/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_id_2/cg_id_2_msg_0/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_2/cg_id_2_msg_0/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_2/cg_id_2_msg_0/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_2/cg_id_2_msg_0/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_2/cg_id_2_msg_1/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_2/cg_id_2_msg_1/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_id_2/cg_id_2_msg_1/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_2/cg_id_2_msg_1/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_id_0/message_building_layer_lsh/lsh_projections:0 trainable=False shape=(128, 100) num_weights=12800
INFO:root:layer=cg_id_1/message_building_layer_lsh_1/lsh_projections:0 trainable=False shape=(128, 100) num_weights=12800
INFO:root:layer=cg_id_2/message_building_layer_lsh_2/lsh_projections:0 trainable=False shape=(128, 100) num_weights=12800
INFO:root:layer=cg_reg_0/cg_reg_0_layernorm1/gamma:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_0/cg_reg_0_layernorm1/beta:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_0_ffn_dist_dense_0/kernel:0 trainable=True shape=(512, 128) num_weights=65536
INFO:root:layer=cg_reg_0_ffn_dist_dense_0/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_reg_0_ffn_dist_dense_1/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_reg_0_ffn_dist_dense_1/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_reg_0_ffn_dist_dense_2/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_reg_0_ffn_dist_dense_2/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_reg_0/cg_reg_0_msg_0/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_0/cg_reg_0_msg_0/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_0/cg_reg_0_msg_0/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_0/cg_reg_0_msg_0/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_0/cg_reg_0_msg_1/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_0/cg_reg_0_msg_1/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_0/cg_reg_0_msg_1/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_0/cg_reg_0_msg_1/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_1/cg_reg_1_layernorm1/gamma:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_1/cg_reg_1_layernorm1/beta:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_1_ffn_dist_dense_0/kernel:0 trainable=True shape=(512, 128) num_weights=65536
INFO:root:layer=cg_reg_1_ffn_dist_dense_0/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_reg_1_ffn_dist_dense_1/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_reg_1_ffn_dist_dense_1/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_reg_1_ffn_dist_dense_2/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_reg_1_ffn_dist_dense_2/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_reg_1/cg_reg_1_msg_0/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_1/cg_reg_1_msg_0/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_1/cg_reg_1_msg_0/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_1/cg_reg_1_msg_0/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_1/cg_reg_1_msg_1/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_1/cg_reg_1_msg_1/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_1/cg_reg_1_msg_1/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_1/cg_reg_1_msg_1/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_2/cg_reg_2_layernorm1/gamma:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_2/cg_reg_2_layernorm1/beta:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_2_ffn_dist_dense_0/kernel:0 trainable=True shape=(512, 128) num_weights=65536
INFO:root:layer=cg_reg_2_ffn_dist_dense_0/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_reg_2_ffn_dist_dense_1/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_reg_2_ffn_dist_dense_1/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_reg_2_ffn_dist_dense_2/kernel:0 trainable=True shape=(128, 128) num_weights=16384
INFO:root:layer=cg_reg_2_ffn_dist_dense_2/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=cg_reg_2/cg_reg_2_msg_0/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_2/cg_reg_2_msg_0/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_2/cg_reg_2_msg_0/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_2/cg_reg_2_msg_0/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_2/cg_reg_2_msg_1/w_t:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_2/cg_reg_2_msg_1/b_t:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=cg_reg_2/cg_reg_2_msg_1/w_h:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_2/cg_reg_2_msg_1/theta:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=cg_reg_0/message_building_layer_lsh_3/lsh_projections:0 trainable=False shape=(128, 100) num_weights=12800
INFO:root:layer=cg_reg_1/message_building_layer_lsh_4/lsh_projections:0 trainable=False shape=(128, 100) num_weights=12800
INFO:root:layer=cg_reg_2/message_building_layer_lsh_5/lsh_projections:0 trainable=False shape=(128, 100) num_weights=12800
INFO:root:layer=output_decoding/output_layernorm/gamma:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=output_decoding/output_layernorm/beta:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=ffn_cls_dense_0/kernel:0 trainable=True shape=(512, 512) num_weights=262144
INFO:root:layer=ffn_cls_dense_0/bias:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=ffn_cls_dense_1/kernel:0 trainable=True shape=(512, 256) num_weights=131072
INFO:root:layer=ffn_cls_dense_1/bias:0 trainable=True shape=(256,) num_weights=256
INFO:root:layer=ffn_cls_dense_2/kernel:0 trainable=True shape=(256, 128) num_weights=32768
INFO:root:layer=ffn_cls_dense_2/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=ffn_cls_dense_3/kernel:0 trainable=True shape=(128, 8) num_weights=1024
INFO:root:layer=ffn_cls_dense_3/bias:0 trainable=True shape=(8,) num_weights=8
INFO:root:layer=ffn_charge_dense_0/kernel:0 trainable=True shape=(512, 256) num_weights=131072
INFO:root:layer=ffn_charge_dense_0/bias:0 trainable=True shape=(256,) num_weights=256
INFO:root:layer=ffn_charge_dense_1/kernel:0 trainable=True shape=(256, 128) num_weights=32768
INFO:root:layer=ffn_charge_dense_1/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=ffn_charge_dense_2/kernel:0 trainable=True shape=(128, 3) num_weights=384
INFO:root:layer=ffn_charge_dense_2/bias:0 trainable=True shape=(3,) num_weights=3
INFO:root:layer=ffn_pt_dense_0/kernel:0 trainable=True shape=(1040, 512) num_weights=532480
INFO:root:layer=ffn_pt_dense_0/bias:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=ffn_pt_dense_1/kernel:0 trainable=True shape=(512, 256) num_weights=131072
INFO:root:layer=ffn_pt_dense_1/bias:0 trainable=True shape=(256,) num_weights=256
INFO:root:layer=ffn_pt_dense_2/kernel:0 trainable=True shape=(256, 2) num_weights=512
INFO:root:layer=ffn_pt_dense_2/bias:0 trainable=True shape=(2,) num_weights=2
INFO:root:layer=ffn_eta_dense_0/kernel:0 trainable=True shape=(520, 256) num_weights=133120
INFO:root:layer=ffn_eta_dense_0/bias:0 trainable=True shape=(256,) num_weights=256
INFO:root:layer=ffn_eta_dense_1/kernel:0 trainable=True shape=(256, 128) num_weights=32768
INFO:root:layer=ffn_eta_dense_1/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=ffn_eta_dense_2/kernel:0 trainable=True shape=(128, 1) num_weights=128
INFO:root:layer=ffn_eta_dense_2/bias:0 trainable=True shape=(1,) num_weights=1
INFO:root:layer=ffn_phi_dense_0/kernel:0 trainable=True shape=(520, 256) num_weights=133120
INFO:root:layer=ffn_phi_dense_0/bias:0 trainable=True shape=(256,) num_weights=256
INFO:root:layer=ffn_phi_dense_1/kernel:0 trainable=True shape=(256, 128) num_weights=32768
INFO:root:layer=ffn_phi_dense_1/bias:0 trainable=True shape=(128,) num_weights=128
INFO:root:layer=ffn_phi_dense_2/kernel:0 trainable=True shape=(128, 2) num_weights=256
INFO:root:layer=ffn_phi_dense_2/bias:0 trainable=True shape=(2,) num_weights=2
INFO:root:layer=ffn_energy_dense_0/kernel:0 trainable=True shape=(1040, 512) num_weights=532480
INFO:root:layer=ffn_energy_dense_0/bias:0 trainable=True shape=(512,) num_weights=512
INFO:root:layer=ffn_energy_dense_1/kernel:0 trainable=True shape=(512, 256) num_weights=131072
INFO:root:layer=ffn_energy_dense_1/bias:0 trainable=True shape=(256,) num_weights=256
INFO:root:layer=ffn_energy_dense_2/kernel:0 trainable=True shape=(256, 1) num_weights=256
INFO:root:layer=ffn_energy_dense_2/bias:0 trainable=True shape=(1,) num_weights=1
INFO:root:compiling model
Model: "pf_net_dense"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 node_encoding (Sequential)  (1, None, 512)            297472    
                                                                 
 input_encoding_cms (InputE  multiple                  0         
 ncodingCMS)                                                     
                                                                 
 cg_id_0 (CombinedGraphLaye  multiple                  1686400   
 r)                                                              
                                                                 
 cg_id_1 (CombinedGraphLaye  multiple                  1686400   
 r)                                                              
                                                                 
 cg_id_2 (CombinedGraphLaye  multiple                  1686400   
 r)                                                              
                                                                 
 cg_reg_0 (CombinedGraphLay  multiple                  1686400   
 er)                                                             
                                                                 
 cg_reg_1 (CombinedGraphLay  multiple                  1686400   
 er)                                                             
                                                                 
 cg_reg_2 (CombinedGraphLay  multiple                  1686400   
 er)                                                             
                                                                 
 output_decoding (OutputDec  multiple                  2255889   
 oding)                                                          
                                                                 
=================================================================
Total params: 12671761 (48.34 MB)
Trainable params: 12594961 (48.05 MB)
Non-trainable params: 76800 (300.00 KB)
_________________________________________________________________
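You can cross-check the summary totals against the per-variable shapes logged above. Below is a minimal sketch in Python. The constant and helper names are hypothetical shorthand for this check, not names from the mlpf code. It assumes that the three `cg_id_*` layers have the same shapes as the `cg_reg_*` layers printed here, which their identical `Param #` (1686400) suggests. It also assumes the `node_encoding` count (297472) is correct as reported, since the chunk above does not derive it. The "MB" figures assume 4-byte (float32) weights divided by 2**20.

```python
from math import prod

# Shapes of one CombinedGraphLayer, copied from the INFO:root:layer=cg_reg_* lines.
CG_TRAINABLE_SHAPES = [
    (512,), (512,),                 # layernorm1 gamma, beta
    (512, 128), (128,),             # ffn_dist_dense_0 kernel, bias
    (128, 128), (128,),             # ffn_dist_dense_1 kernel, bias
    (128, 128), (128,),             # ffn_dist_dense_2 kernel, bias
] + 2 * [                           # msg_0 and msg_1
    (512, 512), (512,), (512, 512), (512, 512),  # w_t, b_t, w_h, theta
]
CG_FROZEN_SHAPES = [(128, 100)]     # lsh_projections, trainable=False

# Dense heads in output_decoding as (in_features, out_features); each has kernel + bias.
HEADS = {
    "ffn_cls": [(512, 512), (512, 256), (256, 128), (128, 8)],
    "ffn_charge": [(512, 256), (256, 128), (128, 3)],
    "ffn_pt": [(1040, 512), (512, 256), (256, 2)],
    "ffn_eta": [(520, 256), (256, 128), (128, 1)],
    "ffn_phi": [(520, 256), (256, 128), (128, 2)],
    "ffn_energy": [(1040, 512), (512, 256), (256, 1)],
}


def n_params(shapes):
    """Total number of scalars across a list of tensor shapes."""
    return sum(prod(s) for s in shapes)


cg_layer = n_params(CG_TRAINABLE_SHAPES) + n_params(CG_FROZEN_SHAPES)
# output_layernorm gamma + beta, plus kernel and bias of every dense layer in the heads
output_decoding = 2 * 512 + sum(i * o + o for layers in HEADS.values() for i, o in layers)

NODE_ENCODING = 297472   # taken from the summary, not derived here
N_CG_LAYERS = 6          # cg_id_0..2 and cg_reg_0..2

total = NODE_ENCODING + N_CG_LAYERS * cg_layer + output_decoding
non_trainable = N_CG_LAYERS * n_params(CG_FROZEN_SHAPES)  # only the LSH projections
trainable = total - non_trainable

for label, n in [("Total", total), ("Trainable", trainable), ("Non-trainable", non_trainable)]:
    print(f"{label} params: {n} ({n * 4 / 2**20:.2f} MB if float32)")
```

Under these assumptions, the six frozen 128×100 LSH projection matrices account for the whole `Non-trainable params: 76800` line.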
COMET WARNING: tensorflow datasets are not currently supported for gradient and activation auto-logging
COMET INFO: Ignoring automatic log_parameter('verbose') because 'keras:verbose' is in COMET_LOGGING_PARAMETERS_IGNORE
2023-10-11 14:06:36.014822: W tensorflow/core/framework/dataset.cc:956] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
Epoch 1/50
2023-10-11 14:06:55.708300: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:606] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-10-11 14:06:56.367155: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8600
 5909/74462 [=>............................] - ETA: 6:53:01 - loss: 1926.1199 - charge_loss: 0.0194 - cls_loss: 0.0115 - cos_phi_loss: 5.6575e-04 - energy_loss: 33.4310 - eta_loss: 7.4119e-04 - pt_loss: 1891.5264 - sin_phi_loss: 5.8584e-04 - learning_rate: 1.0000e-04
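The Keras progress bar reports a remaining-time estimate, not a per-epoch time. This makes it awkward to compare directly with other runs. The following is a rough conversion sketch for the TensorFlow (`pf_net_dense`) run above. It assumes the ETA equals the remaining step count times the mean step duration so far, which is how the `tf.keras` progress bar estimates it. The helper `parse_eta` is hypothetical. It only handles the `H:MM:SS` and `M:SS` forms, not a bare `45s`. Any validation pass at the end of the epoch is not included.

```python
def parse_eta(eta: str) -> int:
    """Convert an 'H:MM:SS' or 'M:SS' progress-bar ETA into seconds."""
    seconds = 0
    for part in eta.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Values read off the progress-bar line: "5909/74462 ... ETA: 6:53:01"
steps_done, steps_per_epoch = 5909, 74462
eta_s = parse_eta("6:53:01")

# Assumes ETA = (remaining steps) x (average time per step so far).
sec_per_step = eta_s / (steps_per_epoch - steps_done)
epoch_s = sec_per_step * steps_per_epoch

print(f"{sec_per_step:.3f} s/step, ~{epoch_s / 3600:.2f} h per epoch")
# prints: 0.361 s/step, ~7.48 h per epoch
```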

jpata and others added 8 commits October 10, 2023 10:45

- up (e03346d)
- up (dec7bb8)
- up (530bf91)
- up (08ed34c)
- up (6b8ce54)
- up (55e2c18)
- up (e74785c)
- up (c8bbfc2)

jpata changed the title ~~fixes for pytorch, CMS t1tttt dataset~~ fixes for pytorch, CMS t1tttt dataset, update response plots Oct 11, 2023

jpata merged commit 59b5d97 into main Oct 11, 2023
10 checks passed

jpata deleted the t1tttt branch October 25, 2023 11:56

farakiko pushed a commit to farakiko/particleflow that referenced this pull request Jan 23, 2024

fixes for pytorch, CMS t1tttt dataset, update response plots (jpata#232)

233a789
