NDParticleML

This code trains neural networks (NNs) to learn likelihood functions (LFs) in the Standard Model effective field theory (SMEFT) framework. These LFs are calculated from simulated yields, parameterized by Wilson coefficients (WCs), together with observed yields in the CMS detector.
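
To make this concrete, here is a minimal sketch of the kind of mapping being learned: a small network taking a point in the 16D WC space to a scalar LF value. The architecture shown is illustrative only, not the repository's actual model.

    import torch
    import torch.nn as nn

    # Illustrative architecture only: maps a 16D point in Wilson
    # coefficient space to a scalar likelihood value.
    model = nn.Sequential(
        nn.Linear(16, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 1),
    )

    wc_point = torch.randn(1, 16)  # one hypothetical WC configuration
    lf_value = model(wc_point)     # NN approximation of the LF at that point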

Overview

The workflow consists of three steps: training, validation, and analysis. Training and validation are complete (although still being improved); analysis is still in its early stages.


Training: During this step, a neural network (NN) is trained on a sampling of the LF from the LHC CMS experiment to approximate the LF with sufficient accuracy.

  • Requirements
    • A combine sampling of the LF
  • Products
    • A trained NN as a .pt file

Validation: During this step, the trained NN is tested against reference data (combine scans) to assess its accuracy.

  • Requirements
    • combine low-dimensional scans of the LF
    • A trained NN
  • Products
    • Comparison graphs between combine scans and NN scans

Analysis: During this step, we take the trained NN as the LF and explore the 16D parameter space, taking advantage of the speedup over combine samplings.

  • Requirements
    • A trained NN
  • Products
    • Scans over linear combinations of WCs (see the sketch after this list)
    • TBD
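
As a hedged sketch of what a linear-combination scan could look like (the file name, serialization format, and scan range are assumptions; the repository's own analysis code may differ):

    import torch

    # Hypothetical: load a trained NN saved as a full serialized model.
    model = torch.load('trained_nn.pt')
    model.eval()

    # Pick an example linear combination of two WCs and normalize it.
    direction = torch.zeros(16)
    direction[0], direction[1] = 1.0, 0.5
    direction = direction / direction.norm()

    # Evaluate the NN at 201 points along the line t * direction.
    ts = torch.linspace(-10.0, 10.0, 201)
    points = ts.unsqueeze(1) * direction      # shape (201, 16)
    with torch.no_grad():
        lf_values = model(points).squeeze(1)  # LF profile along the scan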

Getting Started

This section shows you how to train and validate an NN in our framework. Warning: non-NDCMS members need to obtain the data files via alternative means.

Set up the environment

For NDCMS members, this Google Doc is a good reference for setting up CRC and the CAML GPU cluster. Key steps:

For general use:

  • Make sure to run everything on CUDA.
  • Have PyTorch (version >= 1.9) installed.
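
A quick sanity check that the environment meets both requirements:

    import torch

    # Confirm the PyTorch version and that a CUDA device is visible.
    print(torch.__version__)                      # should be >= 1.9
    assert torch.cuda.is_available(), 'no CUDA device found'
    device = torch.device('cuda')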

Train an example NN

Via batch system:

  • Copy the following into your working directory
    • ./archive/v1/training/likelihood.py
    • ./archive/v1/training/likelihood.sh
    • ./archive/v1/training/likelihood.submit
    • ./archive/v1/modules/nn_module_v1.py
    • likelihood_data_processed.npz on CurateND
  • Check that the import statement in likelihood.py references the right nn_module name (see the sketch after this list)
  • Run condor_submit likelihood.submit
  • Once the job finishes, the graphs and the trained model will be in their respective folders.
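
Since the module file is nn_module_v1.py, the import in likelihood.py must match that name. A hypothetical example of the kind of line to check or adjust:

    # In likelihood.py (illustrative; match whatever the script actually uses):
    import nn_module_v1 as nn_module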

Via Jupyter Notebook:

  • Move the contents of ./archive/v1/training/likelihood.py into a notebook
  • Near the bottom, change how the graphs and model are saved. For example, f'./graphs/{args.out_file}.pdf' becomes f'{args.out_file}.pdf' (see the sketch after this list).
  • Copy the following into your working directory
    • The Jupyter Notebook
    • ./archive/v1/modules/nn_module_v1.py
    • likelihood_data_processed.npz on CurateND
  • Check that the import statement references the right nn_module name
  • Run the notebook
  • Once the run finishes, the graphs and the trained model will be in your working directory.
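
A hedged, runnable illustration of the path change (out_file stands in for args.out_file; the notebook's actual save calls may differ):

    import matplotlib
    matplotlib.use('Agg')           # no display needed
    import matplotlib.pyplot as plt

    out_file = 'example'            # stand-in for args.out_file
    fig, ax = plt.subplots()

    # Batch version writes into a graphs/ subdirectory:
    # fig.savefig(f'./graphs/{out_file}.pdf')
    # Notebook version writes into the working directory:
    fig.savefig(f'{out_file}.pdf')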

Validate an example NN

  • Copy the following into the same directory as the xxxx_model+.pt file
    • ./archive/v1/validation/Validation.ipynb
    • All the likelihood_xxx.npz files
    • nn_module_v1.py
  • Run Validation.ipynb
  • Graphs should be saved to the same directory
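
Before running the full notebook, a hedged sketch of a quick check that the trained model file loads (the file name is a placeholder; Validation.ipynb may load the model differently):

    import torch

    # Placeholder name; substitute your actual xxxx_model+.pt file.
    model = torch.load(
        'xxxx_model+.pt',
        map_location='cuda' if torch.cuda.is_available() else 'cpu',
    )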

Going Beyond Old Code

Above was the state of the project at the end of summer 2021. A lot has happened since then, but the basic structure remains the same. See the training and validation folders to learn how to run the newer code. archive is a self-contained folder with everything needed to reproduce the project as it stood at the end of summer 2021. Outside of archive, everything is in active development.

Here is a brief overview of each folder:

  • demos: Minimal runnable code that captures essential ideas
  • models: Files that contain trained NNs, possibly along with validation graphs
  • modules: Python modules that the rest of the code in this repository imports
  • tools: Handy scripts for a variety of tasks of tangential importance to the project
  • training: Code for training NNs
  • validation: Code for validating NNs

Additional Notes:

  • Folders named nb_code contain the raw code of the Jupyter notebooks in the same directory, which makes meaningful changes to the notebooks easy to track. Please update the raw code every time a notebook is modified by saving the notebook as a .py file (e.g. with jupyter nbconvert --to script).
  • Data for the validation and analysis graphs, including the 1D and 2D combine scans, are on CurateND.

TODOs

  • Make sure all validation code is compatible with the changes associated with compare_plots.
  • Training
    • Early stopping
    • Try using loss instead of accuracy to select models. Not sure whether this will help, but the loss seems to keep decreasing while the accuracy bottoms out.
    • Sample the LF automatically, i.e. automatically oversample regions with high LF.
    • Possibly outdated: try using np.triu_indices to compute the quadratic WC terms (see the sketch at the end of this section).
    • Possibly not there yet: Try DNN pruning.
  • Validation
    • Use random minibatches for profiling. See the TODO in the profile function in nn_module.
  • Analysis
    • Fit to a hyperellipse, i.e. find the flattest direction, then find the flattest direction in the space orthogonal to the first direction, and so on.
    • Talk to theorists to find more applications of linear combination WC scans.
    • Explore the topology in higher-dimensional spaces. Start with Parker's notebook /analysis/3d_mapper.ipynb.
  • General
    • Just for organization: save the raw data for graphs using pandas MultiIndex DataFrames. Currently everything is saved in dictionaries of dictionaries.
  • Papers for onboarding
  • Papers to study for future directions
  • See #TODOs scattered around the repository.
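
For the np.triu_indices item above, a minimal sketch of computing all unique quadratic WC terms (the WC values are illustrative):

    import numpy as np

    wcs = np.random.rand(16)        # illustrative WC values
    i, j = np.triu_indices(16)      # index pairs with i <= j
    quad_terms = wcs[i] * wcs[j]    # all 136 unique quadratic products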