CMSSW documentation (#319)
* Update README.md

* Update README.md

* Update README.md

* document running script

* md5sum

* up

* clean
jpata authored May 16, 2024
1 parent 53bfd54 commit 43c2651
Showing 6 changed files with 151 additions and 59 deletions.
89 changes: 87 additions & 2 deletions mlpf/data_cms/README.md
@@ -1,10 +1,95 @@
## Code setup

The following should work on lxplus.
```
export IMG=/cvmfs/singularity.opensciencegrid.org/cmssw/cms:rhel7
export SCRAM_ARCH=slc7_amd64_gcc10
#ensure proxy is set
voms-proxy-init -voms cms -valid 192:00
voms-proxy-info
#Initialize SLC7
cmssw-el7
export SCRAM_ARCH=slc7_amd64_gcc10
cmsrel CMSSW_12_3_0_pre6
cd CMSSW_12_3_0_pre6
cmsenv
git cms-init
#checkout the MLPF code
git-cms-merge-topic jpata:pfanalysis_caloparticle
#check out the version from the 2022 release
git checkout 547a0fce7251bfaa6e855aef068f5a45c2d321ec
#compile
scram b -j4
#download the MLPF model
mkdir -p src/RecoParticleFlow/PFProducer/data/mlpf/
wget "https://huggingface.co/jpata/particleflow/resolve/main/cms/acat2022_20221004_model40M/dev.onnx?download=true" -O src/RecoParticleFlow/PFProducer/data/mlpf/dev.onnx
# must be b786aa6de49b51f703c87533a66326d6
md5sum src/RecoParticleFlow/PFProducer/data/mlpf/dev.onnx
```
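
The checksum comparison above is done by eye. As a minimal sketch, it could be automated with a small helper; `verify_md5` is a hypothetical name introduced here for illustration and is not part of the repository.

```shell
# Hypothetical helper (not part of the repository): compare a file's MD5
# digest with an expected value and return non-zero on mismatch.
verify_md5() {
    local file="$1"
    local expected="$2"
    local actual
    # md5sum prints "<digest>  <filename>"; keep only the digest
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" != "$expected" ]; then
        echo "MD5 mismatch for $file: got '$actual', expected '$expected'" >&2
        return 1
    fi
    echo "MD5 OK: $file"
}

# Example call for the model downloaded above:
#   verify_md5 src/RecoParticleFlow/PFProducer/data/mlpf/dev.onnx b786aa6de49b51f703c87533a66326d6
```

Returning a non-zero status lets the check stop a setup script that runs with `set -e`.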

## Running MLPF in CMSSW
MLPF is integrated in CMSSW reconstruction and can be run either using simple but slow matrix workflows, or using the faster but more elaborate PF validation.

### Matrix workflows

Matrix workflows run MLPF out of the box by rerunning the full reconstruction chain.
This is easy to set up, but time-consuming.
```
#check the workflows with the .13 suffix (that have MLPF enabled)
> runTheMatrix.py --what upgrade -n | grep "\.13"
#Run this workflow TTbar_14TeV + 2021PU_mlpf
> runTheMatrix.py --what upgrade -l 11834.13
#Takes around 30 minutes
11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED - time date Thu May 16 15:24:47 2024-date Thu May 16 15:06:24 2024; exit: 0 0 0 0
1 1 1 1 tests passed, 0 0 0 0 failed
```

Check the outputs:
```
> ls 11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU/*.root
11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU/DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root
11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU/histProbFunction.root
11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU/step1.root
11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU/step2.root
11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU/step3_inDQM.root
11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU/step3_inMINIAODSIM.root
11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU/step3_inNANOEDMAODSIM.root
11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU/step3_inRECOSIM.root
11834.13_TTbar_14TeV+2021PU_mlpf+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU/step3.root
```

The particle flow candidates can be found in `step3.root`:
```
vector<reco::PFCandidate> "particleFlow" "" "RECO"
```
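
One way to confirm that the collection is present is to filter a product listing of the file. The sketch below assumes the listing comes from `edmDumpEventContent` in a CMSSW environment and that its lines look like the one shown above; `list_pf_candidates` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the PFCandidate collections from a
# product listing read on standard input.
list_pf_candidates() {
    # -F treats the pattern as a fixed string, so "::" needs no escaping
    grep -F 'reco::PFCandidate'
}

# Assumed usage inside a CMSSW environment:
#   edmDumpEventContent step3.root | list_pf_candidates
```

`grep` exits with a non-zero status when nothing matches, so the helper can also serve as a quick presence check.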

### PF validation
To test MLPF on higher statistics, it's not practical to redo full reconstruction before the particle flow step.
We can follow logic similar to the PF validation, where only the relevant PF sequences are rerun.

First, the dataset filenames need to be cached:
```
cd src/Validation/RecoParticleFlow/test
python3.9 datasets.py
cat tmp/das_cache/QCD_PU.txt
```

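Note: as of May 2024, the dataset `QCD_PU` is only available on tape, so the commands below do not currently work.
Once the files are accessible, the PF validation workflows can be run with `scripts/cmssw/validation_job.sh` from the `particleflow` repository: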
```
cd particleflow
#the number 1 signifies the row index (filename) in the input file to process
./scripts/cmssw/validation_job.sh mlpf $CMSSW_BASE/src/Validation/RecoParticleFlow/test/tmp/das_cache/QCD_PU.txt QCD_PU 1
./scripts/cmssw/validation_job.sh pf $CMSSW_BASE/src/Validation/RecoParticleFlow/test/tmp/das_cache/QCD_PU.txt QCD_PU 1
```
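
Each call above processes a single row of the file list. To cover the whole list, one command per row is needed. The sketch below only prints those commands (a dry run); `print_validation_jobs` is a hypothetical helper, and how the commands are then executed (directly, or through a batch system) is left open.

```shell
# Hypothetical helper: print one validation_job.sh command per non-empty
# row of a file list. Nothing is executed; this is a dry run.
print_validation_jobs() {
    local jobtype="$1"
    local filelist="$2"
    local sample="$3"
    local njob=0
    local line
    # "|| [ -n "$line" ]" also handles a last row without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        njob=$((njob + 1))
        # skip blank rows; the counter still advances so the index matches
        # the 1-based row number the job script expects
        [ -z "$line" ] && continue
        echo "./scripts/cmssw/validation_job.sh $jobtype $filelist $sample $njob"
    done < "$filelist"
}

# Example:
#   print_validation_jobs mlpf $CMSSW_BASE/src/Validation/RecoParticleFlow/test/tmp/das_cache/QCD_PU.txt QCD_PU
```

Printing instead of running makes it easy to inspect the row indices before launching anything.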

## Generating MLPF training samples
TODO (not generally needed).
56 changes: 56 additions & 0 deletions scripts/cmssw/validation_job.sh
@@ -0,0 +1,56 @@
#!/bin/bash
#SBATCH -p main
#SBATCH --mem-per-cpu=7G
#SBATCH --cpus-per-task=1
#SBATCH -o logs/slurm-%x-%j-%N.out

JOBTYPE=$1
INPUT_FILELIST=$2
SAMPLE=$3
NJOB=$4

#change this as needed
OUTDIR=$CMSSW_BASE/out/
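#braces are required: $SAMPLE_ and $JOBTYPE_ would otherwise be read as (unset) variable names
WORKDIR=$CMSSW_BASE/work_${SAMPLE}_${JOBTYPE}_${NJOB}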

#for T2_EE_Estonia
#OUTDIR=/home/joosep/particleflow/data
#WORKDIR=/scratch/$USER/${SLURM_JOB_ID}

#abort on error, print all commands
set -e
set -x

# source /cvmfs/cms.cern.ch/cmsset_default.sh
# source /cvmfs/grid.cern.ch/c7ui-test/etc/profile.d/setup-c7-ui-example.sh
#
# cd $CMSSW_BASE
#
# eval `scramv1 runtime -sh`

CONDITIONS=auto:phase1_2021_realistic ERA=Run3 GEOM=DB.Extended CUSTOM=
FILENAME=`sed -n "${NJOB}p" $INPUT_FILELIST`
NTHREADS=1

mkdir -p $WORKDIR
cd $WORKDIR

if [ "$JOBTYPE" == "mlpf" ]; then
cmsDriver.py step3 --conditions $CONDITIONS \
-s RAW2DIGI,L1Reco,RECO,RECOSIM,PAT,VALIDATION:@standardValidation+@miniAODValidation,DQM:@standardDQM+@ExtraHLT+@miniAODDQM+@nanoAODDQM \
--datatier RECOSIM,MINIAODSIM,DQMIO --nThreads 1 -n -1 --era $ERA \
--eventcontent RECOSIM,MINIAODSIM,DQM --geometry=$GEOM \
--filein $FILENAME --fileout file:step3.root --procModifiers mlpf
elif [ "$JOBTYPE" == "pf" ]; then
cmsDriver.py step3 --conditions $CONDITIONS \
-s RAW2DIGI,L1Reco,RECO,RECOSIM,PAT,VALIDATION:@standardValidation+@miniAODValidation,DQM:@standardDQM+@ExtraHLT+@miniAODDQM+@nanoAODDQM \
--datatier RECOSIM,MINIAODSIM,DQMIO --nThreads 1 -n -1 --era $ERA \
--eventcontent RECOSIM,MINIAODSIM,DQM --geometry=$GEOM \
--filein $FILENAME --fileout file:step3.root
fi
ls *.root

mkdir -p $OUTDIR/${SAMPLE}_${JOBTYPE}
cp step3_inMINIAODSIM.root $OUTDIR/${SAMPLE}_${JOBTYPE}/step3_MINI_${NJOB}.root

rm -Rf $WORKDIR
9 changes: 0 additions & 9 deletions scripts/tallinn/submit_validate_cms.sh
@@ -44,12 +44,3 @@ sbatch mlpf/tallinn/validate_cms.sh 43
sbatch mlpf/tallinn/validate_cms.sh 44
sbatch mlpf/tallinn/validate_cms.sh 45
sbatch mlpf/tallinn/validate_cms.sh 46

sbatch mlpf/tallinn/validate_cms_ttbar.sh 1
sbatch mlpf/tallinn/validate_cms_ttbar.sh 2
sbatch mlpf/tallinn/validate_cms_ttbar.sh 3
sbatch mlpf/tallinn/validate_cms_ttbar.sh 4
sbatch mlpf/tallinn/validate_cms_ttbar.sh 5
sbatch mlpf/tallinn/validate_cms_ttbar.sh 6
sbatch mlpf/tallinn/validate_cms_ttbar.sh 7
sbatch mlpf/tallinn/validate_cms_ttbar.sh 8
9 changes: 0 additions & 9 deletions scripts/tallinn/submit_validate_cms_baseline.sh
@@ -44,12 +44,3 @@ sbatch mlpf/tallinn/validate_cms_baseline.sh 43
sbatch mlpf/tallinn/validate_cms_baseline.sh 44
sbatch mlpf/tallinn/validate_cms_baseline.sh 45
sbatch mlpf/tallinn/validate_cms_baseline.sh 46

sbatch mlpf/tallinn/validate_cms_baseline_ttbar.sh 1
sbatch mlpf/tallinn/validate_cms_baseline_ttbar.sh 2
sbatch mlpf/tallinn/validate_cms_baseline_ttbar.sh 3
sbatch mlpf/tallinn/validate_cms_baseline_ttbar.sh 4
sbatch mlpf/tallinn/validate_cms_baseline_ttbar.sh 5
sbatch mlpf/tallinn/validate_cms_baseline_ttbar.sh 6
sbatch mlpf/tallinn/validate_cms_baseline_ttbar.sh 7
sbatch mlpf/tallinn/validate_cms_baseline_ttbar.sh 8
34 changes: 0 additions & 34 deletions scripts/tallinn/validate_cms.sh

This file was deleted.

13 changes: 8 additions & 5 deletions scripts/tallinn/validate_cms_baseline.sh
@@ -5,17 +5,20 @@
#SBATCH -o logs/slurm-%x-%j-%N.out

NJOB=$1
INPUT_FILELIST=/home/joosep/reco/mlpf/CMSSW_12_3_0_pre6/src/Validation/RecoParticleFlow/test/tmp/das_cache/QCD_PU.txt

#change this as needed
OUTDIR=/home/joosep/particleflow/data

INPUT_FILELIST=$CMSSW_BASE/src/Validation/RecoParticleFlow/test/tmp/das_cache/QCD_PU.txt

set -e
set -v
source /cvmfs/cms.cern.ch/cmsset_default.sh
source /cvmfs/grid.cern.ch/c7ui-test/etc/profile.d/setup-c7-ui-example.sh

cd ~/reco/mlpf/CMSSW_12_3_0_pre6
cd $CMSSW_BASE

eval `scramv1 runtime -sh`
export X509_USER_PROXY=/home/joosep/x509

CONDITIONS=auto:phase1_2021_realistic ERA=Run3 GEOM=DB.Extended CUSTOM=
FILENAME=`sed -n "${NJOB}p" $INPUT_FILELIST`
@@ -28,7 +31,7 @@ cd $WORKDIR
cmsDriver.py step3 --conditions $CONDITIONS -s RAW2DIGI,L1Reco,RECO,RECOSIM,PAT,VALIDATION:@standardValidation+@miniAODValidation,DQM:@standardDQM+@ExtraHLT+@miniAODDQM+@nanoAODDQM --datatier RECOSIM,MINIAODSIM,DQMIO --nThreads 1 -n -1 --era $ERA --eventcontent RECOSIM,MINIAODSIM,DQM --geometry=$GEOM --filein $FILENAME --fileout file:step3.root
ls *.root

mkdir -p /home/joosep/particleflow/data/QCDPU_baseline/
cp step3_inMINIAODSIM.root /home/joosep/particleflow/data/QCDPU_baseline/step3_MINI_${NJOB}.root
mkdir -p $OUTDIR/QCDPU_baseline/
cp step3_inMINIAODSIM.root $OUTDIR/QCDPU_baseline/step3_MINI_${NJOB}.root

rm -Rf $WORKDIR
