HHbbVV


Search for two boosted (high transverse momentum) Higgs bosons (H) decaying to two beauty quarks (b) and two vector bosons (V). The majority of the analysis uses a columnar framework to process input tree-based NanoAOD files using the coffea and scikit-hep Python libraries.
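For readers new to the columnar approach, below is a minimal, hypothetical sketch of a coffea processor (coffea 0.7-style ProcessorABC API). The jet selection and output dictionary are illustrative placeholders only, not the actual processors in src/HHbbVV/processors.

# Minimal illustrative sketch of a columnar coffea processor (coffea 0.7-style API).
# The selection and output below are placeholders, not the actual analysis logic.
import awkward as ak
from coffea import processor


class ExampleProcessor(processor.ProcessorABC):
    def process(self, events):
        # operate on whole arrays of events at once rather than looping event by event
        fatjets = events.FatJet[events.FatJet.pt > 300]
        return {"nevents": len(events), "n_selected_jets": int(ak.sum(ak.num(fatjets)))}

    def postprocess(self, accumulator):
        return accumulator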

Setting up package

Creating a virtual environment

First, create a virtual environment (mamba is recommended):

# Download the mamba setup script (change if needed for your machine https://github.com/conda-forge/miniforge#mambaforge)
wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
# Install: (the mamba directory can end up taking O(1-10GB) so make sure the directory you're using allows that quota)
chmod u+x ./Mambaforge-Linux-x86_64.sh  # executable permission
./Mambaforge-Linux-x86_64.sh  # follow instructions in the installation
# Clone the repository
git clone https://github.com/rkansal47/HHbbVV.git
cd HHbbVV
# make the environment
mamba env create -f environment.yml
mamba activate bbVV

Installing package

# From inside the HHbbVV repository
# Perform an editable installation
pip install -e .
# for committing to the repository
pre-commit install

Troubleshooting

  • If your default python in your environment is not Python 3, make sure to use pip3 and python3 commands instead.

  • You may also need to upgrade pip to perform the editable installation:

python3 -m pip install -e .

Instructions for running coffea processors

General note: Coffea-casa is faster and more convenient, but it is still somewhat experimental, so condor is recommended for large numbers of inputs and/or processors which require heavier CPU/memory usage (e.g. bbVVSkimmer).

Coffea-casa

  1. After setting up a coffea-casa account, open the coffea-casa GUI (https://cmsaf-jh.unl.edu) and create an image
  2. Open src/runCoffeaCasa.ipynb
  3. Import your desired processor, specify it in the run_uproot_job function, and specify your filelist (see the sketch below)
  4. Run the first three cells
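For reference, the cells amount to something like the following sketch (coffea 0.7-style run_uproot_job API). The processor import, fileset, constructor arguments, and executor choice here are placeholders to adapt, not the exact notebook contents.

# Hypothetical sketch of the notebook cells (coffea 0.7-style API); the processor,
# fileset, and executor choice below are placeholders -- substitute your own.
from coffea import nanoevents, processor
from HHbbVV.processors import bbVVSkimmer  # or whichever processor you want to run

fileset = {"GluGluToHHTobbVV_node_cHHH1": ["root://host//path/to/nano_1.root"]}

out = processor.run_uproot_job(
    fileset,
    treename="Events",
    processor_instance=bbVVSkimmer(),
    executor=processor.iterative_executor,  # swap for the dask executor on coffea-casa
    executor_args={"schema": nanoevents.NanoAODSchema},
)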

Condor

Setup

To submit to condor, all you need is python >= 3.7.

For testing locally, it is recommended to use miniconda/mamba (mamba is way faster!):

# Download the setup bash file from here https://github.com/conda-forge/miniforge#mambaforge
# e.g. wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
# Install: (the mamba directory can end up taking O(1-10GB) so make sure the directory you're using allows that quota)
./Mambaforge-Linux-x86_64.sh  # follow instructions in the installation
mamba create -n bbVV python=3.9
mamba activate bbVV
pip install coffea "tritonclient[all]" pyyaml
mamba install -c conda-forge xrootd=5.4.0  # need openssl v1.1 for lxplus and UCSD t2, hence pinning xrootd version.

The submission script manually splits up the files into condor jobs:

git clone https://github.com/rkansal47/HHbbVV/
cd HHbbVV
TAG=Aug18_skimmer
# will need python3 (recommended to set up via miniconda)
python src/condor/submit.py --processor skimmer --tag $TAG --files-per-job 20
for i in condor/$TAG/*.jdl; do condor_submit $i; done

Alternatively, can be submitted from a yaml file:

python src/condor/submit_from_yaml.py --year 2017 --processor skimmer --tag $TAG --yaml src/condor/submit_configs/skimmer_inputs_07_24.yaml

To test locally first (recommended), can do e.g.:

mkdir outfiles
python -W ignore src/run.py --starti 0 --endi 1 --year 2017 --processor skimmer --executor iterative --samples HWW --subsamples GluGluToHHTobbVV_node_cHHH1_pn4q

TODO: instructions for lpcjobqueue (currently quite buggy)

Processors

JetHTTriggerEfficiencies

Applies a muon pre-selection and accumulates 4D ([Txbb, Th4q, pT, mSD]) yields before and after our triggers.

To test locally:

python -W ignore src/run.py --year 2018 --processor trigger --sample SingleMu2018 --subsamples SingleMuon_Run2018B --starti 0 --endi 1

And to submit all:

nohup bash -c 'for i in 2016 2016APV 2017 2018; do python src/condor/submit.py --year $i --tag '"${TAG}"' --processor trigger --submit; done' &> tmp/submitout.txt &
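After combining the output pickles (see the Combine pickles section below), the efficiencies are just the ratio of the post-trigger to pre-trigger yields. A rough sketch is below; the pickle path, the "den"/"num" keys, and the histogram type are assumptions about the output format, not the actual processor output.

# Rough sketch only: the path, the "den"/"num" keys, and the hist type are assumptions
# about the combined output format and may not match the actual processor output.
import pickle

with open("combined_trigger_2018.pkl", "rb") as f:
    out = pickle.load(f)

den, num = out["den"], out["num"]  # 4D (Txbb, Th4q, pT, mSD) yields before/after triggers
# project onto the pT axis (index 2) and take the ratio bin by bin
eff_vs_pt = num.project(2).values() / den.project(2).values()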

bbVVSkimmer

Applies pre-selection cuts, runs inference with our new HVV tagger, and saves unbinned branches as parquet files.

Parquet and pickle files will be saved in the specified user's eos directory at ~/eos/bbVV/skimmer/<tag>/<sample_name>/<parquet or pickles>. Pickles are in the format {'nevents': int, 'cutflow': Dict[str, int]}.
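For downstream work, the outputs can be read back with pandas and pickle, e.g. as in the sketch below; the tag/sample in the path just follows the convention above, and the .pkl file naming is an assumption.

# Sketch of reading skimmer outputs; the tag/sample in the path and the .pkl file
# naming are assumptions following the convention described above.
import pickle
from pathlib import Path

import pandas as pd

sample_dir = Path("~/eos/bbVV/skimmer/Aug18_skimmer/GluGluToHHTobbVV_node_cHHH1").expanduser()

events = pd.read_parquet(sample_dir / "parquet")  # unbinned branches as a DataFrame
with open(next((sample_dir / "pickles").glob("*.pkl")), "rb") as f:
    out = pickle.load(f)  # {'nevents': int, 'cutflow': Dict[str, int]}

print(out["nevents"], out["cutflow"])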

To test locally:

python -W ignore src/run.py --processor skimmer --year 2017 --samples HH --subsamples GluGluToHHTobbVV_node_cHHH1 --save-systematics --starti 0 --endi 1

or use the src/HHbbVV/bash/run_local.sh to run over files from different processes.

Or on a specific file(s):

python -W ignore src/run.py --processor skimmer --year 2017 --files $FILE --files-name GluGluToHHTobbVV_node_cHHH1

Jobs

nohup python src/condor/submit_from_yaml.py --year 2018 --tag $TAG --processor skimmer --git-branch main --submit --yaml src/condor/submit_configs/skimmer_inputs_24_02_26.yaml &> tmp/submitout.txt &

All years:

nohup bash -c 'for year in 2016APV 2016 2017 2018; do python src/condor/submit_from_yaml.py --year $year --tag '"${TAG}"' --processor skimmer --save-systematics --submit --yaml src/condor/submit_configs/skimmer_inputs_23_02_17.yaml; done' &> tmp/submitout.txt &

To Submit (if not using the --submit flag)

nohup bash -c 'for i in condor/'"${TAG}"'/*.jdl; do condor_submit $i; done' &> tmp/submitout.txt &

Or just signal:

python src/condor/submit.py --year 2017 --tag $TAG --samples HH --subsamples GluGluToHHTobbVV_node_cHHH1 --processor skimmer --submit

TaggerInputSkimmer

Applies a loose pre-selection cut, saves ntuples with training inputs.

To test locally:

python -W ignore src/run.py --year 2017 --starti 300 --endi 301 --samples HWWPrivate --subsamples jhu_HHbbWW --processor input --label AK15_H_VV
python -W ignore src/run.py --year 2017 --starti 300 --endi 301 --samples QCD --subsamples QCD_Pt_1000to1400 --processor input --label AK15_QCD --njets 1 --maxchunks 1

Jobs:

python src/condor/submit_from_yaml.py --year 2017 --tag $TAG --processor input --save-ak15 --yaml src/condor/submit_configs/training_inputs_07_21.yaml
python src/condor/submit_from_yaml.py --year 2017 --tag $TAG --processor input --yaml src/condor/submit_configs/training_inputs_09_16.yaml --jet AK8

To submit add --submit flag.

TTScaleFactorsSkimmer

Applies cuts for a semi-leptonic ttbar control region, as defined in the JMAR W SF and CASE Lund plane SF measurements, to validate the Lund plane scale factors.

Lund plane scale factors are calculated for top-matched jets in semi-leptonic ttbar events.

To test locally:

python -W ignore src/run.py --year 2018 --processor ttsfs --sample TTbar --subsamples TTToSemiLeptonic --starti 0 --endi 1

Jobs:

nohup python src/condor/submit_from_yaml.py --year 2018 --tag $TAG --processor ttsfs --submit --yaml src/condor/submit_configs/ttsfs_inputs_12_4.yaml &> submitout.txt &

Or to submit only the signal:

python src/condor/submit.py --year 2018 --tag $TAG --sample TTbar --subsamples TTToSemiLeptonic --processor ttsfs --submit

Condor Scripts

Check jobs

Check that all jobs completed by going through output files:

for year in 2016APV 2016 2017 2018; do python src/condor/check_jobs.py --tag $TAG --processor trigger (--submit) --year $year; done

nohup version:

(Do condor_q | awk '{ print $9}' | grep -o '[^ ]*\.sh' > running_jobs.txt first to get a list of jobs which are running.)

nohup bash -c 'for year in 2016APV 2016 2017 2018; do python src/condor/check_jobs.py --year $year --tag '"${TAG}"' --processor skimmer --submit --check-running; done' &> tmp/submitout.txt &

Combine pickles

Combine all output pickles into one:

for year in 2016APV 2016 2017 2018; do python src/condor/combine_pickles.py --tag $TAG --processor trigger --r --year $year; done
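For reference, the combination amounts to merging the per-job accumulator dictionaries, e.g. with coffea's accumulate helper, as in the sketch below; the outfiles glob path is a placeholder.

# Sketch of what the combination does; the glob path is a placeholder.
import pickle
from pathlib import Path

from coffea.processor import accumulate

outputs = []
for pkl in Path("outfiles").glob("*.pkl"):
    with open(pkl, "rb") as f:
        outputs.append(pickle.load(f))

combined = accumulate(outputs)  # merges counts/histograms across jobs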

Post Processing

In src/HHbbVV/postprocessing:

BDT Pre-Processing

python BDTPreProcessing.py --data-dir "../../../../data/skimmer/Feb24/" --signal-data-dir "../../../../data/skimmer/Jun10/" --plot-dir "../../../plots/BDTPreProcessing/$TAG/" --year "2017" --bdt-data (--control-plots)

BDT Trainings

python TrainBDT.py --model-dir testBDT --data-path "../../../../data/skimmer/Feb24/bdt_data" --year "all" (--test)

Inference-only:

python TrainBDT.py  --data-path "../../../../data/skimmer/Feb24/bdt_data" --year "all" --inference-only --model-dir "../../../../data/skimmer/Feb24/23_05_12_multiclass_rem_feats_3"

Post-Processing

python postprocessing.py --templates --year "2017" --template-dir "templates/$TAG/" --plot-dir "../../../plots/PostProcessing/$TAG/" --data-dir "../../../../data/skimmer/Feb24/" (--resonant --signal-data-dir "" --control-plots)

All years (non-resonant):

for year in 2016 2016APV 2017 2018; do python -u postprocessing.py --templates --year $year --template-dir "templates/Jun14" --data-dir "../../../../data/skimmer/Feb24" --signal-data-dir "../../../../data/skimmer/Jun10" --bdt-preds-dir "../../../../data/skimmer/Feb24/23_05_12_multiclass_rem_feats_3/inferences"; done

Scan (non-resonant):

for year in 2016 2016APV 2017 2018; do python -u postprocessing.py --templates --year $year --template-dir "templates/$TAG/" --data-dir "../../../../data/skimmer/Feb24/" --old-processor --nonres-txbb-wp "LP" "MP" "HP" --nonres-bdt-wp 0.995 0.998 0.999 --no-do-jshifts; done

Control plots with resonant and nonresonant samples

Run postprocessing/bash_scripts/ControlPlots.sh from inside postprocessing folder.

Important: If running on a Mac, make sure to install gnu-getopt first; see the getopt for Mac section under Misc below.

BDT sculpting plots

Run postprocessing/bash_scripts/BDTPlots.sh from inside postprocessing folder.

Making separate background and signal templates for scan and bias tests (resonant)

nohup bash_scripts/res_tagger_scan.sh $TAG &> scan.txt &
nohup bash_scripts/res_pt_scan.sh $TAG &> scan.txt &
nohup bash_scripts/res_bias_templates.sh $TAG &> bias.txt &

Remember to check output to make sure all years' templates are made!!

Create Datacard

Need root==6.22.6 and https://github.com/rkansal47/rhalphalib installed (pip install -e . --user after cloning the repo).

python3 postprocessing/CreateDatacard.py --templates-dir templates/$TAG --model-name $TAG (--resonant)

Or from separate templates for background and signal:

python3 -u postprocessing/CreateDatacard.py --templates-dir "/eos/uscms/store/user/rkansal/bbVV/templates/23Apr30Scan/txbb_HP_thww_0.96" \
--sig-separate --resonant --model-name $sample --sig-sample $sample

Scan (non-resonant):

for txbb_wp in "LP" "MP" "HP"; do for bdt_wp in 0.994 0.99 0.96 0.9 0.8 0.6 0.4; do python3 -u postprocessing/CreateDatacard.py --templates-dir templates/23May13NonresScan/txbb_${txbb_wp}_bdt_${bdt_wp} --model-name 23May13NonresScan/txbb_${txbb_wp}_bdt_${bdt_wp} --no-do-jshifts --nTF 0; done; done

Datacards with different orders of TFs for F-tests:

Use the src/HHbbVV/combine/F_test_res.sh script.

Datacards (and background-only fits) for bias tests:

src/HHbbVV/combine/biastests.sh --cardstag $cardstag --templatestag $templatestag

PlotFits

python PlotFits.py --fit-file "cards/test_tied_stats/fitDiagnosticsBlindedBkgOnly.root" --plots-dir "../../../plots/PostFit/09_02/"

Combine

CMSSW + Combine Quickstart

cmsrel CMSSW_11_2_0
cd CMSSW_11_2_0/src
cmsenv
git clone -b py3 https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
git clone -b v2.0.0 https://github.com/cms-analysis/CombineHarvester.git CombineHarvester
scramv1 b clean; scramv1 b

New version:

cmsrel CMSSW_11_3_4
cd CMSSW_11_3_4/src
cmsenv
# need my fork until regex for float parameters is merged into the main repo
git clone -b regex-float-parameters https://github.com/rkansal47/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
git clone -b v2.0.0 https://github.com/cms-analysis/CombineHarvester.git CombineHarvester
# Important: this scram has to be run from src dir
scramv1 b clean; scramv1 b

I also add this to my .bashrc for convenience:

export PATH="$PATH:/uscms_data/d1/rkansal/HHbbVV/src/HHbbVV/combine"

csubmit() {
    local file=$1; shift;
    python3 "/uscms_data/d1/rkansal/HHbbVV/src/HHbbVV/combine/submit/submit_${file}.py" "$@"
}

Run fits and diagnostics locally

All via the below script, with a bunch of options (see script):

run_blinded.sh --workspace --bfit --limits

F-tests locally for non-resonant

This will take 5-10 minutes.

# automatically make workspaces and do the background-only fit for orders 0 - 3
run_ftest_nonres.sh --cardstag 23May14 --templatestag $templatestag  # -dl for saving shapes and limits
# run f-test for desired order
run_ftest_nonres.sh --cardstag 23May14 --goftoys --ffits --numtoys 100 --seed 444 --order 0

Condor is needed for resonant, see below.

Run fits on condor

Making datacards

Can run over all the resonant signals (default) or scan working points for a subset of signals (--scan):

csubmit cards --test --scan --resonant --templates-dir 23Apr30Scan

F-tests

Generate toys and fits for F-tests (after making cards and b-only fits for the testing order AND testing order + 1!)

csubmit f_test --tag 23May2 --cards-tag 23May2 --low1 0 --low2 0

Impacts

csubmit impacts --tag 23May2 (--local [if you want to run them locally])

This will also output a script to collect all the impacts after the jobs finish.

Signal injection tests

For resonant, use scripts inside the src/HHbbVV/combine/ directory and run from one level above the sample datacard folders (e.g. /uscms/home/rkansal/hhcombine/cards/biastests/23Jul17ResClipTFScale1).

Setting up the datacards, assuming templates have already been made (see templates section), e.g.:

setup_bias.sh -r --scale 1 --cardstag 23Sep14_hww0.6_pt_400_350 --templatestag 23Sep14_thww0.6_pt_400_350

Submitting 1000 toys for each sample and r value + more toys for samples with high fit failure rates:

submit_bias_res_loop.sh $seed $TAG

Moving the outputs to a bias/$TAG dir afterwards (this uses the last digit of the seed, so make sure different runs use different last digits!):

mv_bias_outputs.sh [last-digit-of-seed] $TAG

Misc

getopt for Mac

To run bash scripts with the getopt command (e.g. run_blinded.sh, ControlPlots.sh) on Macs:

# install gnu-getopt with Homebrew
brew install gnu-getopt
# add to path
echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.zsh_profile
source ~/.zsh_profile

From https://stackoverflow.com/questions/12152077/how-can-i-make-bash-deal-with-long-param-using-getopt-command-in-mac.

Command for copying directories to PRP in background

(Uses krsync; you may need to install rsync in the PRP pod first if it's been restarted.)

cd ~/eos/bbVV/input/${TAG}_${JETS}/2017
mkdir ../copy_logs
for i in *; do echo $i && sleep 3 && (nohup sh -c "krsync -av --progress --stats $i/root/ $HWWTAGGERDEP_POD:/hwwtaggervol/training/$FOLDER/$i" &> ../copy_logs/$i.txt &); done

Command for copying res samples to my laptop

for sample in 'NMSSM_XToYHTo2W2BTo4Q2B_MX-900_MY-80' 'NMSSM_XToYHTo2W2BTo4Q2B_MX-1200_MY-190' 'NMSSM_XToYHTo2W2BTo4Q2B_MX-2000_MY-125' 'NMSSM_XToYHTo2W2BTo4Q2B_MX-3000_MY-250' 'NMSSM_XToYHTo2W2BTo4Q2B_MX-4000_MY-150'; do for year in 2016APV 2016 2017 2018; do rsync -avP rkansal@cmslpc-sl7.fnal.gov:eos/bbVV/skimmer/Apr11/$year/$sample data/skimmer/Apr11/$year/; done; done

Get all running condor job names:

condor_q | awk '{ print $9}' | grep -o '[^ ]*\.sh'

Crab data jobs recovery

Kill each task:

dataset=SingleMuon

for i in {B..F}; do crab kill -d crab/pfnano_v2_3/crab_pfnano_v2_3_2016_${dataset}_Run2016${i}*HIPM; done
for i in {F..H}; do crab kill -d crab/pfnano_v2_3/crab_pfnano_v2_3_2016_${dataset}_Run2016$i; done
for i in {B..F}; do crab kill -d crab/pfnano_v2_3/crab_pfnano_v2_3_2017_${dataset}_Run2017$i; done
for i in {A..D}; do crab kill -d crab/pfnano_v2_3/crab_pfnano_v2_3_2018_${dataset}_Run2018$i; done

Get a crab report for each task:

for i in {B..F}; do crab report -d crab/pfnano_v2_3/crab_pfnano_v2_3_2016_${dataset}_Run2016${i}*HIPM; done
for i in {F..H}; do crab report -d crab/pfnano_v2_3/crab_pfnano_v2_3_2016_${dataset}_Run2016$i; done
for i in {B..F}; do crab report -d crab/pfnano_v2_3/crab_pfnano_v2_3_2017_${dataset}_Run2017$i; done
for i in {A..D}; do crab report -d crab/pfnano_v2_3/crab_pfnano_v2_3_2018_${dataset}_Run2018$i; done

Combine the lumis to process:

mkdir -p recovery/$dataset/
# shopt extglob
# jq -s 'reduce .[] as $item ({}; . * $item)' crab/pfnano_v2_3/crab_pfnano_v2_3_2016_${dataset}_*HIPM/results/notFinishedLumis.json > recovery/$dataset/2016APVNotFinishedLumis.json
for year in 2016 2017 2018; do jq -s 'reduce .[] as $item ({}; . * $item)' crab/pfnano_v2_3/crab_pfnano_v2_3_${year}_${dataset}_*/results/notFinishedLumis.json > recovery/$dataset/${year}NotFinishedLumis.json; done

Finally, add these as a lumimask for the recovery task.
