User Tools

Site Tools


Sidebar

2015:groups:higgs:dmhiggs:monoh

This is an old revision of the document!


Chair: Nicola De Filippis

Members: Veronica Sanz, Harrison Prosper, Sabine Kraml, Daniel Schmeier, Sezen Sekmen, Giacomo Polesello, Jory Sonneveld, Nishita Desai, Camillo Carillo, Sanjoy Biswas, Monika Wielers, Ilaria Brivio, Jose Miguel No

Task List

The goal of the study we started in LesHouches is:

  • derive sensitivity of monoHiggs analyses to probe DM mass hypotheses for different models, at 13 TeV
  • compare that with what obtained by H→invisible particles analysis and other mono-X analyses

Below there is a possible list of items to be addressed in preparation for the Les Houches proceedings. Please add more or correct my list if needed:

  • finalize the list of models to quote and compare; prepare UFO and lhe events for all of them and store in a place accessible by everybody. Documentation about those models has to be also prepared.
  • finalize the list of background samples to be used and those not yet produced. We need to be sure about cross section. For example in the case of 4l Sanjoy pointed that we need to produce ZH with Z→ll and H→ZZ→2l2nu.
  • we need to run on full statistics of signal and background samples.
  • we should have already a working recipe for delphes fast simulation; if not we need to fix that. We need to verify the values of the efficiency of muons/electron and photons and the fake rate if any, by comparing with current CMS/ATLAS values for the 4l and gamma gamma analyses. We need to verify the ATLAS-like approach with the smearing for photons and check the MET.
  • we need to check if we can include also H→WW channel !
  • we have a preliminary implementation of the 4l and gamma gamma analysis in delphes but we need to check if the efficiency is correct and there are no bugs. …eventually we need to change and optimize cuts for 13 TeV scenario.
  • we do prepare a list of plots to be produced (like outflow, 4l, -photon invariant mass —signal + background, etc..)
  • we need to configure and run the tool for limits and include systematics in the statistical treatment.
  • produce plots about the upper limit on the cross section and compare different models → turn that in limits on couplings or whatever.
  • compare sensitivity with H→invisible particle analysis.

References

Data

Parton Events

The lhe files for the different models are here:

hhxx
Zprime
Higgs Scalar
Background for 4l final state:

ZZto2e2mu: (xsection=0.29084336591184917 pb)
https://cernbox.cern.ch/public.php?service=files&t=43cc4803a552ebe0188df8fd8a655672

ZZto4e: (xsection=0.13205154366855495 pb)
https://cernbox.cern.ch/public.php?service=files&t=fc2b0c901ecae46f25ee46fde49e6e4d

ZZto4mu: (xsection=0.13205154366855495 pb)
https://cernbox.cern.ch/public.php?service=files&t=17bd6e6956b86d475f1fa314f1d617f8

ZH, H→ZZ→4l: (xsection=2.62619200000000028e-04 pb)
https://cernbox.cern.ch/public.php?service=files&t=2d4ae1b2be68a7181f904e7dddd3e050

ttH, H→ZZ→4l: (xsection=1.53566999999999988e-04 pb)
https://cernbox.cern.ch/public.php?service=files&t=8aaba0f2678ef5d48cbf6d737f386ef6

WminusH, H→ZZ→4l:
https://cernbox.cern.ch/public.php?service=files&t=93f1c85817d3c06a90ef991ed0ba874d

WplusH, H→ZZ→4l: (xsection WplusH+WminusH= 4.16760000000000015e-04 pb)
https://cernbox.cern.ch/public.php?service=files&t=f3fbf337f9bde773f45436562904b825

ggH, H→ZZ→4l: (xsection=1.32638400000000007e-02 pb)
https://www.dropbox.com/sh/wauvsr4evou4zkt/AAB8z3YKN8ZMVvzMIBH3i7fua?dl=0

VBF, H→ZZ→4l: (xesction=0.001131896 pb )
https://cernbox.cern.ch/public.php?service=files&t=114cec87be1d3f480a937722b85fbca2

Help Needed for gamma gamma with this part, new background

Showered, Reconstructed Events

Here are the .root files for some of the files above (more to be added). The files in the directory itself are created using Standard Delphes 3.2.0 and with the PileUp events from below. Electrons, Muons and Photons are not isolated in this sample! Track and Tower information is available in the ROOT file to apply an external isolation routine. The used detector card can be found under “Auxiliary”.
Each signal .root file contains 1000 events, background samples contain 5000 events.

Delphes PileUp Events

Cross Sections

Cross sections x BR(H→ZZ→4l) (in pb) (M_zprime=1TeV, M_scalar=1TeV)
Model File sigma x Br (pb)
Higgs_hhxx_scalar_nohdecay_1GeV_13TeV 1.065e-11
Higgs_hhxx_scalar_nohdecay_10GeV_13TeV 1.0567e-11
Higgs_hhxx_scalar_nohdecay_100GeV_13TeV 2.2276e-17
Higgs_hhxx_scalar_nohdecay_500GeV_13TeV 5.789e-20
Higgs_hhxx_scalar_nohdecay_1000GeV_13TeV 1.333e-21
Higgs_Zprime_nohdecay_1GeV_13TeV 0.00000636732
Higgs_Zprime_nohdecay_10GeV_13TeV 0.0000063590400
Higgs_Zprime_nohdecay_100GeV_13TeV 0.0000002264028
Higgs_Zprime_nohdecay_500GeV_13TeV 0.00000001475772
Higgs_Zprime_nohdecay_1000GeV_13TeV 0.00000000003287160
Higgs_scalar_nohdecay_1GeV_13TeV 0.000849144
Higgs_scalar_nohdecay_10GeV_13TeV 0.000809598
Higgs_scalar_nohdecay_100GeV_13TeV 0.00000002508564
Higgs_scalar_nohdecay_500GeV_13TeV 0.0000000001063428
Higgs_scalar_nohdecay_1000GeV_13TeV 0.000000000000027086640
Cross sections x BR(H→gg) (in pb) (M_zprime=1TeV, M_scalar=1TeV):
Model File sigma x Br (in pb)
Higgs_hhxx_scalar_nohdecay_1GeV_13TeV 8.05296e-11
Higgs_hhxx_scalar_nohdecay_10GeV_13TeV 7.9777e-11
Higgs_hhxx_scalar_nohdecay_100GeV_13TeV 1.6817e-16
Higgs_hhxx_scalar_nohdecay_500GeV_13TeV 4.37076e-19
Higgs_hhxx_scalar_nohdecay_1000GeV_13TeV 1.0064e-20
Higgs_Zprime_nohdecay_1GeV_13TeV 0.0000525996
Higgs_Zprime_nohdecay_10GeV_13TeV 0.0000525312
Higgs_Zprime_nohdecay_100GeV_13TeV 0.000001870284
Higgs_Zprime_nohdecay_500GeV_13TeV 0.0000001219116
Higgs_Zprime_nohdecay_1000GeV_13TeV 0.000000000271548
Higgs_scalar_nohdecay_1GeV_13TeV 6.94031999999999982e-03
Higgs_scalar_nohdecay_10GeV_13TeV 6.68723999999999968e-03
Higgs_scalar_nohdecay_100GeV_13TeV 2.07229199999999993e-07
Higgs_scalar_nohdecay_500GeV_13TeV 8.78483999999999983e-10
Higgs_scalar_nohdecay_1000GeV_13TeV 2.23759199999999990e-13

Settings

Information/HowTo

How to Run Delphes on an LHE file using Pythia for showering

I'll try to give a quick tutorial on the necessary steps to do the above chain very conveniently, so that everyone can test the above LHE files. I suppose most people have the relevant tools already but I still try to cover everything. Let me start with the necessary installation steps:

  1. Download and install HepMC from here http://lcgapp.cern.ch/project/simu/HepMC/download/HepMC-2.06.09.tar.gz. In the following I will call the directory /hepmcdir
  2. Download the latest Pythia8 from here http://home.thep.lu.se/~torbjorn/pythia8/pythia8209.tgz. Note that I had problems using any older version (even 8205!). If you do have an older version of Pythia lying around, set your library path to start with this pythia version: 'export LD_LIBRARY_PATH=/pythia8dir/lib:$LD_LIBRARY_PATH';
  3. ./configure --enable-shared --with-hepmc2=/hepmcdir; make; make install. Even if you have Pythia, if you did not compile it with enable-shared options before you cannot run the Pythia-Delphes chain! In that case you have to install Pythia8 anew! In the following I will call the directory /pythiadir
  4. Download and install Delphes from here http://cp3.irmp.ucl.ac.be/downloads/Delphes-3.2.0.tar.gz
  5. Within the Delphes folder, do 'export PYTHIA8=/pythia8dir; make HAS_PYTHIA8=true DelphesPythia8'
  6. In order to run Delphes, do './DelphesPythia8 detectorcard.tcl pythiasettings.in output.root'.
  • For the detector card, you can use the one in delphes/cards/delphes_card_CMS_pileup.tcl until we have decided on the actual settings. If you want to use the above PileUp sample, you have to explicitly put the file path in the “set PileUpFile” line! . Make sure to uncomment any lines in the ROOT Tree writer at the bottom of the file
add Branch TrackMerger/tracks Track Track
add Branch Calorimeter/towers Tower Tower

as this information is used in the analysis (see github link below).

  • For the pythiasettings, there is a working example below (which you can just store in a text file)
Main:numberOfEvents = 1000
Beams:frameType = 4
Beams:LHEF = /.../Higgs_hhxx_scalar_nohdecay_100GeV_13TeV.lhe

To fix the decays of the Higgs, you have to add the following to the above pythia input card:
For H→gamma gamma

25:onMode = off
25:onIfMatch = 22 22

For H → ZZ → 4l

25:onMode    = off
25:onIfMatch = 23 23
23:onMode    = off
23:onIfAny   = 11 13 15
  • If you open a new terminal, you have to do the 'export' step again before you can run the DelphesPythia8 executable
  • If Pythia complains about not finding something that involves xmldoc, you have to - in addition to the above - do export PYTHIA8DATA =$PYTHIA8/xmldoc

Macros for 4l analysis running on delphes samples:

git clone https://github.com/hbprosper/monoHiggs.git

List of plots that can be produced for 4l analysis:

a) Before all the cuts:

- Order the lepton in increasing pT and plot them on the same canvas.
- plots the relative isolation variable of the leptons

b) Then build the Z1 as reported in slide 1 (see References). Plot the Z1 mass . Plots the isolation variable of the leptons not included in the Z1 building. Plots the impact parameter if available.

c) Build the Z2 as pointed in the slide above

d) plots the 4l invariant mass after the full selection and the MET.

e) Plots the same stuff soft signal and background and try to over-impose.

See References for the cuts and a list of Delphes objects and their properties.

Slides of mon-Higgs+DM analysis results in the four lepton and MET final state:

Preliminary results on gamma gamma and 4muons:

Some summary slides for summary talk by Sacha Nikitenko:
https://cernbox.cern.ch/public.php?service=files&t=645c4898c882aa28434f6b7bd93d9576

Tool for statistical analysis by Harrison Prosper:

Daniel: This does not seem to be the standard way ATLAS/CMS set limits, which is based on the frequentist-CLs. Why are we not using that one?
Harrison: It seems reasonable to me to favor statistical methods that are well-founded, by which I mean methods that professional statisticians (rather than amateurs like us) have shown to be sound. Just as people defer to physicists about physics, I think it reasonable to defer to statisticians about statistics. The method implemented in this package is Bayesian, which is well-founded in this sense. By contrast, CLs is not. Moreover, 1) asymptotic CLs may be overly optimistic when observed counts are low; 2) the ATLAS/CMS method requires performing two fits per toy experiment and therefore requires a rule to deal with non-converging or bad fits; 3) the use of profile likelihoods, which underlies the ATLAS/CMS method, when there are multiple nuisance parameters in the problem can yield results that may be too optimistic, and 4) worst of all, CLs (when done correctly) is a huge sink of CPU and manpower for computing numbers which, in the end, simply say that nothing was found! Because of the enormous amount of effort consumed computing CLs during Run I, the CMS SUSY group is revisiting the question of whether limit setting using CLs is sensible for Run II. High energy physicists are prone to a “Masters of the Universe” complex, which is manifested in our dismissal of the wisdom of other professionals. Alas, since I am a high energy physicist ipso facto I am prone to some level of arrogance…but I am not so arrogant as to dismiss two and a half centuries of work on statistics in favor of something like CLs! Surely, Laplace, Fisher, Neyman, Bernardo, Berger and dozens more have useful things to teach us?
Daniel: Since you mention the disadvantages of CLs (which certainly are true!), I have to write a few lines to 'defend' it: Bayesian statistics might be mathematically well founded (more than CLs), but scientifically raises the huge issue that one has to put in a prior degree of belief in a theory via $\pi(\sigma)$. From a theoretical point of view, there is no reason why to assume this to be flat, lognormal, spiky or anything else. I could have a model which describes Nature (= the observation) perfectly for one particular parameter setting but completely fails for any other and if my prior is anything different than a delta-distribution, a Bayesian approach will exclude it due to the marginalisation.
I don't want to raise a discussion about which approach is the best, because neither one is. My only concern is - and that was the main point behind my original question - that for the sake of comparison, one should stick to the statistical methods which have been defined as standard, unless there is a strong reason why the standard approach is not applicable to the given scenario. For the models we have, I don't see any such reason. Moreover, the ATLAS study linked in the References section determines their limit using CLs and we most likely want to compare our result to theirs. If we use different means to quantify our results, I fear it will be hard to disentangle the physics from the statistics.

This standalone tool can be used to compute upper limits on a cross section (or signal strength) given one or more observed counts, $N$. The package can be downloaded from github using

git clone https://github.com/hbprosper/limits.git

To build it, do

cd limits
make

To initialize some environment variables, do

source setup.sh

and to test it do,

cd example
./example1.py 1 1 0 0 0 1

This calculates the 90% upper limit for count $N = 1$. (The other parameters, signal efficiency $\epsilon$, efficiency uncertainty $\delta\epsilon$, background $B$, background uncertainty $\delta B$, and an integrated luminosity ${\cal L}$ are set to 1, 0, 0, 0, and 1, respectively.) The 90% upper limit is the well known value 3.9 for a Poisson distribution with $N = 1$.

Usage The package contains two C++ classes (callable from Python via PyROOT) MultiPoisson and Bayes. The first class sets up a multi-Poisson model, while the second class computes (Bayesian) limits. The Python program blimit.py (which is used in example1.py) shows how to use these two classes. Here are the essential code snippets.

  # load the liblimits shared library using PyROOT
  from ROOT import gSystem
  gSystem.Load('liblimits')
 
  # make the MultiPoisson and Bayes classes visible to Python
  from ROOT import MultiPoisson, Bayes
 
  # initialize multi-Poisson model
  # from an input file.
  model = MultiPoisson('inputs.dat', 1)
 
  # set up Bayes limit calculator
  data     = model.counts()
  sigmamin =  0.0
  sigmamax = 20.0
  CL       =  0.90
  bayes    = Bayes(model, data, sigmamin, sigmamax, CL)
 
  # compute upper limit  
  limit = bayes.quantile()
  print "limit = %10.2f" % limit

Description The limits package computes cross section limits for the multi-Poisson model, $$ p(D \,|\, \sigma, \epsilon, b, {\cal L}) = \prod_{i=1}^K \frac{e^{-n_i} \, n_i^{N_i}}{N_i!}, $$ where $D \equiv N_1,\cdots, N_K$ denotes the set of observed counts and $$n_i = \sigma \, \epsilon_i \, {\cal L} + b_i$$ is the expected count per bin, with $\sigma$, $\epsilon_i$, and $b_i$ the cross section, and the bin-by-bin expected efficiency and expected background, respectively. In order to use the program to compute limits on a signal strength, $\mu$, rather than a cross section, where $n_i = \mu \, s_i + b_i$, one need merely set ${\cal L} = 1$ and make the identifications $ \sigma \rightarrow \mu$, $\epsilon \rightarrow s_i$.

The program marginalizes over the model parameters $\epsilon, b, {\cal L}$, $$ p(D \,|\, \sigma) = \int p(D \,|\, \sigma, \epsilon, b, {\cal L}) \, \pi(\epsilon, b, {\cal L}) \, d\epsilon \, db \, d {\cal L}, $$ with respect to an evidence-based prior density, $\pi(\epsilon, b, {\cal L})$, that encodes whatever is known about these parameters. The function $p(D \,|\, \sigma)$ is called the marginal (or integrated) likelihood. Bayes theorem, $$ p(\sigma \,|\, D ) = p(D \,|\, \sigma) \, \pi(\sigma) \, / \, p(D), $$ is then used to compute the posterior density $p(\sigma \,|\, D)$. The posterior probability, $p(\sigma \,|\, D) \, d\sigma$, is the probability (interpreted as a degree of belief) that the true value of $\sigma$ lies in the interval $[\sigma, \sigma + d\sigma]$. The function $\pi(\sigma)$ is the prior density for the cross section. In principle, it encodes what we know about the cross section independently of the new data $D$. In practice, we usually set $\pi(\sigma) = 1$, though the program allows for more sophisticated choices. The upper limit on the cross section $\sigma_{UP}$ at $100 \times (1-\alpha)$% CL is the solution of the equation, $$\int_0^{\sigma_{UP}} p(\sigma \,|\, D) \, d\sigma = 1 - \alpha,$$ which is calculated by the class Bayes.

The evidence-based prior density $\pi(\epsilon, b, {\cal L})$ can be very complicated. This is because the signal efficiencies and backgrounds typically depend on many underlying experimental and theoretical parameters that are not known precisely, such as the jet energy scale (JES), jet energy resolution (JER), the trigger efficiencies, the object identification efficiencies, numerous data to Monte Carlo correction factors, parton distribution functions (PDF), and artifacts such as the renormalization and factorization scales that enter theoretical calculations. ATLAS and CMS make valiant attempts to model explicitly how all uncertain quantities affect the signal efficiencies and backgrounds (in principle) across multiple signals, multiple backgrounds, multiple signal regions and multiple bins. The explicit modeling, however, of these dependencies quickly leads to enormous complexity.

The limits package takes a different approach. In any given analysis, it is generally possible to identify the independent (or approximately independent) sources of uncertainty. For example, the uncertainty in the jet energy scale is surely independent of the uncertainty in the CT10nlo PDFs. All uncertainties, whatever their provenance and whatever words are used to describe them, be it statistical, systematic, theoretical, best guess, etc., can be accounted for in the same way:

  • Identify all independent sources of systematic uncertainty. Note: if you observe $N = $9 counts in a bin then that number is not uncertain. On the contrary, it is known to be exactly 9! What is uncertain is the expected count in that bin of which 9 is our best estimate and $\pm 3$ is a rough measure of the uncertainty in the expected value. That is why it makes sense to put error bars on bins with $N = 0$ because the bars are about $n$, not $N$.
  • Randomly, and simultaneously, sample from the independent sources of systematic uncertainty.
  • Read in one variation of all the systematic uncertainty parameters.
  • Then, event by event:
    • re-calculate all observables that depend on the systematic uncertainty parameters, and
    • execute the analysis with the updated observables.

Consider the following example. Parton distribution functions are not known precisely and, therefore, are a source of uncertainty. The uncertainty in PDFs, for example, the CT10nlo PDFs, can be encapsulated in an ensemble of PDF sets that can be created using the LHAPDF program hessian2replicas. For each set of CT10nlo PDF sets, one runs the entire analysis chain whose output is either a set of histograms with $K$ bins or, more generally, a flat ntuple from which histograms can be made. Every event is re-weighted by the ratio of the product of the new PDFs to the product of PDFs with which the events were generated. The resulting ensemble of histograms and, therefore, efficiencies and backgrounds incorporates the uncertainty due to our imperfect knowledge of the PDFs. Given this ensemble, the marginal likelihood is approximated by simply averaging the likelihood, $$ p(D \,|\, \sigma) \approx \frac{1}{M} \sum_{j=1}^M p(D \,|\, \sigma, \theta_j), $$ with respect to the parameters $\theta \equiv \epsilon, b$. Each term in the sum is associated with one random variation of the CT10nlo PDF parameters.

The variations of imprecisely known quantities, such as the JES and PDFs, can, and should, be done simultaneously by running the analysis (or analyses) $M$ times, in parallel, each with a different random variation of the imprecisely known quantities. Varying them one by one in order to quantify the relative importance of each systematic effect is instructive. However, adding the resulting uncertainties in quadrature may not be adequate.

The key to this approach is to think carefully about how each randomly varied parameter affects the observables in the analysis. How does the jet energy scale and jet energy resolution affect jet observables and therefore the missing transverse energy? How do the data to Monte Carlo corrections affect the observables and the event weight? How does the trigger efficiency affect the event weight? How do the PDFs affect the event weight, and so on?

Final remarks Ideally, every analysis chain comprises two independent modules: event by event, the first module takes as input the nominal (experiment or framework-dependent) objects and a particular point in the space of systematic parameters and outputs the re-calibrated framework-independent objects, ideally, as vectors of TParticle objects, while the second module takes as input the re-calibrated TParticle objects and outputs histograms and, or, flat ntuples. The second module should have no knowledge of the provenance of the TParticle objects. If we wrote our analysis codes this way, they would become framework-independent reusable modules. But, I've just realized I have been dreaming!

Proceedings

We started to work on the proceeding. The deadline is end of February. You can find the last version at link:

proceeding.zip

2015/groups/higgs/dmhiggs/monoh.1453411308.txt.gz · Last modified: 2016/01/21 22:21 by nicola.de_filippis