
Study: correlations between signal regions

Members: Sophie, Wolfgang, Humberto, Benj, Andy, Sabine, Sezen

Problem statement: identify pairs of signal regions (SRs) among the analyses in the MA5 Public Analysis Database (PAD) that can safely be treated as approximately uncorrelated.

Solution: We aim at a probabilistic approach. We produce events to populate all signal regions, then analyse the extent to which SRs are statistically correlated, i.e. share events.

The events used for the filling should be from (the sum of several) signal models known to provide good SR population. We could use the analyses' benchmark signal MC samples, but a cooler idea is to use SModelS to map topologies (=SRs) back to model points.

For the stats, we'll use bootstrapping rather than directly studying which events fall into common signal regions. Procedure: in the analysis framework, the set of populated SRs is reported for each event, e.g. as a line of N_SR 1s and 0s in an output file. We then process this file: for each event (=line) we sample N_history = O(100-1000) Poisson(lambda=1) weights, one per bootstrap histogram, and fill the event with these weights into a set of N_history histograms (each histogram has N_SR bins; only the bins of the SRs the event populates are filled). We then build a correlation matrix between the bins using the standard sample covariance over the N_history histograms, cov_ij = <sumw_i sumw_j> - <sumw_i> <sumw_j>, where sumw_i is the content of bin i, and corr_ij = cov_ij / sqrt( cov_ii cov_jj ). Finally, we convert this to a binary “sufficiently independent SRs” matrix with a correlation threshold: indep_ij = (|corr_ij| < thres). The 1s in row i (or column i) of this binary matrix mark the SRs that are sufficiently independent of SR i; any set of SRs whose members are all pairwise independent in this sense can be trivially combined in a likelihood.
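The bootstrap procedure above could be sketched roughly as follows. This is a minimal illustration, not the group's actual script: the function names, the default N_history = 500, the random seed and the threshold value are all assumptions, and the per-event 0/1 flags are taken to be already loaded into a NumPy array.

```python
import numpy as np


def bootstrap_sr_correlations(flags, n_history=500, seed=0):
    """Estimate SR-SR correlations from per-event 0/1 flags by Poisson bootstrapping.

    flags: array of shape (n_events, n_sr); flags[e, i] = 1 if event e populates SR i.
    Returns an (n_sr, n_sr) correlation matrix (NaN for unpopulated SRs).
    """
    rng = np.random.default_rng(seed)
    flags = np.asarray(flags, dtype=float)
    n_events = flags.shape[0]

    # One Poisson(lambda=1) weight per (bootstrap histogram, event).
    # For very large samples this array should be built in chunks of events.
    weights = rng.poisson(lam=1.0, size=(n_history, n_events))

    # sumw[h, i]: content of bin (SR) i in bootstrap histogram h.
    sumw = weights @ flags

    # cov_ij = <sumw_i sumw_j> - <sumw_i> <sumw_j>, averaged over the histograms.
    mean = sumw.mean(axis=0)
    cov = sumw.T @ sumw / n_history - np.outer(mean, mean)

    # corr_ij = cov_ij / sqrt(cov_ii cov_jj); SRs with zero variance yield NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma = np.sqrt(np.diag(cov))
        corr = cov / np.outer(sigma, sigma)
    return corr


def independence_matrix(corr, thres=0.1):
    """indep_ij = (|corr_ij| < thres); NaN entries compare as False."""
    return np.abs(corr) < thres
```

With Poisson(1) weights the expected correlation between two SRs is n_ij / sqrt(n_i n_j), where n_i is the number of events in SR i and n_ij the number of events they share, so the bootstrap estimate can be cross-checked against direct counting on a small sample.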

MA5 package to be used for correlation studies

- Any version of the code from v1.8.20 onwards, to be downloaded e.g. from here.

- This version contains two new dedicated functions (to be added in tools/PAD/Build/Main/main.cpp):

  • void manager.DumpSR(std::ostream&): writes a series of 0s and 1s, one entry for each considered signal region and one line per event. To be called inside the event loop.
  • void manager.HeadSR(std::ostream&): writes the header of the file: one comment line (starting with a hash) with the list of analysis-SR names. To be called before entering the event loop.
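A file written by these two functions could be read back along the following lines. This is only a sketch under assumptions: the exact separators are not specified here, so the reader below (a hypothetical helper, read_sr_dump) assumes whitespace-separated analysis-SR names without embedded spaces in the header, and accepts the 0/1 entries either whitespace-separated or written back to back.

```python
import io

import numpy as np


def read_sr_dump(stream):
    """Parse a DumpSR/HeadSR-style text stream into (sr_names, flags).

    Assumed layout: one '#' comment line listing the analysis-SR names,
    then one line of 0/1 entries per event.
    """
    names = None
    rows = []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if names is None:
                names = line.lstrip("#").split()
            continue
        tokens = line.split()
        if len(tokens) == 1:
            # No separators: treat every character as one SR entry.
            tokens = list(tokens[0])
        rows.append([int(t) for t in tokens])
    return names, np.array(rows, dtype=np.uint8)


# Tiny in-memory example with made-up SR names.
example = io.StringIO("# anaA-SR1 anaA-SR2 anaB-SR1\n1 0 1\n0 0 1\n")
sr_names, sr_flags = read_sr_dump(example)
```

The resulting array has one row per event and one column per signal region, which is the shape a bootstrap over events needs.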

Generic overview of what is implemented in which recast framework

MA5 PAD - SModelS correspondence (13 TeV):

NB: mono-X analyses generally cannot be treated in SModelS.

| Analysis ID | Short description | SMS in SModelS | Result Type | Comment |
| ATLAS-SUSY-2015-06 | multijet + MET, 3.2 fb^-1 | T1, T2 | EM | superseded by 36 fb^-1 analysis |
| ATLAS-SUSY-2016-07 | multijet + MET, 36 fb^-1 | T1, T2, T5ZZ, T5WW(off) | EM | in SModelS develop |
| CMS-SUS-16-033 | multijet + MET, 36 fb^-1 | T1, T1bbbb, T1tttt(off), T2, T2bb, T2tt(off) | UL | |
| CMS-SUS-16-039 | multilepton EWK | TChiWZ(off), TChiWH, TChipmSlep… | UL | |
| CMS-PAS-SUS-16-052 | 1L stop, soft | T2bbWWoff, T6bbWWoff | UL, agg-EM | |
| CMS-SUS-17-001 | 2L stop | T2tt(off), T6bbWW | UL | |

See also http://madanalysis.irmp.ucl.ac.be/wiki/PublicAnalysisDatabase

ColliderBit - SModelS correspondence (13 TeV):

| Analysis ID | Short description | SMS in SModelS | Result Type | Comment |
| ATLAS-SUSY-2016-07 | multijet + MET | T1, T2, T5ZZ, T5WW(off) | EM | in SModelS develop |
| ATLAS-SUSY-2016-15 | 0L stop | | | should include this in SModelS |
| ATLAS-SUSY-2016-16 | 1L stop | | | should include this in SModelS |
| ATLAS-SUSY-2016-17 | 2L stop | T2tt(off), T2bbWWoff | UL | should include this in SModelS |
| ATLAS-SUSY-2016-28 | b-jets + MET | | | should include this in SModelS |
| ATLAS-SUSY-2016-24 | multilepton EWK | (UL: TChiWZ, TSlepSlep, …) (EM: TChiWZ) | | should include this in SModelS (currently in Philipp's branch) |
| CMS-SUS-16-033 | multijet + MET | T1, T1bbbb, T1tttt(off), T2, T2bb, T2tt(off) | UL | |
| CMS-SUS-16-043 | 1L 2b + MET (EWKino) | TChiWH | UL | only PAS version in ColliderBit |
| CMS-SUS-16-051 | 1L stop | T2tt(off), T6bbWW | UL | |
| CMS-SUS-17-001 | 2L stop | T2tt(off), T6bbWW | UL | |
| CMS-SUS-16-034 | SFOS lept. + jets | T5ZZ, TChiWZ | UL | only on-Z SRs in ColliderBit |
| CMS-SUS-16-039 | multilepton EWK | TChiWZ(off), TChiWH, TChipmSlep… | UL | |

Quantifying overlaps between analysis search regions using ADLs

Members: Sezen, Wolfgang (, Harrison)

Find and visualize overlaps in a model-independent way, without generating events, using simple analysis descriptions written in an analysis description language (ADL). The event selection is sampled directly. This is useful in the analysis design phase, or for quick comparisons within experiments (e.g. the Run 2 CMS SUSY pMSSM combination).

  • Start from the analysis description, which lists objects and event selections.
  • Construct a feature space from all mathematically orthogonal “basic” variables (e.g. MET, jet1.pt, jet2.pt, electron1.eta, …).
  • Randomly sample the feature space for each analysis based on cuts on the feature space components (jet1.pt > 100, MET > 299, etc.).
  • Use the sampled points to compute values for “composite” variables such as HT(jets), dphi(jets), MT(lepton, MET), etc.
  • Compare feature spaces between analyses, find and visualize overlaps and exclusions.
  • As a very simple first step, we check whether two analyses are disjoint in any of the basic variables.
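The steps above could be prototyped along these lines. This is a rough sketch rather than an actual ADL tool: the selections are hand-written dictionaries of half-open intervals instead of parsed ADL files, the cut values and variable ranges are invented for illustration, uniform sampling is an assumption, and HT is taken to be the sum of the two jet pT values.

```python
import math

import numpy as np


def disjoint_variables(cuts_a, cuts_b):
    """Return the shared variables in which two selections cannot overlap.

    Each selection maps a variable name to a half-open interval [lo, hi).
    """
    disjoint = []
    for var in sorted(set(cuts_a) & set(cuts_b)):
        lo_a, hi_a = cuts_a[var]
        lo_b, hi_b = cuts_b[var]
        if max(lo_a, lo_b) >= min(hi_a, hi_b):
            disjoint.append(var)
    return disjoint


def sample_points(bounds, n, seed=0):
    """Uniformly sample the basic-variable feature space within finite bounds."""
    rng = np.random.default_rng(seed)
    return {var: rng.uniform(lo, hi, size=n) for var, (lo, hi) in bounds.items()}


def passes(points, cuts):
    """Boolean mask of sampled points that satisfy every cut of a selection."""
    n = len(next(iter(points.values())))
    mask = np.ones(n, dtype=bool)
    for var, (lo, hi) in cuts.items():
        mask &= (points[var] >= lo) & (points[var] < hi)
    return mask


# Hypothetical selections (illustrative numbers only).
analysis_a = {"MET": (300.0, math.inf), "jet1.pt": (100.0, math.inf)}
analysis_b = {"MET": (0.0, 200.0), "jet1.pt": (50.0, math.inf)}
analysis_c = {"MET": (250.0, math.inf), "HT": (500.0, math.inf)}

# First step: look for a basic variable in which two selections are disjoint.
disjoint_ab = disjoint_variables(analysis_a, analysis_b)
disjoint_ac = disjoint_variables(analysis_a, analysis_c)

# Otherwise: sample the basic variables, build composites, compare selections.
points = sample_points(
    {"MET": (0.0, 1000.0), "jet1.pt": (0.0, 1000.0), "jet2.pt": (0.0, 1000.0)},
    n=10000,
)
points["HT"] = points["jet1.pt"] + points["jet2.pt"]  # composite variable
overlap_ab = passes(points, analysis_a) & passes(points, analysis_b)
overlap_ac = passes(points, analysis_a) & passes(points, analysis_c)
```

Here analysis_a and analysis_b are separated by their MET cuts alone, while analysis_a and analysis_c share sampled points, so their overlap would have to be quantified from the sampled feature space.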
2019/groups/tools/correlations.txt · Last modified: 2019/06/27 17:10 by sezen.sekmen