====== Study: correlations between signal regions ======

//Members: **Sophie**, Wolfgang, Humberto, Benj, Andy, Sabine, Sezen//

Problem statement: to identify pairs of signal regions (SRs) of the analyses in the PAD database that can safely be treated as approximately uncorrelated.

Solution: We aim at a probabilistic approach. We produce events to populate all signal regions, then analyse the extent to which SRs are statistically correlated, i.e. share events. The events used for the filling should be from (the sum of several) signal models known to provide good SR population. We could use the analyses' benchmark signal MC samples, but a cooler idea is to use SModelS to map topologies (=SRs) back to model points. For the stats, we'll use bootstrapping rather than directly studying which events fall into common signal regions.

Procedure: in the analysis framework the set of populated SRs is reported for //each event//, e.g. as a line of N_SR 1s and 0s in an output file. We then process this file as follows:

  - For each event (= line), sample N_history = O(100-1000) Poisson(lambda=1) weights and enter these into a set of N_history histograms (each histogram has N_SR bins).
  - Build a correlation matrix between the bins using the standard sample covariance over the N_history histograms, cov_ij = ⟨n_i n_j⟩ - ⟨n_i⟩⟨n_j⟩, and corr_ij = cov_ij / sqrt( cov_ii cov_jj ), where n_i is the content of bin i and ⟨.⟩ denotes the average over histograms.
  - Convert this to a binary "sufficiently independent SRs" matrix with a correlation threshold: indep_ij = (|corr_ij| < thres).

The 1s in row i (or column i) of this binary matrix mark the SRs that are approximately independent of SR i. A set of SRs that are pairwise independent in this sense can be trivially combined in a likelihood.

=== MA5 package to be used for correlation studies ===

  - Any version of the code from v1.8.20 onwards, to be downloaded e.g. from {{ 2019:groups:tools:ma5_v1.8.20.tgz | here }}.
  - This version contains two new dedicated functions (to be added in tools/PAD/Build/Main/main.cpp):
    * void manager.DumpSR(std::ostream&): writes a series of 0s and 1s, with one entry for each considered signal region and one line per event. To be included in the event loop.
    * void manager.HeadSR(std::ostream&): writes the header of the file: one comment line (starting with a hash) with the list of analysis-SRs. To be included before entering the event loop.

**Generic overview of [[RecastCodeComparison|what is implemented in which recast framework]]**

=== MA5 PAD - SModelS correspondence (13 TeV): ===

//NB: mono-X analyses generally cannot be treated in SModelS//

^ Analysis ID ^ Short description ^ SMS in SModelS ^ Result Type ^ Comment ^
| ATLAS-SUSY-2015-06 | multijet + MET, 3.2 fb-1 | T1, T2 | EM | superseded by 36 fb-1 analysis |
| ATLAS-SUSY-2016-07 | multijet + MET, 36 fb-1 | T1, T2, T5ZZ, T5WW(off) | EM | in SModelS develop |
| CMS-SUS-16-033 | multijet + MET, 36 fb-1 | T1, T1bbbb, T1tttt(off), T2, T2bb, T2tt(off) | UL | |
| CMS-SUS-16-039 | multilepton EWK | TChiWZ(off), TChiWH, TChipmSlep... | UL | |
| CMS-PAS-SUS-16-052 | 1L stop, soft | T2bbWWoff, T6bbWWoff | UL, agg-EM | |
| CMS-SUS-17-001 | 2L stop | T2tt(off), T6bbWW | UL | |

See also http://madanalysis.irmp.ucl.ac.be/wiki/PublicAnalysisDatabase

=== ColliderBit - SModelS correspondence (13 TeV): ===

^ Analysis ID ^ Short description ^ SMS in SModelS ^ Result Type ^ Comment ^
| ATLAS-SUSY-2016-07 | multijet + MET | T1, T2, T5ZZ, T5WW(off) | EM | in SModelS develop |
| ATLAS-SUSY-2016-15 | 0L stop | -- | -- | should include this in SModelS |
| ATLAS-SUSY-2016-16 | 1L stop | -- | -- | should include this in SModelS |
| ATLAS-SUSY-2016-17 | 2L stop | T2tt(off), T2bbWWoff | UL | should include this in SModelS |
| ATLAS-SUSY-2016-28 | b-jets + MET | -- | -- | should include this in SModelS |
| ATLAS-SUSY-2016-24 | multilepton EWK | (UL: TChiWZ, TSlepSlep, ...) (EM: TChiWZ) | -- | should include this in SModelS (currently in Philipp's branch) |
| CMS-SUS-16-033 | multijet + MET | T1, T1bbbb, T1tttt(off), T2, T2bb, T2tt(off) | UL | |
| CMS-SUS-16-043 | 1L 2b + MET (EWKino) | TChiWH | UL | only PAS version in ColliderBit |
| CMS-SUS-16-051 | 1L stop | T2tt(off), T6bbWW | UL | |
| CMS-SUS-17-001 | 2L stop | T2tt(off), T6bbWW | UL | |
| CMS-SUS-16-034 | SFOS lept. + jets | T5ZZ, TChiWZ | UL | only on-Z SRs in ColliderBit |
| CMS-SUS-16-039 | multilepton EWK | TChiWZ(off), TChiWH, TChipmSlep... | UL | |

==== Quantifying overlaps between analysis search regions using ADLs ====

Members: Sezen, Wolfgang (Harrison)

Find and visualize overlaps in a model-independent way, without generating events, using simple descriptions written in an [[2019:groups:tools:adl|analysis description language]]. Directly sample the event selection. Useful for the analysis design phase, or for quick comparisons within experiments (e.g. Run2 CMS SUSY pMSSM combination).

  * Start from the analysis description, which lists objects and event selections.
  * Construct a feature space from all mathematically orthogonal "basic" variables (e.g. MET, jet1.pt, jet2.pt, electron1.eta, ...).
  * Randomly sample the feature space for each analysis based on cuts on the feature space components (jet1.pt > 100, MET > 299, etc.).
  * Use the sampled points to compute values for "composite" variables such as HT(jets), dphi(jets), MT(lepton, MET), etc.
  * Compare feature spaces between analyses, find and visualize overlaps and exclusions.
  * As a very simple first step, we simply check if two analyses are disjoint in any of the basic variables.
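
The "very simple first step" could look roughly like the sketch below. It assumes each selection has already been reduced to one open interval (low, high) per basic variable. The dictionary layout, the variable names, the cut values and the function name ''disjoint_variables'' are all hypothetical illustrations and are not taken from any actual ADL file or tool.

```python
import math

# Hypothetical representation: one dict per analysis selection, mapping a
# basic variable name to an open interval (low, high). A variable with no
# entry is treated as unconstrained.


def disjoint_variables(cuts_a, cuts_b):
    """Return the basic variables in which the two selections cannot overlap."""
    disjoint = []
    # Only variables constrained by both selections can separate them.
    for var in sorted(set(cuts_a) & set(cuts_b)):
        lo_a, hi_a = cuts_a[var]
        lo_b, hi_b = cuts_b[var]
        # Two open intervals do not intersect if one ends where the other starts
        # (or earlier).
        if hi_a <= lo_b or hi_b <= lo_a:
            disjoint.append(var)
    return disjoint


# Illustrative selections only (made-up cut values).
analysis_a = {"MET": (200.0, math.inf), "n_leptons": (-0.5, 0.5)}   # lepton veto
analysis_b = {"MET": (100.0, math.inf), "n_leptons": (0.5, 1.5)}    # exactly 1 lepton
analysis_c = {"MET": (250.0, math.inf), "jet1.pt": (100.0, math.inf)}

print(disjoint_variables(analysis_a, analysis_b))  # ['n_leptons']
print(disjoint_variables(analysis_a, analysis_c))  # []
```

A single disjoint basic variable is enough to conclude that two selections cannot share events. An empty list is inconclusive: the selections may still be separated by composite variables, which is what the sampling steps above are meant to probe.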