2019:groups:tools:correlations

Differences

This shows you the differences between two versions of the page.

2019:groups:tools:correlations [2019/06/25 13:41]
sabine.kraml
2019:groups:tools:correlations [2019/06/27 17:10] (current)
sezen.sekmen [Quantifying overlaps between analysis search regions using ADLs]
Line 1: Line 1:
 ====== Study: correlations between signal regions ======
  
-//Members: Sophie, Wolfgang, Humberto, Benj, Andy, Sabine//
+//Members: **Sophie**, Wolfgang, Humberto, Benj, Andy, Sabine, Sezen//
  
 Problem statement: to identify pairs of signal regions of the analyses in the PAD database that can safely be treated as approximately uncorrelated.
Line 10: Line 10:
  
 For the stats, we'll use bootstrapping rather than directly studying which events fall into common signal regions. Procedure: in the analysis framework, the set of populated SRs is reported for //each event//, e.g. as a line of N_SR 1s and 0s in an output file. We then process this: for each event (= line) we sample N_history = O(100-1000) Poisson(lambda=1) weights and enter these into a set of N_history histograms (each histogram has N_SR bins). We then build a correlation matrix between the bins using the standard sample covariance, cov_ij = <sumw_i sumw_j> - <sumw_i> <sumw_j> and corr_ij = cov_ij / sqrt( cov_ii cov_jj ), where sumw_i is the sum of weights in bin i of one histogram and <...> denotes the average over the N_history histograms. Finally, convert this to a binary "sufficiently independent SRs" matrix with a corr threshold: indep_ij = (|corr_ij| < thres). The 1s in row i (or column i) of this binary matrix mark the SRs that are sufficiently independent of SR i; SRs that are pairwise independent in this sense can be trivially combined in a likelihood.
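The bootstrap procedure described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function name ''bootstrap_sr_correlation'', its arguments and the default threshold of 0.3 are hypothetical placeholders, not part of any existing tool, and the input is assumed to be an array with one row of 0/1 flags per event.

```python
import numpy as np

def bootstrap_sr_correlation(membership, n_history=500, thres=0.3, seed=0):
    """Bootstrap the correlation between signal-region (SR) yields.

    membership: array of shape (n_events, n_sr) with 0/1 flags, one row per event.
    Returns (corr, indep): the correlation matrix and the boolean matrix
    indep_ij = (|corr_ij| < thres). The value of thres is a placeholder.
    """
    rng = np.random.default_rng(seed)
    membership = np.asarray(membership, dtype=float)
    n_events, n_sr = membership.shape

    # One Poisson(lambda=1) weight per event and per bootstrap history.
    weights = rng.poisson(1.0, size=(n_history, n_events))

    # sumw[h, i] = sum of weights of the events populating SR i in history h,
    # i.e. the content of bin i in histogram h.
    sumw = weights @ membership

    # cov_ij = <sumw_i sumw_j> - <sumw_i> <sumw_j>, averaged over histories.
    mean = sumw.mean(axis=0)
    cov = (sumw.T @ sumw) / n_history - np.outer(mean, mean)

    # corr_ij = cov_ij / sqrt(cov_ii cov_jj); SRs with zero variance
    # (e.g. unpopulated SRs) give NaN and are never flagged as independent.
    sigma = np.sqrt(np.diag(cov))
    denom = np.outer(sigma, sigma)
    corr = np.divide(cov, denom, out=np.full_like(cov, np.nan), where=denom > 0)

    indep = np.abs(corr) < thres
    return corr, indep
```

Two SRs populated by exactly the same events come out with a correlation of 1, while SRs with no common events fluctuate around 0 with a spread of roughly 1/sqrt(N_history), which gives a rough lower bound for a sensible threshold.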
 +
 +
 +=== MA5 package to be used for correlation studies ===
 +
 +- Any version of the code from v1.8.20 onwards can be used; it can be downloaded e.g. from {{ 2019:groups:tools:ma5_v1.8.20.tgz | here }}.
 +
 +- This version contains two new dedicated functions (calls to be added in tools/PAD/Build/Main/main.cpp):
 +   * void manager.DumpSR(std::ostream&): writes a series of 0s and 1s, one entry for each considered signal region and one line per event. To be included in the event loop.
 +   * void manager.HeadSR(std::ostream&): writes the header of the file: one comment line (starting with a hash) with the list of analysis-SR names. To be included before entering the event loop.
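A file written this way could be read back for the correlation study roughly as follows. This is a sketch under assumptions that the description above does not fix: the flags and the SR names in the header are taken to be whitespace-separated, and the reader ''read_sr_dump'' as well as the SR names in the example are made up for illustration.

```python
import io
import numpy as np

def read_sr_dump(stream):
    """Read an SR dump: one '#' header line, then one line of 0/1 flags per event.

    Assumption: names and flags are separated by whitespace.
    Returns (names, flags) with flags of shape (n_events, n_sr).
    """
    names = None
    rows = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith("#"):
            if names is None:
                # First comment line: list of analysis-SR names.
                names = line.lstrip("#").split()
            continue
        rows.append([int(tok) for tok in line.split()])
    flags = np.array(rows, dtype=np.int8)
    return names, flags

# Hypothetical file content, for illustration only.
example = io.StringIO(
    "# anaA-SR1 anaA-SR2 anaB-SR1\n"
    "1 0 1\n"
    "0 0 1\n"
    "1 1 0\n"
)
names, flags = read_sr_dump(example)
```

The resulting ''flags'' array has one row per event and one column per signal region, which is the layout needed for the bootstrap step.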
 +
 +
 +**Generic overview of [[RecastCodeComparison|what is implemented in which recast framework]]**
 +
  
  
Line 21: Line 34:
 | CMS-SUS-16-033     | multijet + MET, 36 fb-1   | T1, T1bbbb, T1tttt(off), T2, T2bb, T2tt(off) | UL |
 | CMS-SUS-16-039     | multilepton EWK  | TChiWZ(off), TChiWH, TChipmSlep... | UL |
-| CMS-SUS-16-052     | 1L stop, soft    | T2bbWWoff, T6bbWWoff | UL, agg-EM | SModelS has only PAS version of this |
 +| CMS-PAS-SUS-16-052     | 1L stop, soft    | T2bbWWoff, T6bbWWoff | UL, agg-EM |  |
 | CMS-SUS-17-001     | 2L stop          | T2tt(off), T6bbWW | UL  |
 +
 +See also http://madanalysis.irmp.ucl.ac.be/wiki/PublicAnalysisDatabase
  
  
Line 42: Line 57:
 | CMS-SUS-16-039     | multilepton EWK  | TChiWZ(off), TChiWH, TChipmSlep... | UL |
  
 +==== Quantifying overlaps between analysis search regions using ADLs ====
 +
 +Members: Sezen, Wolfgang, (Harrison)
  
-=== CheckMate SModelS correspondence (13 TeV): ===
 +Find and visualize overlaps in a model-independent way, without generating events, using simple analysis descriptions written in an [[2019:groups:tools:adl|analysis description language]]. Directly sample the event selection. Useful for the analysis design phase, or for quick comparisons within experiments (e.g. Run2 CMS SUSY pMSSM combination).
  
-//... to be done ...//
 +  * Start from the analysis description, which lists objects and event selections.
 +  * Construct a feature space from all mathematically orthogonal "basic" variables (e.g. MET, jet1.pt, jet2.pt, electron1.eta, ...).
 +  * Randomly sample the feature space for each analysis based on cuts on the feature space components (jet1.pt > 100, MET > 299, etc.).
 +  * Use the sampled points to compute values for "composite" variables such as HT(jets), dphi(jets), MT(lepton, MET), etc.
 +  * Compare feature spaces between analyses; find and visualize overlaps and exclusions.
 +  * As a very simple first step, we simply check if two analyses are disjoint in any of the basic variables.
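The simple first step, checking whether two selections are disjoint in any basic variable, might look like the sketch below. It assumes that each selection has already been reduced to one open interval (low, high) per basic variable; the function name and the three example regions are hypothetical and are not taken from any real analysis.

```python
import math

INF = math.inf

def disjoint_variables(cuts_a, cuts_b):
    """Return the basic variables in which two selections cannot overlap.

    cuts_a, cuts_b: dicts mapping a variable name to an open interval (low, high).
    A variable missing from one selection is treated as unconstrained there,
    so it can never establish disjointness.
    """
    disjoint = []
    for var in sorted(set(cuts_a) & set(cuts_b)):
        low = max(cuts_a[var][0], cuts_b[var][0])
        high = min(cuts_a[var][1], cuts_b[var][1])
        if low >= high:  # empty intersection of the two intervals
            disjoint.append(var)
    return disjoint

# Hypothetical selections, for illustration only.
region_a = {"MET": (299.0, INF), "jet1.pt": (100.0, INF)}
region_b = {"MET": (100.0, 250.0), "jet1.pt": (50.0, INF)}
region_c = {"MET": (200.0, INF), "electron1.eta": (-2.5, 2.5)}

print(disjoint_variables(region_a, region_b))  # ['MET']
print(disjoint_variables(region_a, region_c))  # []
```

A non-empty result means no event can pass both selections. An empty result only means that the basic variables alone do not separate the two; cuts on composite variables could still make them disjoint, which is where the sampling steps come in.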
  
  
2019/groups/tools/correlations.1561462888.txt.gz · Last modified: 2019/06/25 13:41 by sabine.kraml