2019:groups:tools:correlations

Differences

This shows you the differences between two versions of the page.

2019:groups:tools:correlations [2019/06/25 13:41]
sabine.kraml
2019:groups:tools:correlations [2019/06/27 16:30]
wolfgang.waltenberger [Quantifying overlaps between analysis search regions using ADLs]
Line 1: Line 1:
 ====== Study: correlations between signal regions ======
  
-//Members: Sophie, Wolfgang, Humberto, Benj, Andy, Sabine //
+//Members: **Sophie**, Wolfgang, Humberto, Benj, Andy, Sabine, Sezen //
  
 Problem statement: to identify pairs of signal regions of the analyses in the PAD database that can safely be treated as approximately uncorrelated.
Line 10: Line 10:
  
 For the stats, we'll use bootstrapping rather than directly studying which events fall into common signal regions. Procedure: in the analysis framework the set of populated SRs is reported for //each event//, e.g. as a line of N_SR 1s and 0s in an output file. We then process this: for each event (= line) we sample N_history = O(100-1000) Poisson(lambda=1) weights and enter these into a set of N_history histograms (each histogram has N_SR bins). We then build a correlation matrix between the bins using the standard sample covariance, cov_ij = <sumw_i sumw_j> - <sumw_i> <sumw_j>, where the averages run over the N_history histograms, and corr_ij = cov_ij / sqrt( cov_ii cov_jj ). Finally, we convert this to a binary "sufficiently independent SRs" matrix with a corr threshold: indep_ij = (|corr_ij| < thres). The 1s in each row (or column) of this binary matrix define a set of statistically independent SRs, which can be trivially combined in a likelihood.
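
The bootstrap procedure described above could be sketched as follows. This is a minimal illustration, not the group's actual code: the function name ''bootstrap_sr_correlations'', its parameters, the default threshold and the handling of empty SRs are all assumptions made for this example. The input is the per-event 0/1 matrix (one row per event, one column per SR).

```python
import numpy as np


def bootstrap_sr_correlations(sr_flags, n_history=500, thres=0.1, seed=0):
    """Estimate SR-SR correlations by Poisson bootstrapping.

    sr_flags: array of shape (n_events, n_sr) holding 0/1 per event and SR.
    Returns (corr, indep), both of shape (n_sr, n_sr).
    All names and defaults here are hypothetical choices for this sketch.
    """
    rng = np.random.default_rng(seed)
    flags = np.asarray(sr_flags, dtype=float)
    n_events, _ = flags.shape

    # One Poisson(lambda=1) weight per (history, event).
    weights = rng.poisson(lam=1.0, size=(n_history, n_events))

    # sumw[h, i]: content of bin (SR) i in bootstrap histogram h.
    sumw = weights @ flags

    # cov_ij = <sumw_i sumw_j> - <sumw_i> <sumw_j>, averaged over histories.
    mean = sumw.mean(axis=0)
    cov = (sumw.T @ sumw) / n_history - np.outer(mean, mean)

    # corr_ij = cov_ij / sqrt(cov_ii cov_jj).
    sigma = np.sqrt(np.diag(cov))
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = cov / np.outer(sigma, sigma)
    # Assumption: an SR with no events has zero variance; its undefined
    # correlation is set to 0 here.
    corr = np.nan_to_num(corr)

    # indep_ij = (|corr_ij| < thres)
    indep = np.abs(corr) < thres
    return corr, indep
```

A call such as ''corr, indep = bootstrap_sr_correlations(flags, n_history=1000, thres=0.1)'' returns the correlation matrix and the binary matrix. The threshold value is a free choice and would need to be tuned against the statistical noise of the bootstrap, which scales roughly as 1/sqrt(N_history).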
 +
 +
 +=== MA5 package to be used for correlation studies ===
 +
 +- Any version of the code from v1.8.20 onwards, which can be downloaded e.g. from {{ 2019:groups:tools:ma5_v1.8.20.tgz | here }}.
 +
 +- This version contains two new dedicated functions (to be added in tools/PAD/Build/Main/main.cpp):
 +   * void manager.DumpSR(std::ostream&): writes a series of 0s and 1s, one entry for each considered signal region and one line per event. To be included in the event loop.
 +   * void manager.HeadSR(std::ostream&): writes the header of the file: one comment line (starting with a hash) with the list of analysis-SR names. To be included before entering the event loop.
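
For downstream processing, a file written this way could be read back roughly as below. This is only a sketch: the helper name ''read_sr_dump'' is hypothetical, and it assumes that both the SR names in the header and the 0/1 entries are separated by whitespace, which the description above does not specify.

```python
import io

import numpy as np


def read_sr_dump(stream):
    """Read an SR dump: one '#' header line, then one 0/1 line per event.

    Assumption: header names and 0/1 entries are whitespace-separated.
    Returns (names, flags) with flags of shape (n_events, n_sr).
    """
    names = []
    rows = []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only the first comment line is treated as the header.
            if not names:
                names = line.lstrip("#").split()
            continue
        rows.append([int(tok) for tok in line.split()])
    flags = np.array(rows, dtype=int)
    if names and flags.size and flags.shape[1] != len(names):
        raise ValueError("number of columns does not match the header")
    return names, flags


# Made-up example content; the SR names are placeholders.
example = io.StringIO("# anaA-SR1 anaA-SR2 anaB-SR1\n1 0 1\n0 0 1\n")
names, flags = read_sr_dump(example)
```

The resulting ''flags'' array has one row per event and one column per signal region, which is the layout the bootstrap step expects.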
 +
 +
 +**Generic overview of [[RecastCodeComparison|what is implemented in which recast framework]]**
 +
  
  
Line 21: Line 34:
 | CMS-SUS-16-033     | multijet + MET, 36 fb-1   | T1, T1bbbb, T1tttt(off), T2, T2bb, T2tt(off) | UL |
 | CMS-SUS-16-039     | multilepton EWK  | TChiWZ(off), TChiWH, TChipmSlep... | UL |
-| CMS-SUS-16-052     | 1L stop, soft    | T2bbWWoff, T6bbWWoff | UL, agg-EM | SModelS has only PAS version of this |
+| CMS-PAS-SUS-16-052     | 1L stop, soft    | T2bbWWoff, T6bbWWoff | UL, agg-EM |  |
 | CMS-SUS-17-001     | 2L stop          | T2tt(off), T6bbWW | UL  |
 +
 +See also http://madanalysis.irmp.ucl.ac.be/wiki/PublicAnalysisDatabase
  
  
Line 42: Line 57:
 | CMS-SUS-16-039     | multilepton EWK  | TChiWZ(off), TChiWH, TChipmSlep... | UL |
  
 +==== Quantifying overlaps between analysis search regions using ADLs ====
 +
 +Members: Sezen, Wolfgang (, Harrison)
  
-=== CheckMate SModelS correspondence (13 TeV): ===
+Find and visualize overlaps in a model-independent way, without generating events, by directly sampling the event selection. Useful for the analysis design phase, or for quick comparisons within experiments (e.g. Run2 CMS SUSY pMSSM combination).
  
-//... to be done ...//
+  * Start from the analysis description, which lists objects and event selections.
 +  * Construct a feature space from all mathematically orthogonal "basic" variables (e.g. MET, jet1.pt, jet2.pt, electron1.eta, ...).
 +  * Randomly sample the feature space for each analysis based on cuts on the feature space components (jet1.pt > 100, MET > 299, etc.).
 +  * Use the sampled points to compute values for "composite" variables such as HT(jets), dphi(jets), MT(lepton, MET), etc.
 +  * Compare feature spaces between analyses, find and visualize overlaps and exclusions.
 +  * As a first step, we simply check whether two analyses are disjoint in any of the basic variables.
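
The simple first step (checking whether two analyses are disjoint in any basic variable) could look like the sketch below. It is an illustration under stated assumptions, not an ADL implementation: each analysis is represented by a hypothetical dictionary mapping a basic variable to an open interval (low, high), and the cut values in the example are placeholders.

```python
import math


def disjoint_variables(cuts_a, cuts_b):
    """Return the basic variables in which two selections cannot overlap.

    cuts_a, cuts_b: dicts mapping a variable name to (low, high).
    Assumption: cuts are strict (open intervals), so selections that
    only touch at a boundary count as disjoint.
    Variables constrained by only one analysis are ignored.
    """
    disjoint = []
    for var in sorted(set(cuts_a) & set(cuts_b)):
        lo_a, hi_a = cuts_a[var]
        lo_b, hi_b = cuts_b[var]
        # Open intervals overlap only if max(low) < min(high).
        if max(lo_a, lo_b) >= min(hi_a, hi_b):
            disjoint.append(var)
    return disjoint


# Placeholder selections using variable names from the list above.
analysis_a = {"MET": (299.0, math.inf), "jet1.pt": (100.0, math.inf)}
analysis_b = {"MET": (0.0, 250.0), "jet1.pt": (50.0, math.inf)}
result = disjoint_variables(analysis_a, analysis_b)
```

If the returned list is non-empty, no event can pass both selections, so the two search regions are uncorrelated by construction. An empty list only means that the basic cuts alone do not exclude an overlap; the sampling of composite variables would then be needed.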
  
  
2019/groups/tools/correlations.txt · Last modified: 2019/06/27 17:10 by sezen.sekmen