

Extending the Les Houches Event file format

The aim

After discussions with ATLAS, CMS, aMC@NLO and POWHEG people, it was found valuable to include in the LHE format some extra information about event reweighting. This raises the problem of standardizing the way this information is passed. A standard would allow one to have, in a single LHE file, unweighted events corresponding to a given MC setup (for instance central scales, central PDF, etc.), together with weights related to the variation of one or several of the MC parameters.

The proposal below starts from the first attempt in this direction in the LH proceedings of 2009 (arXiv:1003.1643) and includes

  • more information in the header,
  • a reorganization of the weights, collecting them into categories.

Comments, suggestions and criticism are welcome; please update the text directly. Points still to be discussed are listed below the proposal.

The proposal

First part: the header of the event file

The header explains what the weights correspond to:

<header>
 ...

 <initrwgt>
  <weight id='1'> This is the original event weight </weight>
  <weightgroup type='scale_variation' combine='envelope'>
     <weight id='2'> muR=2.0 </weight>
     <weight id='3'> muR=0.5 </weight>
  </weightgroup>
  <weightgroup type='mrst2008e40' combine='hessian'>
     <weight id='4'> set01 </weight>
     <weight id='5'> set02 </weight>
     ...
  </weightgroup>
  <weightgroup type='Qmatch_variation' combine='envelope'>
     <weight id='44'> Qmatch=20 </weight>
     <weight id='45'> Qmatch=40 </weight>
  </weightgroup>
  <weight id='46'> BSM benchmark point number 42B, see arXiv XXXX.XXXX </weight>
 </initrwgt>
...
</header>

This information in the header should be human-readable and explain what the weights with the corresponding identifiers mean. It can contain all the parameters that were used to generate a given weight, only the ones that were changed with respect to the original run, or simply a sentence explaining what the number means. It is up to the user performing the analysis to make sure that this information is used correctly (and up to the code authors to make sure that the user has enough information to understand what the weights correspond to).
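Because the header is meant to be human-readable, a tool consuming it will typically only need to map each weight id to its description and to the group it belongs to. The sketch below does this with Python's standard xml.etree.ElementTree; the function name parse_initrwgt and the returned tuple layout are hypothetical, and the input is a shortened copy of the header example above with the "..." elisions removed so that it parses.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <initrwgt> example above (elisions removed).
HEADER = """
<initrwgt>
  <weight id='1'> This is the original event weight </weight>
  <weightgroup type='scale_variation' combine='envelope'>
    <weight id='2'> muR=2.0 </weight>
    <weight id='3'> muR=0.5 </weight>
  </weightgroup>
  <weightgroup type='mrst2008e40' combine='hessian'>
    <weight id='4'> set01 </weight>
    <weight id='5'> set02 </weight>
  </weightgroup>
  <weight id='46'> BSM benchmark point number 42B </weight>
</initrwgt>
"""

def parse_initrwgt(xml_text):
    """Return {weight_id: (description, group_type, combine)}.

    Weights outside any <weightgroup> get group_type None and combine 'none'.
    """
    root = ET.fromstring(xml_text.strip())
    info = {}
    for child in root:
        if child.tag == "weight":
            info[child.get("id")] = ((child.text or "").strip(), None, "none")
        elif child.tag == "weightgroup":
            gtype = child.get("type")
            # 'combine' is optional; the proposal makes 'none' the default.
            combine = child.get("combine", "none")
            for w in child.findall("weight"):
                info[w.get("id")] = ((w.text or "").strip(), gtype, combine)
    return info

if __name__ == "__main__":
    for wid, (desc, gtype, combine) in parse_initrwgt(HEADER).items():
        print(wid, repr(desc), gtype, combine)
```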

The weightgroup tag groups several weights together, which provides the information needed to combine them into, e.g., scale-variation or PDF uncertainties. The combine attribute is optional and indicates how the uncertainties should be combined. Possible values are none, hessian, envelope or gaussian. If it is not specified, the default is combine='none', and the curves associated with each weight are kept independent. For combine='hessian', the first weight is the central value and the subsequent weights come in pairs corresponding, respectively, to the positive and negative variations along a given direction of the parameter space. This is not very XML-friendly, so any suggestion here is very welcome.
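The proposal names the combination methods but does not fix their formulas. As a hedged illustration, the sketch below applies common conventions to a quantity (e.g. a cross section or a histogram bin) computed once per weight: the envelope is the range spanned by the central value and the variations; 'hessian' uses the symmetric formula dX = 1/2 * sqrt(sum_k (X_k+ - X_k-)^2) over the (+, -) pairs; 'gaussian' is treated as a set of Monte Carlo replicas summarized by their mean and standard deviation. These formulas and the function names are assumptions for illustration, not part of the proposal. Note that in the header example the scale_variation group does not contain the central weight, so the envelope takes it (weight id '1') as a separate argument.

```python
import math

def combine_envelope(central, variations):
    """Envelope: (lower, upper) spanned by the central value and all variations."""
    values = [central, *variations]
    return min(values), max(values)

def combine_hessian(values):
    """Assumes values = [central, plus_1, minus_1, plus_2, minus_2, ...].

    Uses the symmetric Hessian formula dX = 1/2 * sqrt(sum_k (X_k+ - X_k-)^2),
    one common convention; the proposal itself does not prescribe a formula.
    """
    central, rest = values[0], values[1:]
    if len(rest) % 2:
        raise ValueError("hessian group needs a central value plus (+, -) pairs")
    pairs = zip(rest[0::2], rest[1::2])
    delta = 0.5 * math.sqrt(sum((plus - minus) ** 2 for plus, minus in pairs))
    return central, delta

def combine_gaussian(values):
    """Assumes Monte Carlo replicas: mean and sample standard deviation (n >= 2)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)

if __name__ == "__main__":
    # Hypothetical cross sections (arbitrary units), one per weight.
    print(combine_envelope(10.0, [9.0, 11.5]))
    print(combine_hessian([10.0, 10.3, 9.7, 10.1, 9.9]))
    print(combine_gaussian([1.0, 2.0, 3.0]))
```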

Second part: within each event

We start with an example:

<event id='evtid'>
7 100  0.10000000E+01  0.20000000E+00  0.00000000E+00  0.00000000E+00
 -2 -1  0  0 0 0  0.12699952E+01  0.55429630E+01  0.57634577E+02  0.57914435E+02  0.00000000E+00 0. 0.
  2 -1  0  0 0 0 -0.91353745E+00  0.13160013E+01 -0.34965448E+02  0.35002128E+02  0.00000000E+00 0. 0.
 23  2  1  1 0 0  0.35645919E+00  0.68589662E+01  0.22669189E+02  0.92916566E+02  0.89846668E+02 0. 0.
-13  2  3  3 0 0  0.51612833E+01  0.21143065E+02  0.53960893E+02  0.58184682E+02  0.10566000E+00 0. 0.
 13  2  3  3 0 0 -0.48048241E+01 -0.14284099E+02 -0.31291705E+02  0.34731884E+02  0.10566000E+00 0. 0.
-13  1  0  0 0 0  0.51612833E+01  0.21143065E+02  0.53960893E+02  0.58184682E+02  0.10566000E+00 0. 0.
 13  1  0  0 0 0 -0.48048241E+01 -0.14284099E+02 -0.31291705E+02  0.34731884E+02  0.10566000E+00 0. 0.
 <rwgt>
  <wgt id='1'> 1.001e+00 </wgt>
  <wgt id='2'> 0.204e+00 </wgt>
  <wgt id='3'> 1.564e+00 </wgt>
  <wgt id='4'> 2.248e+00 </wgt>
  <wgt id='5'> 1.486e+00 </wgt>
  ...
  <wgt id='46'> -0.899e+00 </wgt>
 </rwgt>
</event>
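The additional weights live in a <rwgt> block after the standard LHE event record, so a reader can take the record lines from the text that precedes the first child element. Below is a minimal sketch with xml.etree.ElementTree, assuming each event block is already available as a string (splitting a full LHE file into events is not shown); parse_event is a hypothetical helper. It relies on the standard Les Houches event line, whose first and third fields are NUP (number of particle entries) and XWGTUP (event weight).

```python
import xml.etree.ElementTree as ET

# Copy of the event above with the "..." removed and only three weights kept.
EVENT = """
<event id='evtid'>
7 100  0.10000000E+01  0.20000000E+00  0.00000000E+00  0.00000000E+00
 -2 -1  0  0 0 0  0.12699952E+01  0.55429630E+01  0.57634577E+02  0.57914435E+02  0.00000000E+00 0. 0.
  2 -1  0  0 0 0 -0.91353745E+00  0.13160013E+01 -0.34965448E+02  0.35002128E+02  0.00000000E+00 0. 0.
 23  2  1  1 0 0  0.35645919E+00  0.68589662E+01  0.22669189E+02  0.92916566E+02  0.89846668E+02 0. 0.
-13  2  3  3 0 0  0.51612833E+01  0.21143065E+02  0.53960893E+02  0.58184682E+02  0.10566000E+00 0. 0.
 13  2  3  3 0 0 -0.48048241E+01 -0.14284099E+02 -0.31291705E+02  0.34731884E+02  0.10566000E+00 0. 0.
-13  1  0  0 0 0  0.51612833E+01  0.21143065E+02  0.53960893E+02  0.58184682E+02  0.10566000E+00 0. 0.
 13  1  0  0 0 0 -0.48048241E+01 -0.14284099E+02 -0.31291705E+02  0.34731884E+02  0.10566000E+00 0. 0.
 <rwgt>
  <wgt id='1'> 1.001e+00 </wgt>
  <wgt id='2'> 0.204e+00 </wgt>
  <wgt id='46'> -0.899e+00 </wgt>
 </rwgt>
</event>
"""

def parse_event(event_xml):
    """Return (event id, XWGTUP, particle lines, {weight id: weight})."""
    event = ET.fromstring(event_xml.strip())
    # The standard LHE record is the text before the first child element.
    record = [ln.split() for ln in (event.text or "").strip().splitlines()]
    nup = int(record[0][0])        # NUP: number of particle entries
    xwgtup = float(record[0][2])   # XWGTUP: original event weight
    particles = record[1:1 + nup]
    weights = {}
    rwgt = event.find("rwgt")
    if rwgt is not None:
        for w in rwgt.findall("wgt"):
            weights[w.get("id")] = float(w.text)
    return event.get("id"), xwgtup, particles, weights

if __name__ == "__main__":
    evt_id, xwgtup, particles, weights = parse_event(EVENT)
    # Fractional variation of weight 2 with respect to the original weight.
    print(evt_id, len(particles), weights["2"] / xwgtup)
```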

The new weights should be normalized in the same way as the original event weight. If the original weights sum up to the total cross section, the new <wgt> weights should also sum up to the total cross section (which in general differs slightly, since different parameters were used). If instead the original weights are normalized to 1 (as in the event above), i.e. the number of generated events corresponds directly to a given luminosity, the same normalization should be used for the <wgt> weights. In other words, the fractional variation of a new weight with respect to the original weight (after unweighting) is always obtained by dividing the new weight by the original one. The order of the <wgt> entries is irrelevant.
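A minimal sketch of this normalization rule, assuming each event has already been reduced to its original weight plus a dictionary of new weights keyed by id (a hypothetical in-memory layout; the function names are also hypothetical). Since both sums share the same normalization, summing a given weight id over all events gives the reweighted total directly, and the per-event fractional variation is a plain division.

```python
def reweighted_totals(events, weight_id):
    """Return (sum of original weights, sum of weights for weight_id).

    events: iterable of (original_weight, {weight_id: new_weight}) pairs.
    Both sums share the original normalization (cross section or
    luminosity), so their ratio is the relative change of the total rate.
    """
    events = list(events)
    total_orig = sum(w0 for w0, _ in events)
    total_new = sum(new[weight_id] for _, new in events)
    return total_orig, total_new

def fractional_variation(original_weight, new_weights):
    """Per-event ratio new/original for every weight id."""
    return {wid: w / original_weight for wid, w in new_weights.items()}

# Two hypothetical unweighted events with weights normalized to 1.
EVENTS = [
    (1.0, {"2": 0.204, "3": 1.564}),
    (1.0, {"2": 0.900, "3": 1.100}),
]

if __name__ == "__main__":
    orig, new = reweighted_totals(EVENTS, "3")
    print(orig, new, new / orig)
    print(fractional_variation(*EVENTS[0]))
```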

The event id becomes important as soon as one has to deal with event files produced after showering/hadronization (StdHep or HepMC event files). It allows the reweighting information to be passed on, at the price of having to read several files, and it avoids having to extend the standard StdHep and HepMC formats.
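A minimal sketch of the join that the event id makes possible. It assumes the weights have already been read from the LHE file into a dictionary keyed by event id, and that each showered event carries the same id; how that id is stored in a StdHep or HepMC file is not specified by the proposal and is an assumption here. attach_weights is a hypothetical helper. Iterating over the showered events rather than the LHE events means that LHE events absent from the showered file are simply skipped.

```python
def attach_weights(lhe_weights, showered_events):
    """Pair each showered event with the LHE weights sharing its event id.

    lhe_weights: {event_id: {weight_id: weight}} read from the LHE file.
    showered_events: iterable of (event_id, record) pairs from the
    showered file; how the id is stored there is an assumption.
    """
    joined = []
    for evt_id, record in showered_events:
        try:
            weights = lhe_weights[evt_id]
        except KeyError:
            raise KeyError(f"event id {evt_id!r} not found in LHE file") from None
        joined.append((record, weights))
    return joined

if __name__ == "__main__":
    # Hypothetical ids and records, for illustration only.
    lhe = {"evt1": {"2": 0.204}, "evt2": {"2": 1.486}}
    showered = [("evt2", "record B"), ("evt1", "record A")]
    for record, weights in attach_weights(lhe, showered):
        print(record, weights["2"])
```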

To be discussed

  • The <clustering> tag of arXiv:1003.1643
Last modified: 2013/02/05 10:19 by benjamin.fuks