====== Event Format: extended LHE ======

People involved: Eric & Benj

===== Present state =====

{{ :2013:groups:tools:mc_formats.png?400 |}}

Sample size: example of ttbar dileptonic @ LHC (10,000 events)

|                                       ^ File size (MB) ^
^ LHE (gzip compression)                | 3.8   |
^ STDHEP (gzip compression)             | 153   |
^ HEPMC (gzip compression)              | 346   |
^ simplified LHE (gzip compression)     | 5.1   |
^ LHCO (gzip compression)               | 1.6   |
^ Delphes 2 ROOT                        | 161   |

===== Motivations =====

  * Defining a format for jet-clustering output (without fast detector simulation).
  * Defining a format which extends the LHCO content (which carries too little information for sophisticated analyses).
  * The new format will take into account the full potential of Delphes 3.

===== Some ideas to discuss =====

  * Using a text format. ROOT is rejected; STDHEP seems to be old.
  * Preferring to extend an existing format rather than defining a totally new one (this spares developers from coding writer and reader functions from scratch). Our choice is to extend the LHE format (arXiv:hep-ph/0609017) and its structure based on XML tags. Reminder about the LHE structure:
<code>
...
<init>
2212 2212 0.40000000000E+04 0.40000000000E+04 0 0 10042 10042 3 1
0.47468358499E+01 0.15068796356E-01 0.47469000000E-03 0
</init>
...
<event>
12 0 0.4746900E-03 0.2312331E+03 0.7957747E-01 0.1132798E+00
21 -1 0 0 501 502 0.00000000000E+00 0.00000000000E+00 0.74064204368E+02 0.74064204368E+02 0.00000000000E+00 0. 1.
21 -1 0 0 502 503 0.00000000000E+00 0.00000000000E+00 -0.74552086368E+03 0.74552086368E+03 0.00000000000E+00 0. 1.
-6 2 1 2 0 503 0.14952840473E+03 -0.23999735524E+02 -0.41424800778E+03 0.47441561784E+03 0.17473990778E+03 0. 0.
-24 2 3 3 0 0 0.56722398399E+02 -0.36860071438E+02 -0.33540004381E+03 0.35186997544E+03 0.82116958530E+02 0. 0.
6 2 1 2 501 0 -0.14952840473E+03 0.23999735524E+02 -0.25720865153E+03 0.34516945021E+03 0.17335203433E+03 0. 0.
24 2 5 5 0 0 -0.16699616992E+03 0.38357854935E+02 -0.25987491067E+03 0.32192128147E+03 0.82093218139E+02 0. 0.
-13 1 6 6 0 0 -0.76026472087E+02 0.53922169130E+02 -0.95737952146E+02 0.13361654188E+03 0.00000000000E+00 0. 1.
14 1 6 6 0 0 -0.90969697833E+02 -0.15564314195E+02 -0.16413695853E+03 0.18830473960E+03 0.00000000000E+00 0. -1.
5 1 5 5 501 0 0.17467765185E+02 -0.14358119410E+02 0.26662591415E+01 0.23248168736E+02 0.46999998093E+01 0. -1.
11 1 4 4 0 0 0.50813684997E+02 -0.61274565657E+02 -0.22556131392E+03 0.23919554619E+03 0.00000000000E+00 0. -1.
-12 1 4 4 0 0 0.59087134026E+01 0.24414494219E+02 -0.10983872989E+03 0.11267442925E+03 0.00000000000E+00 0. 1.
-5 1 3 3 0 503 0.92806006335E+02 0.12860335914E+02 -0.78847963972E+02 0.12254564240E+03 0.46999998093E+01 0. 1.
</event>
...
</code>
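To make the particle lines of the event record above concrete, here is a minimal Python sketch that splits one such line into its 13 columns. The field names follow the LHE accord (IDUP, ISTUP, MOTHUP, ICOLUP, PUP, VTIMUP, SPINUP); the class ''LHEParticle'' and the function ''parse_particle_line'' are hypothetical names chosen for illustration, not part of any existing reader.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LHEParticle:
    """One particle line of an LHE event record (hypothetical container)."""
    idup: int                    # PDG identifier
    istup: int                   # status code
    mothup: Tuple[int, int]      # indices of first and last mother
    icolup: Tuple[int, int]      # colour and anti-colour flow tags
    pup: Tuple[float, ...]       # (px, py, pz, E, M) in GeV
    vtimup: float                # invariant lifetime
    spinup: float                # spin/helicity information


def parse_particle_line(line: str) -> LHEParticle:
    """Parse a whitespace-separated particle line with exactly 13 columns."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 columns, got {len(fields)}")
    return LHEParticle(
        idup=int(fields[0]),
        istup=int(fields[1]),
        mothup=(int(fields[2]), int(fields[3])),
        icolup=(int(fields[4]), int(fields[5])),
        pup=tuple(float(x) for x in fields[6:11]),
        vtimup=float(fields[11]),
        spinup=float(fields[12]),
    )


# The top-quark line taken from the example event above.
top = parse_particle_line(
    "6 2 1 2 501 0 -0.14952840473E+03 0.23999735524E+02 "
    "-0.25720865153E+03 0.34516945021E+03 0.17335203433E+03 0. 0."
)
```

Keeping the columns in one small container is only one possible design; it makes it easy to see which columns an extended format would need to reinterpret or extend.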
  * Extending the LHE format in order to store partons, hadrons and jets (reco objects) in the same file. The generation step (hard process, shower, reco) will be specified by the status code. Some details:

----

  * parton level: same conventions as the existing LHE. Example: <code>6 2 1 2 501 0 -0.14952840473E+03 0.23999735524E+02 -0.25720865153E+03 0.34516945021E+03 0.17335203433E+03 0. 0.</code>
  * hadron level: the conventions can be applied without much change (the meaning of the two ICOLUP variables may need to be discussed). <code>2212 2 1 2 501 0 -0.14952840473E+03 0.23999735524E+02 -0.25720865153E+03 0.34516945021E+03 0.17335203433E+03 0. 0.</code>
  * reco level: the conventions have to be adapted. We can keep:
    * one line per physics object.
    * the IDUP variable, with a specific PDG-id for reco objects. Example: 11/-11 for electrons, 13/-13 for muons, 15/-15 for hadronically-decaying taus, 22 for photons, 21 for jets, 12 for MET, -12 for MHT.
    * the MOTHUP variables, which link the reconstructed object to the partons it originates from (only for some objects). Example: an electron coming from the hard process.
    * the PUP variables, without change.
  Other relevant variables, specific to the nature of the reconstructed objects, must be added.

----

  * Optional block substructure: defining an XML tag for each collection of reconstructed objects. When fast detector simulation is applied, several configurations can be used (for instance for lepton isolation) and several collections of the same object kind can be produced. The block substructure is designed to handle several collections of the same object kind. Example: ... ... ... ... ...

Warning: if the LHE file supplies several collections of jets, people must obviously use only one collection of jets in their analysis.
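The PDG-id convention proposed above for reco objects can be sketched as a simple lookup table in Python. The identifier values are the ones listed in the proposal; the names ''RECO_KINDS'', ''reco_object_kind'' and ''count_reco_objects'' are hypothetical and only serve the illustration. No status-code values are assumed, since the proposal does not fix them yet.

```python
from collections import Counter
from typing import Iterable, Optional

# Identifier -> reco object kind, as listed in the proposal above.
RECO_KINDS = {
    11: "electron", -11: "electron",
    13: "muon", -13: "muon",
    15: "hadronic tau", -15: "hadronic tau",
    22: "photon",
    21: "jet",
    12: "MET",
    -12: "MHT",
}


def reco_object_kind(pdg_id: int) -> Optional[str]:
    """Return the reco object kind for an identifier, or None if not listed."""
    return RECO_KINDS.get(pdg_id)


def count_reco_objects(pdg_ids: Iterable[int]) -> Counter:
    """Count reco objects per kind, ignoring identifiers outside the table."""
    kinds = (reco_object_kind(i) for i in pdg_ids)
    return Counter(k for k in kinds if k is not None)


# Hypothetical reco-level content of one event: two jets, a muon, an electron, MET.
counts = count_reco_objects([21, 21, -13, 11, 12])
```

Note that this convention reuses identifiers that already have a meaning at parton level (21 for gluons, 12 for electron neutrinos), so a reader would have to check the status code before applying such a table.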