====== Event Format : extended LHE ======
People involved: Eric & Benj
===== Present state =====
{{ :2013:groups:tools:mc_formats.png?400 |}}
Sample size: example of dileptonic ttbar production at the LHC (10,000 events)
^ Format ^ File size (MB) ^
^ LHE (gzip compression) | 3.8 |
^ STDHEP (gzip compression) | 153 |
^ HEPMC (gzip compression) | 346 |
^ simplified LHE (gzip compression) | 5.1 |
^ LHCO (gzip compression) | 1.6 |
^ Delphes 2 ROOT | 161 |
===== Motivations =====
* defining a format for jet-clustering output (without fast detector simulation).
* defining a format which extends the LHCO content (which carries too little information for sophisticated analyses).
* the new format should exploit the full potential of Delphes 3.
===== Some ideas to discuss =====
* Using a text format. ROOT is rejected; STDHEP seems to be outdated.
* Preferring to extend an existing format rather than define a totally new one (this spares developers from coding writer and reader functions from scratch). Our choice is to extend the LHE format (arXiv:hep-ph/0609017) and its structure based on XML tags. Reminder about the LHE structure:
2212 2212 0.40000000000E+04 0.40000000000E+04 0 0 10042 10042 3 1
0.47468358499E+01 0.15068796356E-01 0.47469000000E-03 0
...
12 0 0.4746900E-03 0.2312331E+03 0.7957747E-01 0.1132798E+00
21 -1 0 0 501 502 0.00000000000E+00 0.00000000000E+00 0.74064204368E+02 0.74064204368E+02 0.00000000000E+00 0. 1.
21 -1 0 0 502 503 0.00000000000E+00 0.00000000000E+00 -0.74552086368E+03 0.74552086368E+03 0.00000000000E+00 0. 1.
-6 2 1 2 0 503 0.14952840473E+03 -0.23999735524E+02 -0.41424800778E+03 0.47441561784E+03 0.17473990778E+03 0. 0.
-24 2 3 3 0 0 0.56722398399E+02 -0.36860071438E+02 -0.33540004381E+03 0.35186997544E+03 0.82116958530E+02 0. 0.
6 2 1 2 501 0 -0.14952840473E+03 0.23999735524E+02 -0.25720865153E+03 0.34516945021E+03 0.17335203433E+03 0. 0.
24 2 5 5 0 0 -0.16699616992E+03 0.38357854935E+02 -0.25987491067E+03 0.32192128147E+03 0.82093218139E+02 0. 0.
-13 1 6 6 0 0 -0.76026472087E+02 0.53922169130E+02 -0.95737952146E+02 0.13361654188E+03 0.00000000000E+00 0. 1.
14 1 6 6 0 0 -0.90969697833E+02 -0.15564314195E+02 -0.16413695853E+03 0.18830473960E+03 0.00000000000E+00 0. -1.
5 1 5 5 501 0 0.17467765185E+02 -0.14358119410E+02 0.26662591415E+01 0.23248168736E+02 0.46999998093E+01 0. -1.
11 1 4 4 0 0 0.50813684997E+02 -0.61274565657E+02 -0.22556131392E+03 0.23919554619E+03 0.00000000000E+00 0. -1.
-12 1 4 4 0 0 0.59087134026E+01 0.24414494219E+02 -0.10983872989E+03 0.11267442925E+03 0.00000000000E+00 0. 1.
-5 1 3 3 0 503 0.92806006335E+02 0.12860335914E+02 -0.78847963972E+02 0.12254564240E+03 0.46999998093E+01 0. 1.
...
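As a reading aid for the particle lines above, here is a minimal Python sketch that splits one line into the thirteen columns defined in the LHE paper (IDUP, ISTUP, two MOTHUP, two ICOLUP, five PUP, VTIMUP, SPINUP). The names ''LHEParticle'' and ''parse_particle_line'' are hypothetical helpers introduced for this illustration; they are not part of any existing tool.

```python
from typing import NamedTuple, Tuple


class LHEParticle(NamedTuple):
    """One particle line of an LHE event (column names from hep-ph/0609017)."""
    idup: int                   # PDG identifier
    istup: int                  # status code (-1 incoming, 1 final state, 2 resonance)
    mothup: Tuple[int, int]     # indices of first and last mother
    icolup: Tuple[int, int]     # colour and anti-colour flow tags
    pup: Tuple[float, ...]      # (px, py, pz, E, m) in GeV
    vtimup: float               # invariant lifetime
    spinup: float               # spin/helicity information


def parse_particle_line(line: str) -> LHEParticle:
    """Split a whitespace-separated particle line into named fields."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 columns, got {len(fields)}")
    return LHEParticle(
        idup=int(fields[0]),
        istup=int(fields[1]),
        mothup=(int(fields[2]), int(fields[3])),
        icolup=(int(fields[4]), int(fields[5])),
        pup=tuple(float(x) for x in fields[6:11]),
        vtimup=float(fields[11]),
        spinup=float(fields[12]),
    )


# The top-quark line from the example event above.
top = parse_particle_line(
    "6 2 1 2 501 0 -0.14952840473E+03 0.23999735524E+02 "
    "-0.25720865153E+03 0.34516945021E+03 0.17335203433E+03 0. 0."
)
```

Here ''top.idup'' is 6, ''top.istup'' is 2 (intermediate resonance) and ''top.pup[4]'' holds the generated top mass in GeV.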
* Extending the LHE format in order to store partons, hadrons and jets (reco objects) in the same file. The generation step (hard process, shower, reco) will be specified by the status code. Some details:
----
* parton level: same conventions as the existing LHE format. Example:
6 2 1 2 501 0 -0.14952840473E+03 0.23999735524E+02 -0.25720865153E+03 0.34516945021E+03 0.17335203433E+03 0. 0.
* hadron level: the same conventions can be applied with little change (the meaning of the two ICOLUP variables may need to be discussed). Example:
2212 2 1 2 501 0 -0.14952840473E+03 0.23999735524E+02 -0.25720865153E+03 0.34516945021E+03 0.17335203433E+03 0. 0.
* reco level: the conventions have to be adapted. We can keep:
* one line per physics object.
* IDUP variable with a specific PDG-id for reco objects. Example: 11/-11 for electrons, 13/-13 for muons, 15/-15 for hadronically-decaying taus, 22 for photons, 21 for jets, 12 for MET, -12 for MHT.
* MOTHUP variables linking the reconstructed object to the partons it originates from (only for some objects). Example: an electron coming from the hard process.
* PUP variables without change.
Other relevant variables, specific to the nature of the reconstructed objects, must be added.
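
The object codes proposed above can be summarised as a lookup table. The sketch below is only an illustration under two assumptions: a reco line keeps the column order of a standard LHE particle line, so the code sits in the first column, and the caller already knows the line is a reco-level one (the status-code value for reco objects is not fixed here). The names ''RECO_OBJECT_BY_ID'' and ''reco_object_kind'' are hypothetical, and the sample line is made up.

```python
# Object codes as listed in the proposal for reco-level lines.
RECO_OBJECT_BY_ID = {
    11: "electron", -11: "electron",
    13: "muon", -13: "muon",
    15: "hadronic tau", -15: "hadronic tau",
    22: "photon",
    21: "jet",
    12: "MET",
    -12: "MHT",
}


def reco_object_kind(line: str) -> str:
    """Return the reco object kind encoded in the first column of a line."""
    object_id = int(line.split()[0])  # assumption: code stored in column 1
    return RECO_OBJECT_BY_ID.get(object_id, "unknown")


# Made-up reco-level line, for illustration only.
sample_line = "21 1 0 0 0 0 50.0 0.0 100.0 111.8 0.0 0. 0."
kind = reco_object_kind(sample_line)
```

With this table, a code of 21 is read as a jet rather than a gluon, which is why the status code must distinguish reco-level lines from parton-level ones.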
----
* Optional block substructure: defining an XML tag for each collection of reconstructed objects. When a fast detector simulation is applied, several configurations can be used (for instance for lepton isolation), so several collections of the same object kind can be produced. The block substructure is designed to handle these multiple collections. Example:
...
...
...
...
...
Warning: If the LHE file supplies several collections of jets, people must obviously use only one collection of jets in their analysis.