====== ML reinterpretation ======

==== 26 June ====

{{:2023:groups:bsmtools:mlreint26jun.png?400|}}

==== 27 June ====

{{:2023:groups:bsmtools:mlreint27jun_1.jpg?400|}}

{{:2023:groups:bsmtools:mlreint27jun_2.jpg?400|}}

===== Full analysis recasting =====

Standards for sharing models:
  * [[https://github.com/lwtnn/lwtnn|LWTNN]]
  * [[https://onnx.ai/|ONNX]]

Analyses that have provided ML models:

  * [[https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2019-04/|ATLAS-SUSY-2019-04]]
  * [[https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2018-30/|ATLAS-SUSY-2018-30]]
  * ?

Discussion during the Dec '22 reinterpretation workshop: [[https://indico.cern.ch/event/1197680/timetable/?view=standard#b-485872-experience-and-feedba|link]].

[[https://www.overleaf.com/8811915719zfjtnygcdgpv|Overleaf document for writeup]]

===== Surrogate models for object tagging =====


We propose to build a surrogate model using the JetClass dataset [1]. The aim is to approximate the output of a state-of-the-art attention-based tagger (ParT [1]), which uses low-level inputs including vertex information, with a network that uses only high-level kinematics and N-subjettiness.
Hamburg is preparing a simplified dataset:
  * low-level features dropped
  * ParT output added
  * restricted to hadronic top vs. light quark/gluon
  * reduced to 2M training, 1M validation and 500k test examples per class
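To make the surrogate idea concrete, here is a minimal sketch of the simplest possible case: a logistic model trained to reproduce a teacher score from two high-level features. Everything in it is an assumption for illustration only. The feature names (scaled mass, tau32), the toy "teacher" function and the helper names ''make_toy_jets'', ''fit_surrogate'' and ''predict'' are hypothetical; no JetClass data or real ParT output is used, and an actual surrogate would likely be a deeper network.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_jets(n, rng):
    """Toy stand-in for the simplified dataset: ((mass, tau32), teacher_score).

    The teacher score is a made-up smooth function of the features,
    NOT a real ParT output.
    """
    jets = []
    for _ in range(n):
        mass = rng.uniform(-1.0, 1.0)   # hypothetical scaled jet mass
        tau32 = rng.uniform(-1.0, 1.0)  # hypothetical scaled N-subjettiness ratio
        score = sigmoid(2.0 * mass - 3.0 * tau32 + 0.5)
        jets.append(((mass, tau32), score))
    return jets


def fit_surrogate(jets, epochs=500, lr=2.0):
    """Fit weights and bias by full-batch gradient descent.

    The loss is the cross-entropy between the surrogate output and the
    (soft) teacher score, so the surrogate learns to match the score itself
    rather than a hard class label.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(jets)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for x, target in jets:
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            err = p - target  # gradient of the cross-entropy w.r.t. the logit
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Surrogate tagger score for one jet with features x = (mass, tau32)."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)
```

In this toy setup the teacher is itself a logistic function of the inputs, so the surrogate can match it almost exactly. With a real tagger the high-level features will not determine the score uniquely, which is the motivation for the probabilistic approaches mentioned below.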


Based on this, we can test different surrogate models, such as Bayesian NNs or explicit sampling.
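One possible reading of "explicit sampling" is sketched here: bin a high-level feature, store the teacher scores observed in each bin, and draw the surrogate score at random from the matching bin. This reproduces the spread of the tagger output at fixed high-level inputs instead of a single mean value. The class name ''BinnedScoreSampler'', the single-feature binning and the bin edges are assumptions for illustration, not an agreed design.

```python
import bisect
import random
from collections import defaultdict


class BinnedScoreSampler:
    """Sample a tagger score from the empirical distribution in a feature bin."""

    def __init__(self, edges):
        # Sorted inner bin edges; k edges define k + 1 bins.
        self.edges = sorted(edges)
        self.scores = defaultdict(list)

    def _bin(self, x):
        # Index of the bin that contains feature value x.
        return bisect.bisect_right(self.edges, x)

    def fit(self, features, scores):
        """Store every teacher score in the bin of its feature value."""
        for x, s in zip(features, scores):
            self.scores[self._bin(x)].append(s)
        return self

    def sample(self, x, rng):
        """Draw one score for a jet with feature value x."""
        pool = self.scores.get(self._bin(x))
        if not pool:
            raise ValueError("no training jets in this bin")
        return rng.choice(pool)
```

A usage sketch with hypothetical inputs: ''sampler = BinnedScoreSampler([0.25, 0.5, 0.75]).fit(tau32_values, part_scores)'' followed by ''sampler.sample(0.4, random.Random(0))''. With several input features the binning would have to become multi-dimensional or be replaced by a learned conditional density.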


[1] Paper that introduced the JetClass dataset and ParT: [[https://arxiv.org/abs/2202.03772 | arXiv:2202.03772]]