Four months ago, lemouth opened up particle physics research to the steemit community. The objective: have steemians produce meaningful contributions to particle physics by extending the MadAnalysis5 framework and its associated public analysis database. In this article we'll walk through the first five exercises of lemouth's particle physics introduction. If you're familiar with modern C++ and unix development, it will take a weekend to come up to speed. After that, lemouth's current task awaits.
Repositories
- MadAnalysis 5: https://github.com/BFuks/madanalysis-utopian
- My Exercise Fork: https://github.com/iauns/madanalysis-utopian
First Exercise
Let's dive right into building MadAnalysis5 on the macOS platform. Instructions for getting up to speed are given here in the 'Getting Started' section. Beyond those instructions, here are a few notes of my own.
Building Root
ROOT's CERN website was down when I tried to download the latest version. Fortunately, there's an official github repository that contains ROOT's source. If you use ROOT's github repo, I'd suggest using homebrew to install cmake, then running git checkout with the v6-14-04 tag (the latest production release).
Build steps are given in ROOT's github readme. The build goes smoothly with one modification: once you're in ROOT's cmake build directory, the build command is cmake .. instead of cmake ../root.
The build takes roughly 30 minutes. Afterwards, if you type root and end up in the interpreter, exit by typing '.q'.
Be sure to source bin/thisroot.sh (inside the build directory). If you don't want to add it to your shell RC, source it in each shell where you plan to do MadAnalysis5 work.
MadAnalysis 5
Once root's environment variables are properly set and you extract the MadAnalysis5 (MA5) tarball, installation of MA5 and associated modules proceeds without a hitch. Simply follow lemouth's instructions in the introductory post: ./bin/ma5, ma5> install delphes, ma5> install PAD.
If you're using MA5's Github repo, you'll need to create the tools/SampleAnalyzer/Bin directory. If you do not, the ./bin/ma5 interpreter will fail and complain about a linker error. For the fix, see this commit in my repo.
Note: There's no need to install homebrew GCC on macOS; the clang shim that ships with macOS is sufficient. The gcc command exists, but /usr/bin/gcc on Apple platforms is actually clang in disguise: it's a tool shim (mentioned in the xcode-select man pages), and clang's command-line arguments are nearly identical to gcc's.
Second Exercise (ex1a)
The second exercise covers writing MadAnalysis5 code to count the number of objects in an event. As this is the first time diving into code, I'd recommend using Universal Ctags or some clang-based semantic analysis for code introspection. Note that some of the C++ files in MA5 have a .cc extension.
Per lemouth's article, I used ./bin/ma5 -E ex1a ex1a_iauns to generate my example ex1a directory.
Each recorded event is passed into the Execute function of our analysis code. The event parameter, of type EventFormat, contains all event data. Its definition is found in $(MA5_root)/tools/SampleAnalyzer/Commons/DataFormat/EventFormat.h. For this exercise we'll interact exclusively with the RecEventFormat structure through the rec() accessor on EventFormat. RecEventFormat is defined in $(MA5_root)/tools/SampleAnalyzer/Commons/DataFormat/RecEventFormat.h.
Instead of using the size() member function of std::vector, I opted for range-based for loops to prep for upcoming exercises. The C++11 loop construction is straightforward:
bool test_analysis::Execute(SampleFormat& sample, const EventFormat& event)
{
  printf("\n");
  if (event.rec() != 0)
  {
    const std::vector<RecPhotonFormat>& photons   = event.rec()->photons();
    const std::vector<RecLeptonFormat>& electrons = event.rec()->electrons();
    const std::vector<RecLeptonFormat>& muons     = event.rec()->muons();

    size_t numPhotons   = 0;  // equivalently: photons.size();
    size_t numElectrons = 0;  // equivalently: electrons.size();
    size_t numMuons     = 0;  // equivalently: muons.size();
    for (const RecPhotonFormat& photon : photons)     { ++numPhotons; }
    for (const RecLeptonFormat& electron : electrons) { ++numElectrons; }
    for (const RecLeptonFormat& muon : muons)         { ++numMuons; }

    printf(" Num Photons:   %zu\n", numPhotons);
    printf(" Num Electrons: %zu\n", numElectrons);
    printf(" Num Muons:     %zu\n", numMuons);
  }
  return true;
}
For future reference, here are the shell steps to build the contribution while inside the Build directory in your test_project folder:
> source setup.sh # Sets appropriate environment variables.
> make
> ./MadAnalysis5job ../Input/tth_aa.list # tth_aa.list is a plaintext file containing the absolute path to your data
Full ex1a source can be found here.
Third Exercise (ex1b)
The third exercise is where the physics gets more involved. While the prior two exercises can be done quickly, this one takes a bit more care. Section 5 of this research paper describes the object definition steps.
The most time-consuming part of this exercise was not implementing the code but gaining a better understanding of the terminology and physics. I've included references below on the various quantities. In short, eta (η) represents pseudorapidity, and p_T represents transverse momentum. In the code, both the pseudorapidity and momentum accessors are defined in the ParticleBaseFormat base class.
After reading the article, it appears that all of the thresholds are strict inequalities (strictly less-than or strictly greater-than, never equality). Using that assumption, I've compared the following results to lemouth's, and they are identical:
| Initial | BaseWithOver | Baseline | Signal
Electrons | 0 | 0 | 0 | 0
Muons | 0 | 0 | 0 | 0
Jets | 6 | 6 | 6 | 5
=> progress: [======> ]
| Initial | BaseWithOver | Baseline | Signal
Electrons | 0 | 0 | 0 | 0
Muons | 0 | 0 | 0 | 0
Jets | 6 | 6 | 6 | 6
=> progress: [==========> ]
| Initial | BaseWithOver | Baseline | Signal
Electrons | 2 | 2 | 1 | 1
Muons | 1 | 1 | 0 | 0
Jets | 9 | 9 | 9 | 8
=> progress: [=============> ]
| Initial | BaseWithOver | Baseline | Signal
Electrons | 0 | 0 | 0 | 0
Muons | 1 | 1 | 0 | 0
Jets | 10 | 10 | 10 | 8
=> progress: [=================> ]
| Initial | BaseWithOver | Baseline | Signal
Electrons | 0 | 0 | 0 | 0
Muons | 0 | 0 | 0 | 0
Jets | 8 | 8 | 8 | 7
=> progress: [====================> ]
| Initial | BaseWithOver | Baseline | Signal
Electrons | 0 | 0 | 0 | 0
Muons | 1 | 1 | 0 | 0
Jets | 6 | 6 | 6 | 6
=> progress: [========================> ]
| Initial | BaseWithOver | Baseline | Signal
Electrons | 0 | 0 | 0 | 0
Muons | 0 | 0 | 0 | 0
Jets | 4 | 4 | 4 | 4
=> progress: [===========================> ]
| Initial | BaseWithOver | Baseline | Signal
Electrons | 1 | 1 | 1 | 0
Muons | 0 | 0 | 0 | 0
Jets | 6 | 6 | 5 | 5
=> progress: [===============================> ]
| Initial | BaseWithOver | Baseline | Signal
Electrons | 0 | 0 | 0 | 0
Muons | 0 | 0 | 0 | 0
Jets | 10 | 10 | 10 | 8
=> progress: [==================================>]
| Initial | BaseWithOver | Baseline | Signal
Electrons | 1 | 1 | 1 | 1
Muons | 0 | 0 | 0 | 0
Jets | 7 | 7 | 6 | 6
=> progress: [===================================]
=> total number of events: 10 ( analyzed: 10 ; skipped: 0 )
I've used std::remove_if to make the code more concise. The only problem is that the return value of the lambda requires reversing the inequalities from those used in the paper. In future exercises, I'll apply De Morgan's law to the boolean expression so the inequalities match those in the paper.
The relevant code can be found on github here.
A note on debugging: the makefile for MA5 expert mode is clean and simple. To debug with lldb or gdb, add the -g flag and remove the -O3 optimization flag.
References
- Pseudorapidity - Eta: η.
- Transverse Momentum - p_t.
- Pseudorapidity and Transverse Momentum
- Supersymmetry
- ATLAS
Fourth Exercise (ex1c)
The fourth exercise involves object isolation and histogramming. We'll be focusing on a CMS search for dark matter. Generating signal jets for this exercise is surprisingly simple compared to the last exercise; the photons are where we'll spend a bit more time.
Others have already built python code to generate histograms from SAF files, so there's no need to dive into that task request in this exercise. See the 'External addons' section on the MadAnalysis5 website.
The Loose, Medium, and Tight barrel photon identification requirements are given in Table 3, page 29, of this research paper, though only the Medium requirements are needed here. After looking through irelandscape's ex1c posts and lemouth's comments, I figured out how to correctly isolate photons. The trick was subtracting the photon's own transverse momentum from I_gamma.
Here are the results rendered with saf2png:
These match irelandscape's ex1c histograms. The relevant code:
bool ex1c_iauns::Execute(SampleFormat& sample, const EventFormat& event)
{
  if (event.rec() == nullptr) { return true; }
  Manager()->InitializeForNewEvent(event.mc()->weight());

  // Signal jets.
  std::vector<RecJetFormat> signalJets = event.rec()->jets();
  signalJets.erase(std::remove_if(signalJets.begin(), signalJets.end(),
    [](const RecJetFormat& jet) {
      return !(jet.pt() > 30.0f) || !(jet.abseta() < 5.0f);
    }), signalJets.end());

  // Isolated photons.
  std::vector<RecPhotonFormat> isolatedPhotons = event.rec()->photons();
  isolatedPhotons.erase(std::remove_if(isolatedPhotons.begin(), isolatedPhotons.end(),
    [&event](const RecPhotonFormat& photon) {
      // Reject on pseudorapidity and H/E.
      if (photon.abseta() >= 1.44f)   { return true; }
      if (photon.HEoverEE() >= 0.05f) { return true; }

      // Isolation.
      MAfloat32 pt = photon.pt();
      double Igam = PHYSICS->Isol->eflow->sumIsolation(photon, event.rec(), 0.3, 0., IsolationEFlow::PHOTON_COMPONENT);
      double In   = PHYSICS->Isol->eflow->sumIsolation(photon, event.rec(), 0.3, 0., IsolationEFlow::NEUTRAL_COMPONENT);
      double Ipi  = PHYSICS->Isol->eflow->sumIsolation(photon, event.rec(), 0.3, 0., IsolationEFlow::TRACK_COMPONENT);
      if ((Igam - pt) >= (1.3f + 0.005f * pt)) { return true; }
      if (In  >= (1.0f + 0.04f * pt))          { return true; }
      if (Ipi >= 1.5f)                         { return true; }
      return false; // Photon is isolated.
    }), isolatedPhotons.end());

  if (isolatedPhotons.size() > 0) {
    Manager()->FillHisto("pt_photon", isolatedPhotons[0].pt());
    Manager()->FillHisto("n_photon",  isolatedPhotons[0].eta());
  }
  if (signalJets.size() > 0) {
    Manager()->FillHisto("pt_jet", signalJets[0].pt());
    Manager()->FillHisto("n_jet",  signalJets[0].eta());
  }
  return true;
}
Full ex1c source can be found here.
Fifth Exercise (ex1d)
The fifth exercise utilizes our work from ex1c. A little less direction is given this time, so we're more reliant on the CMS paper. I'd recommend viewing the Public Analysis Database and reading through some of the implemented analyses. Since this project is about filling out this database, it's helpful to review prior implementations.
At first, I struggled with the introduction of cuts in this exercise; I didn't understand the rationale for introducing cuts versus filtering objects based on predefined criteria. After reading through Section 3 (Event Selection) of the CMS paper, it became clear that a cut may reject the entire event if its condition is not met, and that the cut criteria need not cover all objects of a given category. For example, one cut given in this exercise is: at least one photon must have p_T > 175 GeV within the fiducial region of the ECAL barrel. Even if an event passes this cut, we may still need to analyze photons that don't meet the cut criteria themselves. Hopefully that understanding is correct.
Here's the adapted CMS code:
bool ex1d_iauns::Execute(SampleFormat& sample, const EventFormat& event)
{
  if (event.rec() == nullptr) { return true; }
  Manager()->InitializeForNewEvent(event.mc()->weight());

  // Signal jets.
  std::vector<RecJetFormat> signalJets = event.rec()->jets();
  signalJets.erase(std::remove_if(signalJets.begin(), signalJets.end(),
    [](const RecJetFormat& jet) {
      return !(jet.pt() > 30.0f) || !(jet.abseta() < 5.0f);
    }), signalJets.end());

  // Isolated photons.
  std::vector<RecPhotonFormat> isolatedPhotons = event.rec()->photons();
  isolatedPhotons.erase(std::remove_if(isolatedPhotons.begin(), isolatedPhotons.end(),
    [&event](const RecPhotonFormat& photon) {
      // Reject on pseudorapidity and H/E.
      if (!(photon.abseta() < 1.44f))   { return true; }
      if (!(photon.HEoverEE() < 0.05f)) { return true; }

      // Isolation.
      MAfloat32 pt = photon.pt();
      double Igam = PHYSICS->Isol->eflow->sumIsolation(photon, event.rec(), 0.3, 0., IsolationEFlow::PHOTON_COMPONENT);
      double In   = PHYSICS->Isol->eflow->sumIsolation(photon, event.rec(), 0.3, 0., IsolationEFlow::NEUTRAL_COMPONENT);
      double Ipi  = PHYSICS->Isol->eflow->sumIsolation(photon, event.rec(), 0.3, 0., IsolationEFlow::TRACK_COMPONENT);
      if (!((Igam - pt) < (1.3f + 0.005f * pt))) { return true; }
      if (!(In  < (1.0f + 0.04f * pt)))          { return true; }
      if (!(Ipi < 1.5f))                         { return true; }
      return false; // Photon is isolated.
    }), isolatedPhotons.end());

  MALorentzVector pTmiss = event.rec()->MET().momentum();
  double MET = pTmiss.Pt();
  if (!Manager()->ApplyCut((MET > 170.), "MET > 170 GeV")) { return true; }
  if (!Manager()->ApplyCut((isolatedPhotons.size() >= 1) && (isolatedPhotons[0].pt() > 175.),
                           "1+ photon 175 GeV pt")) { return true; }

  // Left fold: if the opening angle between pTmiss and any of the (up to)
  // 4 highest transverse momenta jets is less than 0.5, reject the event.
  // Guard against events with fewer than 4 signal jets.
  const std::size_t nDphiJets = (signalJets.size() < 4) ? signalJets.size() : 4;
  if (!Manager()->ApplyCut(std::accumulate(signalJets.begin(),
        std::next(signalJets.begin(), nDphiJets), true,
        [&pTmiss](bool priorResult, const RecJetFormat& jet) {
          return priorResult && (jet.dphi_0_pi(pTmiss) >= 0.5);
        }),
        "min dphi(ptmiss, ptjet) >= 0.5"))
  {
    return true;
  }

  // Search for signal photons.
  std::vector<RecPhotonFormat> signalPhotons = isolatedPhotons;
  signalPhotons.erase(std::remove_if(signalPhotons.begin(), signalPhotons.end(),
    [&pTmiss](const RecPhotonFormat& photon) {
      return !(photon.dphi_0_pi(pTmiss) >= 2.0);
    }), signalPhotons.end());
  if (!Manager()->ApplyCut(signalPhotons.size() > 0, "1+ photon")) { return true; }
  RecPhotonFormat candidatePhoton = signalPhotons[0];

  // Lepton veto: reject the event if it contains an electron or muon with
  // pT > 10 GeV separated from the candidate photon by dR > 0.5. ApplyCut
  // keeps the event when its condition is true, so pass the negation.
  std::vector<RecLeptonFormat> electrons = event.rec()->electrons();
  std::vector<RecLeptonFormat> muons = event.rec()->muons();
  auto vetoLeptonPredicate = [&candidatePhoton](const RecLeptonFormat& lepton) {
    return (lepton.pt() > 10.0f && lepton.dr(candidatePhoton) > 0.5);
  };
  bool hasVetoLepton = std::any_of(electrons.begin(), electrons.end(), vetoLeptonPredicate)
                    || std::any_of(muons.begin(), muons.end(), vetoLeptonPredicate);
  if (!Manager()->ApplyCut(!hasVetoLepton, "Lepton veto")) { return true; }

  // Update histograms. The signal-photon cut above guarantees a candidate.
  Manager()->FillHisto("pt_photon", candidatePhoton.pt());
  Manager()->FillHisto("n_photon",  candidatePhoton.eta());
  if (signalJets.size() > 0) {
    Manager()->FillHisto("pt_jet", signalJets[0].pt());
    Manager()->FillHisto("n_jet",  signalJets[0].eta());
  }
  return true;
}
You'll notice in the code that I use std::accumulate. Semantically it is similar to Haskell's foldl.
I'm much less certain about this exercise, particularly my handling of signal photons and whether I should be using JetCleaning. I will need data to test these calculations beyond the MET > 170 GeV cut.
Full ex1d source can be found here.
Conclusion
I ran through all five exercises in about two full days, so if you're looking to jump in and are familiar with C++, the time investment is pretty reasonable. The most time-consuming part is familiarizing yourself with the physics behind the programming. Moving this quickly would not have been possible without all of the legwork and questions from effofex, irelandscape, and others. It's useful to read through their interactions with lemouth when coming up to speed. I'd recommend not looking at anyone's code until you've written and tested your own.
The progression of difficulty is pretty linear across 1a, 1b, 1c, and 1d. This is due to the subtleties in 1c (subtracting the photon's transverse momentum from I_gamma) and the lack of direction in 1d, which forces reliance on the CMS paper. Also, it wasn't clear that the incoming event objects were sorted in any fashion until I found a comment to that effect. Though I'm still not entirely clear on the sorting criteria, I believe they are sorted by decreasing transverse momentum.
To fully appreciate the code we're writing, a better understanding of the physics is required. To those entering, I'd suggest throwing yourself into the physics as much as possible.
For the future, I'd like to be able to fabricate datasets to test code. I'm hoping this will come up during the verification phase.
In the interest of moving quickly, I kept the details and code discussion to a minimum.
Thanks for reading.