Writing a TTree with a complex structure (e.g. Delphes TTrees) using RDataFrames

Hi all,

I want to open a TTree created by Delphes with RDataFrame, remove events based on some cuts, and then write the remaining events to a new TTree that has exactly the same format as the original one. The first part now works thanks to this thread, but I run into trouble when saving.

When I process delphestestsample.root with the program below, the output file doesn’t have the same structure. In particular, I’d expect Jet to have sub-branches (see the attached picture). The input and output files are also attached.

I’ve also tried to simply write all columns using df.GetColumnNames(), but then I run into the type issues I had in the linked thread.

How does one write a TTree with a complex structure such as the ones created by Delphes to disk with RDataFrames?

Cheers,

Jonas

#include <classes/DelphesClasses.h>
#include <ROOT/RDataFrame.hxx>
#include <ROOT/RSnapshotOptions.hxx>
#include <ROOT/RVec.hxx>
#include <TFile.h>

// Keep events with at least two jets where the second jet has PT > 100.
bool JetPtCut(const ROOT::VecOps::RVec<Jet>& jets)
{
  return jets.size() > 1 && jets.at(1).PT > 100;
}

int main(int argc, char* argv[])
{
  auto df = ROOT::RDataFrame("Delphes", "delphestestsample.root")
    .Filter(JetPtCut, {"Jet"});

  df.Snapshot<ROOT::VecOps::RVec<LHEFEvent>, ROOT::VecOps::RVec<Electron>, ROOT::VecOps::RVec<Muon>, ROOT::VecOps::RVec<Jet>>
    ("Delphes", "reducedsample.root", //df.GetColumnNames(),
     {"Event", "Electron", "Muon", "Jet"}, // try explicitly, first method doesn't seem to work
     ROOT::RDF::RSnapshotOptions("RECreate", ROOT::kZLIB, 1, 0, 99, false));

  return 0;
}

ROOT Version: 6.20.04
Platform: Linux 5.6.11-200.fc31.x86_64
Compiler: g++ 9.3.1


reducedsample.root (5.9 KB) DelphesCutter.cxx (793 Bytes) delphestestsample.root (249.8 KB)

Hi @jndrf,
thanks for the thorough report with a file that we can use for debugging. If I understand correctly, the problem is that in the input file Jet has a series of sub-branches, but in the output file it somehow became just a leaf (similarly for other branches).

I will take a look as soon as possible, probably over the course of next week.
Cheers,
Enrico

Hi,
sorry for the delay. I am having some difficulty setting up Delphes to reproduce the problem locally; I’ll get back to you as soon as possible.

Cheers,
Enrico

Hello,
the problem is that TClonesArrays are not well supported. I will try to improve the situation (we certainly must not silently write wrong data), but in the meantime it seems that telling Snapshot that these branches are in fact TClonesArrays (rather than the catch-all RVec) solves the problem:

#include <classes/DelphesClasses.h>
#include <TClonesArray.h>
#include <ROOT/RVec.hxx>
#include <TFile.h>
#include <ROOT/RDataFrame.hxx>

// Keep events with at least two jets where the second jet has PT > 100.
bool JetPtCut(const ROOT::VecOps::RVec<Jet>& jets)
{
  return jets.size() > 1 && jets.at(1).PT > 100;
}

int main(int argc, char* argv[])
{
  auto df = ROOT::RDataFrame("Delphes", "delphestestsample.root")
    .Filter(JetPtCut, {"Jet"});

  df.Snapshot<TClonesArray, TClonesArray, TClonesArray, TClonesArray>(
     "Delphes", "reducedsample.root",      // df.GetColumnNames(),
     {"Event", "Electron", "Muon", "Jet"}, // try explicitly, first method doesn't seem to work
     ROOT::RDF::RSnapshotOptions("RECreate", ROOT::kZLIB, 1, 0, 99, false));

  return 0;
}

It would be great if you could confirm this works.

EDIT: in particular, please check whether the values written out are correct. This is still work in progress…

Cheers,
Enrico

P.S.
This is now tracked as https://sft.its.cern.ch/jira/browse/ROOT-10792

Hi again,
if I understood the problem correctly, you have 3 options (1 and 2 are currently broken in different ways, but fixed by the PR I link below):

  • call Snapshot<RVec<Jet>>(..., {"Jet"}): this tells Snapshot to read Jet as an RVec rather than a TClonesArray. In that case we cannot easily write it out as a TClonesArray, so we would write a std::vector<Jet> instead, which requires dictionaries for std::vector<Jet>
  • call Snapshot(..., {"Jet"}): this should just work and write out a TClonesArray with the fix linked below
  • call Snapshot<TClonesArray>(..., {"Jet"}): this should already work, and it will keep working

Does this sound reasonable? If yes, is there any chance you can try whether this PR fixes your issue?

Cheers,
Enrico

Hi Enrico,

thank you for the reply. I am using the last method from your list for now, and it works.
At a quick glance, the contents of the file look sensible (at least the cut is applied correctly), but I haven’t made a thorough check.
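
A more thorough check could apply the same condition as JetPtCut to every event read back from the output file and count how many fail. Below is a minimal sketch of that logic only. JetStub, PassesSubleadingJetCut and CountFailingEvents are hypothetical names, and plain std::vector stands in for the collections read from the tree; none of this is Delphes or ROOT API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Delphes Jet class; only PT is modelled here.
struct JetStub {
  float PT;
};

// Same condition as JetPtCut in this thread: the event needs at least two
// jets, and the second jet in the collection must have PT above the threshold.
bool PassesSubleadingJetCut(const std::vector<JetStub>& jets, float threshold = 100.f)
{
  return jets.size() > 1 && jets[1].PT > threshold;
}

// Count the events that fail the cut. For a correctly filtered output
// sample this count should be zero.
std::size_t CountFailingEvents(const std::vector<std::vector<JetStub>>& events,
                               float threshold = 100.f)
{
  std::size_t failing = 0;
  for (const auto& jets : events) {
    if (!PassesSubleadingJetCut(jets, threshold)) {
      ++failing;
    }
  }
  return failing;
}
```

A nonzero count on the reduced sample would suggest either that the filter was not applied or that the jet values were not written out correctly.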

I don’t have a self-compiled version of ROOT at hand right now, but I think I can get around to trying the PR this week.

Cheers,

Jonas

Hi Jonas,
just so you know, the next release, v6.22, and the next patch release, v6.20/06, will contain the PR that I linked above. With it, Snapshot(..., {"Jet"}) works out of the box, and Snapshot<RVec<Jet>>(..., {"Jet"}) either works (if dictionaries are present) or errors out noisily with a (hopefully) helpful error message. Note that Snapshot<RVec<Jet>>(..., {"Jet"}) will write out a vector<Jet>, while the other two methods will write out TClonesArrays.

Cheers,
Enrico
