Reading TRefArray with RDataFrame in PyROOT

Hi,

I’m having trouble reading the Constituents TRefArray (a data member of Jet) from a Delphes ROOT file.

There is no error when running the code at the bottom.

However, most Constituents come back as nullptr, and occasionally as a Muon. The Muon always has the same address.

They should all be Track or Tower.

I also checked a file with a plain TTree, and there the constituents are fine.

It seems the deserialization of TRefArray is wrong. (I’m not sure. :frowning: )

I attached short scripts at the bottom.

If you want to reproduce my problem, here is my GitHub repo containing the full code and a small ROOT file.

Note that you will not find any Muon in this small file because the file is too small.

If you want a bigger file, please tell me.

Cheers,
Seungjin

Tools.hh

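#include <iostream>

#include "TRefArray.h"

// Track and Tower are declared in Delphes' classes/DelphesClasses.h.
Float_t testJetCon(const TRefArray & constituents) {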
  Float_t out = 0.0f;

  unsigned int num_nullptr = 0;
  unsigned int num_track = 0;
  unsigned int num_tower = 0;
  unsigned int num_wrong = 0;


  for (int idx = 0; idx < constituents.GetEntries(); idx++) {
    const TObject* con = constituents.At(idx);
    if (con == nullptr) {
      num_nullptr++;
      continue;
    }

    if (auto track = dynamic_cast<const Track*>(con)) {
      out += track->PT;
      num_track++;
    } else if (auto tower = dynamic_cast<const Tower*>(con)) {
      out += tower->ET;
      num_tower++;
    } else {
      num_wrong++;
      std::cerr << "[[WRONG]] " << con->ClassName() << " found" << std::endl;
    }
  }

  std::cout << "nullptr: " << num_nullptr
            << ", Track: " << num_track
            << ", Tower: " << num_tower
            << ", Wrong: " << num_wrong
            << std::endl;


  return out;
}
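To make the expected and observed tallies concrete, here is a minimal ROOT-free Python sketch of the same bookkeeping. Track, Tower, and Muon below are hypothetical stand-ins, not the Delphes classes, and tally_constituents is just an illustrative name. It only mirrors the counting logic of testJetCon; it does not read any file.

```python
# Stand-in classes for illustration only; they are NOT the Delphes classes.
class Track(object):
    def __init__(self, pt):
        self.PT = pt


class Tower(object):
    def __init__(self, et):
        self.ET = et


class Muon(object):
    def __init__(self, pt):
        self.PT = pt


def tally_constituents(constituents):
    """Mirror the bookkeeping of testJetCon for a list of constituents.

    Returns a tuple: (summed PT/ET of tracks and towers, dict of counts).
    """
    counts = {"nullptr": 0, "Track": 0, "Tower": 0, "Wrong": 0}
    total = 0.0
    for con in constituents:
        if con is None:  # plays the role of a nullptr returned by At(idx)
            counts["nullptr"] += 1
        elif isinstance(con, Track):
            total += con.PT
            counts["Track"] += 1
        elif isinstance(con, Tower):
            total += con.ET
            counts["Tower"] += 1
        else:  # anything else is unexpected for a jet constituent
            counts["Wrong"] += 1
    return total, counts


# What a healthy jet should look like: only tracks and towers.
healthy = [Track(10.0), Tower(5.5), Track(2.5)]
# The symptom described in this post: mostly nulls, occasionally a Muon.
broken = [None, None, Muon(20.0), None]

healthy_total, healthy_counts = tally_constituents(healthy)
broken_total, broken_counts = tally_constituents(broken)
print(healthy_total, healthy_counts)
print(broken_total, broken_counts)
```

With the symptom above, the sum stays at zero, so the 'test_var > 0.0' filter would reject such jets even though the file contains valid constituents.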

test-RDataFrame.py

import ROOT

ROOT.gInterpreter.AddIncludePath('Delphes-3.4.2/classes/')
ROOT.gInterpreter.AddIncludePath('Delphes-3.4.2/external/')
ROOT.gSystem.Load("Delphes-3.4.2/libDelphes.so")
ROOT.gInterpreter.Declare('#include "Tools.hh"')

df = ROOT.RDataFrame('Delphes', 'delphes.root')
df = df.Define('test_var', 'testJetCon(Jet.Constituents[0])')
df = df.Filter('test_var > 0.0')
print(df.Count().GetValue())

ROOT Version: 6.18/04
Platform: CentOS Linux release 7.7.1908 (Core)
Compiler: gcc (GCC) 8.3.1
Python Version: Python 2.7.15+
Delphes Version: 3.4.2


Hi @seungjin.yang,
welcome to the ROOT forum and thank you for your thorough report!

RDataFrame leverages TTreeReader for the deserialization of ROOT objects under the hood. In order to speed up resolution of your issue, could you try and read Jet.Constituents using TTreeReader and a TTreeReaderValue<TRefArray>?

It should be something like (not tested, but it should give you an idea):

TFile f("filename.root");
TTreeReader r("treename", &f);
TTreeReaderValue<TRefArray> rv(r, "Jet.Constituents");
while (r.Next())
  testJetCon(*rv);

Cheers,
Enrico

Hi @eguiraud

I just replaced TTreeReaderValue with TTreeReaderArray in your snippet.
It gives the same result as RDataFrame (no segfault, but the same nullptr problem).

TFile f("filename.root");
TTreeReader r("treename", &f);
TTreeReaderArray<TRefArray> rv(r, "Jet.Constituents");
while (r.Next()) {
  for (const auto & each : rv) {
    testJetCon(each);
  }
}

Cheers,
Seungjin

Hi,
thank you for running this experiment. If TTreeReader silently reads TRefArray incorrectly, that’s a pretty bad issue. @pcanal , @Axel , what’s your take?

Cheers,
Enrico

Yes, bad bug. I created https://sft.its.cern.ch/jira/browse/ROOT-10559

Thank you :slight_smile:

I’m really looking forward to RDataFrame + Delphes.

Cheers

Seungjin

Hi,
please take a look at https://sft.its.cern.ch/jira/browse/ROOT-10559 when you have a second