Multithreading safety when building custom object in RDF Define()

Hi!

I built a custom class to store the results of an analysis done with RDataFrame, but I’m encountering a lot of issues when running in ImplicitMT mode, and I suspect my class is missing some kind of MT adaptation.

First, I’m reading a TTree with custom objects holding the calibrated detector data. On top of those data I’ve developed this E748Analysis class (see the code in the attached .zip), whose aim is to perform a simple, high-level analysis: build the beam energy, reconstruct angles and get the excitation energy of the heavy nucleus. Since it relies on the detector data, I construct it from them.

You can find the Test() macro in the .zip, but here is a snippet of the MWE:

  ROOT::RDataFrame d ("PhysicsTree", "/media/Data/E748/output/analysis/Beam_12Be.root");

  auto df = d.Define("X", [](const TCATSPhysics& cats){return ROOT::RVecD(cats.PositionX);}, {"CATS"});
  df = df.Define("Y", [](const TCATSPhysics& cats){return ROOT::RVecD(cats.PositionY);}, {"CATS"});
  df = df.Define("Z", [](const TCATSPhysics& cats){return ROOT::RVecD(cats.PositionZ);}, {"CATS"});

  df = d.Define("Ana", [&](TCATSPhysics& cats, double T_CATS1_CAV_Cal)
  {
      E748Analysis ana {};
      //ana.SetInterface(&interface);
      //if(slot == 1)
      ana.BuildBeam(cats, T_CATS1_CAV_Cal);
      return ana;
  },
  {"CATS", "T_CATS1_CAV"});

Basically, CATS is a branch holding a TCATSPhysics object (from an external library); NPToolsInterface is a class that just holds smart pointers to the classes from that library that I need for some computations; and E748Analysis will contain the important results.

To simplify things, I’ve reduced the code of the method E748Analysis::BuildBeam() to the lines I’ve identified as the source of the issue: I’m just initializing an XYZPoint from a std::vector<double> contained in the TCATSPhysics data, but that raises segmentation faults and the program halts. But here comes the funniest thing: if I disable ImplicitMT(), no crash occurs and everything works fine! So I suspect I need to adapt my E748Analysis class for multithreading? Or maybe the original class (TCATSPhysics) isn’t well-suited for MT?

Also, reading the docs I found the DefineSlot() function, so I decided to give it a try. It turns out that when only one slot is selected, no errors are raised! Hence my suspicions about the MT design of the classes. A workaround is to create new columns with the interesting fields of TCATSPhysics and then work with them instead of the whole object, but I’m concerned that there is a deeper problem.
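
A minimal sketch of that workaround, written in plain C++ so it does not depend on ROOT or on the external library. FakeCats, Positions, Point3, CopyPositions and BuildPoint are all hypothetical stand-ins (for TCATSPhysics, the X/Y/Z columns and XYZPoint); the size check is only a defensive assumption, since it is not known whether an out-of-range access is involved in the crash:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for TCATSPhysics; only the position vectors are modelled.
struct FakeCats {
    std::vector<double> PositionX, PositionY, PositionZ;
};

// Plain value type standing in for an XYZPoint.
struct Point3 {
    double x, y, z;
};

// Independent copies of the three vectors, i.e. what the columns
// "X", "Y" and "Z" would hold for one event.
struct Positions {
    std::vector<double> x, y, z;
};

// Copy the interesting fields out of the detector object.
Positions CopyPositions(const FakeCats& cats) {
    return Positions{cats.PositionX, cats.PositionY, cats.PositionZ};
}

// Build the i-th point from the copied vectors.
// Returns false and leaves `out` untouched if any vector is too short.
bool BuildPoint(const Positions& p, std::size_t i, Point3& out) {
    if (i >= p.x.size() || i >= p.y.size() || i >= p.z.size()) {
        return false;
    }
    out = Point3{p.x[i], p.y[i], p.z[i]};
    return true;
}
```

With this layout the analysis code only ever touches values it owns, so it no longer depends on how the original object behaves when several threads are running.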

Any help would be highly appreciated! Thanks

Attached files: RDF description and header/imp files for my classes.
Analysis.zip (7,2 KB)



ROOT Version: 6.28/02 (precompiled binary)
Platform: Ubuntu 22.04


Hi @loopset,

thank you for your question. Maybe @vpadulan could take a look?

Cheers,
Marta

Hi @loopset ,

it is definitely at least one of these two. The snippet you show and the code you highlight from E748Analysis::BuildBeam() don’t look like they should produce errors per se, so the problem may be somewhere else, but the fact that you are not seeing errors when using DefineSlot is enough to know that some extra adaptation is needed for the multithreaded case. As a first step, you could try implementing E748Analysis so that the different slots access different TCATSPhysics data, maybe by storing them in a vector.
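
A rough sketch of what "one entry per slot" could look like, in plain C++ without ROOT. FakeCats and PerSlotAnalysis are hypothetical names, not the real TCATSPhysics or E748Analysis, and the sum is just a placeholder computation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-event detector data (not the real TCATSPhysics).
struct FakeCats {
    std::vector<double> PositionX;
};

// Hypothetical analysis helper that keeps one scratch object per processing slot,
// so that two slots never read or write the same scratch memory.
class PerSlotAnalysis {
public:
    explicit PerSlotAnalysis(std::size_t nSlots) : fScratch(nSlots) {}

    // Each caller passes its own slot index; at() throws if the index is out of range.
    double SumX(unsigned int slot, const FakeCats& cats) {
        FakeCats& local = fScratch.at(slot);  // storage owned by this slot only
        local = cats;                         // private copy of the event data
        double sum = 0.0;
        for (double x : local.PositionX) {
            sum += x;
        }
        return sum;
    }

    // Read-only access to the data last seen by a given slot.
    const FakeCats& Scratch(unsigned int slot) const { return fScratch.at(slot); }

private:
    std::vector<FakeCats> fScratch;  // one element per slot, sized once up front
};
```

In RDataFrame the slot index would come from DefineSlot, whose callable receives an unsigned int slot as its first argument. The vector has to be sized before the event loop starts, since resizing it while slots are running would itself be a data race.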

Cheers,
Vincenzo

Okay, that solution is not that different from creating columns with the desired members of TCATSPhysics and using them to initialize my class.

Thanks for the suggestion, I will keep you posted with any updates!

Cheers