Need help parallelising

Hello,

First, attached you will find a minimal working example of my script; the ROOT file is here: https://gigamove.rz.rwth-aachen.de/d/id/2FoBfeMXNCWYMV.
I am trying to parallelize my analysis.
First I tried PROOF, but I need a lot of histograms and each worker creates its own copy of them, so I do not have enough RAM to do it this way.
Then I tried OpenMP as in the example file in this post, but the problem is the way I store the data in a class in the branch: I get a segmentation violation at

EvClass *fEvent = new EvClass;

Sadly, I have absolutely no idea how to write this with RDataFrame.

Is there a better way to store an unknown number of hits per event? I got the idea of using a class from here. Also, every time I open the ROOT file I get these warnings:

Warning in <TClass::Init>: no dictionary for class EvClass is available
Warning in <TClass::Init>: no dictionary for class HitClass is available

What would you recommend to speed up the analysis?

xyplot.cpp (2.0 KB)

ROOT Version: 6.22
Platform: Ubuntu 20.04
Compiler: Not Provided


That’s unfortunate, because this looks like the solution :) So what do we do now? Can you give it a try and tell us which parts you don’t manage?

Hi,
if the problem is handling per-event vectors in RDF, there are a number of helpful topics on the forum, e.g.:

or, more generally, you can search the forum for 'RDataFrame array'.

We also have a (small) section about this in the docs (ROOT::RDataFrame class reference) and a number of tutorials.

Looking at xyplot.cpp, I think a relatively simple way to do something like that is to use a single TH2I instead of one TH1I per channel (where one axis of the TH2 is used for the different channels) and do something like:

df.Define("channels", getChannelArray, {"fEvent"})
  .Define("leads", getLeadsArray, {"fEvent"})
  .Define("reftimes", getRefTimeArray, {"fEvent"})
  .Fill(th2iHistogramModel, {"channels", "leads", "reftimes"});

(I have not tested the code but it should give you an idea).
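The three helpers passed to Define are not spelled out above. Below is a minimal sketch of what they could look like, written as plain C++ so it compiles without ROOT. It assumes the hit class has `lead` and `reftime` members (hypothetical names; only `ch` and `fHitArray` appear in this thread) and that the reference time is stored per hit. Because all three helpers walk the same hit list, the returned vectors have the same length for every event, which is what element-by-element filling from several collection columns needs.

```cpp
#include <cassert> // only needed for the usage checks
#include <vector>

// Stand-ins for the classes stored in the fEvent branch. `ch` and
// `fHitArray` appear in the thread; `lead` and `reftime` are hypothetical.
struct HitClass {
   int ch;
   int lead;
   int reftime;
};

struct EvClass {
   std::vector<HitClass> fHitArray; // variable number of hits per event
};

// Each helper takes the event by const reference (as RDataFrame passes it)
// and returns one entry per hit, so element i of every column is hit i.
std::vector<int> getChannelArray(const EvClass &ev) {
   std::vector<int> channels;
   channels.reserve(ev.fHitArray.size());
   for (const auto &hit : ev.fHitArray)
      channels.push_back(hit.ch);
   return channels;
}

std::vector<int> getLeadsArray(const EvClass &ev) {
   std::vector<int> leads;
   leads.reserve(ev.fHitArray.size());
   for (const auto &hit : ev.fHitArray)
      leads.push_back(hit.lead);
   return leads;
}

std::vector<int> getRefTimeArray(const EvClass &ev) {
   std::vector<int> reftimes;
   reftimes.reserve(ev.fHitArray.size());
   for (const auto &hit : ev.fHitArray)
      reftimes.push_back(hit.reftime);
   return reftimes;
}
```

In the RDF chain these free functions would be passed to Define exactly like the lambdas in the snippet. Keep in mind that a TH2I filled with three columns treats the third one as a weight.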

Cheers,
Enrico

Thanks for the suggestion.

I tried to implement your suggestion for reading the data from the ROOT file. The script and a small ROOT file are attached: scanxy.root (637.7 KB), xyplotrdf.cpp (1.7 KB)
Now the error is:

Error in <TTreeReaderValueBase::CreateProxy()>: The template argument type T of EvClass* accessing branch fEvent (which contains data of type EvClass) is not known to ROOT. You will need to create a dictionary for it.
Error in <TRint::HandleTermInput()>: std::runtime_error caught: An error was encountered while processing the data. TTreeReader status code is: 6

Then I tried to create a dictionary as described here and split the classes into these files: evclass.cxx (83 Bytes) evclass.hxx (253 Bytes) hitclass.cxx (65 Bytes) hitclass.hxx (169 Bytes), but then I only get

rootcling eventdict.cxx -c hitclass.hxx evclass.hxx
Warning: Unused class rule: evclass
Warning: Unused class rule: hitclass

What am I doing wrong?

Thank you for your help.

The user guide is a bit outdated. Your classes don’t have to inherit from TObject and you don’t have to use the ClassDef and ClassImp macros. You do need a LinkDef.h file such as the following:

#ifdef __CLING__
// Standard preamble: turn off creation of dictionaries for "everything":
// we then turn it on only for the types we are interested in.
#pragma link off all globals;
#pragma link off all classes;
#pragma link off all functions;

// Turn on creation of dictionaries for nested classes (not needed for EvClass and HitClass but often part of the preamble)
#pragma link C++ nestedclasses;

#pragma link C++ class EvClass+;
#pragma link C++ class HitClass+;
#endif

With that, this command should work fine: rootcling -f eventdict.cxx xyplotrdf.cpp LinkDef.h (LinkDef.h has to be the last argument; you don’t need to split the class definitions out of xyplotrdf.cpp if you don’t want to), and you should then have dictionaries for those two classes.
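For reference, here is a minimal sketch of class definitions that the LinkDef.h rules above could generate dictionaries for: plain classes with no TObject base and no ClassDef/ClassImp. Only `ch` and `fHitArray` come from this thread; the `lead` member and the `AddHit` helper are hypothetical additions for illustration. The std::vector member is what lets each event hold a different number of hits.

```cpp
#include <cassert> // only needed for the usage checks
#include <vector>

// Plain class: no TObject base, no ClassDef. With a dictionary generated
// from the LinkDef.h above, ROOT I/O can write it to a TTree branch.
class HitClass {
public:
   int ch = 0;   // channel number (member used in this thread)
   int lead = 0; // hypothetical: leading-edge time of the hit
};

class EvClass {
public:
   // One entry per hit; the number of hits can differ from event to event.
   std::vector<HitClass> fHitArray;

   // Hypothetical convenience helper, not part of the original classes.
   void AddHit(int ch, int lead) {
      HitClass hit;
      hit.ch = ch;
      hit.lead = lead;
      fHitArray.push_back(hit);
   }
};
```

Both classes keep their implicit default constructors, which ROOT needs in order to read objects back from the file.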

Lastly, with RDF you don’t access data by pointer but by reference (or value), so you need to change getChannelArray to:

   auto getChannelArray = [](const EvClass &fEvent) {
      std::vector<int> channels;
      for (const auto &hit : fEvent.fHitArray) // one entry per hit in this event
         channels.push_back(hit.ch);
      return channels;
   };

With that change, running root -l xyplotrdf.cpp gives me:
