Writing and Reading (vector or RVec) MyClass with RDataFrame

zhubacek · May 21, 2021, 9:08am

Dear experts,
I’m trying to create a vector (or RVec) of my custom class (MyClass, which derives from TObject) and save it/read it with RDataFrame.
After some trial and error I managed to create the dictionary and it seems to work. However, I have a question about what the output file should look like:
it seems that the output collection in the TTree is not a vector<MyClass> (I also tried RVec<MyClass>) but an array:

Int_t MyClass_;
UInt_t MyClass_fUniqueID[kMaxMyClass];
UInt_t MyClass_fBits[kMaxMyClass];
Double_t MyClass__property1[kMaxMyClass];
Double_t MyClass__property2[kMaxMyClass];
...

This structure is shown by all of TTree->Print(), TTree->MakeClass("...") and RDataFrame->GetColumnNames().
In my case kMaxMyClass is fixed, i.e. the same in every event (e.g. the number of systematics), so it is not a problem.

In any case, this is my LinkDef.h:

#ifdef __CLING__
#pragma link off all globals;
#pragma link off all classes;
#pragma link off all functions;
#pragma link C++ nestedclasses;

#pragma link C++ defined_in "include/MyClass.h";

#pragma link C++ namespace MyClass;

#pragma link C++ class MyClass::MyClass+;
#pragma link C++ class std::vector<MyClass::MyClass>+;
#pragma link C++ class ROOT::VecOps::RVec<MyClass::MyClass>+;
#endif

Thanks,
Cheers,
Zdenek

ROOT Version: 6.22.02
Platform: CentOS7
Compiler: gcc 8.3.0

eguiraud · May 21, 2021, 9:12am

Hi @zhubacek ,
I suggest using vector<MyClass> over RVec<MyClass> as the persistified type until ROOT v6.25 (the current development branch), where we improved I/O of RVec and made it as fast as I/O of std::vector.

ROOT performs some optimization when writing vector<T> to file, e.g. it splits the data members of the class in separate branches for better compression and faster partial reading, which I think is what you see there. You should nevertheless be able to read the full vector<MyClass> back, is this not the case?
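To picture the splitting described above, here is a minimal sketch in plain C++ (no ROOT involved). It only mimics the idea of storing one array per data member and rebuilding the objects on read; it is not ROOT's actual implementation. The struct MyClass with members property1 and property2, and the helpers SplitColumns, split and join, are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the poster's class (no TObject base here).
struct MyClass {
    double property1;
    double property2;
};

// One array per data member, similar in spirit to the per-member leaves
// that TTree::Print() shows for a split collection branch.
struct SplitColumns {
    std::vector<double> property1;
    std::vector<double> property2;
    std::size_t size() const { return property1.size(); }
};

// "Write" side: scatter each object's members into the per-member arrays.
SplitColumns split(const std::vector<MyClass>& objects) {
    SplitColumns cols;
    cols.property1.reserve(objects.size());
    cols.property2.reserve(objects.size());
    for (const MyClass& obj : objects) {
        cols.property1.push_back(obj.property1);
        cols.property2.push_back(obj.property2);
    }
    return cols;
}

// "Read" side: rebuild the full objects from the per-member arrays.
std::vector<MyClass> join(const SplitColumns& cols) {
    assert(cols.property1.size() == cols.property2.size());
    std::vector<MyClass> objects;
    objects.reserve(cols.size());
    for (std::size_t i = 0; i < cols.size(); ++i) {
        objects.push_back(MyClass{cols.property1[i], cols.property2[i]});
    }
    return objects;
}
```

Calling split() on two objects with (property1, property2) equal to (1, 2) and (3, 4) gives property1 = {1, 3} and property2 = {2, 4}, and join() restores the original two objects. A reader that needs only property1 can touch that one array, which is the motivation for partial reading mentioned above.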

Cheers,
Enrico

zhubacek · May 21, 2021, 9:30am

Hi @eguiraud ,
Yes, I seem to be able to read both vector<MyClass> and RVec<MyClass> correctly.
I didn’t compare their performance. However, I need to decide which one to use, because they can’t be used interchangeably.
I don’t see a difference between them with TTree->Print(), TTree->MakeClass(..) or RDataFrame->GetColumnNames(), but there must be a difference internally,
because when I write RVec<MyClass> I can’t read it back as vector<MyClass>:

Error in <TTreeReaderValueBase::CreateProxy()>: The branch MyClass contains data of type vector<MyClass::MyClass,ROOT::Detail::VecOps::RAdoptAllocator<MyClass::MyClass> >. It cannot be accessed by a TTreeReaderValue<vector<MyClass::MyClass>>
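The type named in the error is a std::vector with a non-default allocator, and in C++ the allocator is part of the vector's type. The sketch below shows this in plain C++ (no ROOT). TaggedAllocator is a hypothetical allocator used only as a stand-in; it is not ROOT's RAdoptAllocator and does not reproduce its behaviour. The aliases PlainVec and TaggedVec and the helper toPlain are likewise illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical allocator: forwards to std::allocator but is a distinct type.
template <typename T>
struct TaggedAllocator {
    using value_type = T;
    TaggedAllocator() = default;
    template <typename U>
    TaggedAllocator(const TaggedAllocator<U>&) noexcept {}
    T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>{}.deallocate(p, n);
    }
};

// All TaggedAllocator instances are interchangeable (stateless).
template <typename T, typename U>
bool operator==(const TaggedAllocator<T>&, const TaggedAllocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const TaggedAllocator<T>&, const TaggedAllocator<U>&) noexcept {
    return false;
}

using PlainVec = std::vector<double>;
using TaggedVec = std::vector<double, TaggedAllocator<double>>;

// Same element type, different allocator: two unrelated vector types.
static_assert(!std::is_same<PlainVec, TaggedVec>::value,
              "the allocator is part of the vector's type");

// Going from one to the other means copying the elements explicitly.
PlainVec toPlain(const TaggedVec& v) {
    return PlainVec(v.begin(), v.end());
}
```

Because the two vector types are unrelated as far as the type system is concerned, a reader declared for one cannot simply be pointed at data recorded as the other without an explicit conversion rule, which is consistent with the mismatch reported above.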

Would there be any performance concern if MyClass had many data members? The list returned by RDataFrame->GetColumnNames() would then (necessarily?) be long.

Cheers,
Zdenek

eguiraud · May 21, 2021, 9:34am

Hi,
in v6.25 RVec and std::vector I/O is actually interchangeable (you can write one and read as the other), but that’s not the case yet in the latest stable version. So with v6.22 or v6.24, I suggest using std::vector as the persistified type.

A large number of columns should not bring any visible performance cost.

Cheers,
Enrico

zhubacek · May 21, 2021, 11:28am

Ok, thanks!
I think I will stick with RVec from now on, as I might use its features.
Cheers,
Zdenek

eguiraud · May 21, 2021, 11:46am

Sure. With RDataFrame everything should work fine. Before v6.26 (i.e. currently) it might not be possible to write RVecs with RDataFrame and read them back as RVec with other interfaces, but you should be able to read them back e.g. as std::vectors with any ROOT interface.

Let us know in case you encounter any problem.
Cheers,
Enrico
