Snapshot of RVec<RVec<T>> produces broken ROOT file

clementhelsens · March 11, 2021, 8:18am

If I may ask a somewhat related question on this topic: I managed to store a ROOT::VecOps::RVec<ROOT::VecOps::RVec<>> in my output files, but even after properly generating the specific dictionary in my LinkDef.h I am never able to read it back with ROOT (v06.22) without a segfault. Using std::vector<std::vector<>> works just fine, as expected. Is this expected behaviour?
Cheers,
Clement

eguiraud · March 11, 2021, 9:02am

Hi,
how did you write RVecs out and what does tree->Print() show for that output TTree?

I/O of RVecs is a bit flaky currently, that’s why e.g. RDataFrame does not write out RVecs but always std::vectors, performing an on-the-fly conversion if needed. We are going to fix it in a backward-incompatible way in the next ROOT version, so writing RVecs to disk at the moment is discouraged (v6.24 will print a warning).

clementhelsens · March 11, 2021, 10:10am

Oh, I see. I tried as much as possible to use RVec everywhere, even writing them to the TTree (I build a dictionary of C++ functions that return RVecs). Following your statement, I should then change to std::vectors.

Below is a printout of the TTree, where you can see one output RVec that works (jets_kt_pz), my vector<vector<int>> that also works (jetconstituents_kt), and the RVec of RVec of int (jetconstituents2_kt) that does not.

*............................................................................*
*Br   17 :jets_kt_pz : vector<float,ROOT::Detail::VecOps:                    *
*         | :RAdoptAllocator<float> >                                        *
*Entries :    10000 : Total  Size=     381827 bytes  File Size  =     261844 *
*Baskets :       14 : Basket Size=      32000 bytes  Compression=   1.46     *
*............................................................................*
*Br   18 :jetconstituents_kt : vector<vector<int> >                          *
*Entries :    10000 : Total  Size=    3491564 bytes  File Size  =    1294513 *
*Baskets :      113 : Basket Size=      32000 bytes  Compression=   2.70     *
*............................................................................*
*Br   19 :jetconstituents2_kt : Int_t jetconstituents2_kt_                   *
*Entries :    10000 : Total  Size=      82397 bytes  File Size  =      13133 *
*Baskets :        5 : Basket Size=      32000 bytes  Compression=   6.13     *
*............................................................................*
*Br   20 :jetconstituents2_kt.fData : vector<int,ROOT::Detail::VecOps:       *
*         | :RAdoptAllocator<int> > fData[jetconstituents2_kt_]              *
*Entries :    10000 : Total  Size=     342201 bytes  File Size  =      22440 *
*Baskets :       14 : Basket Size=      32000 bytes  Compression=  15.22     *
*............................................................................*

I have copied the file to my public AFS area in case you want to take a look at it:
/afs/cern.ch/user/h/helsens/public/4ROOT/events_196309147.root

Cheers,
Clement

eguiraud · March 11, 2021, 10:13am

The file looks fine, those are actually all std::vectors (with a custom allocator, but ROOT I/O does not care) – probably thanks to that on-the-fly conversion that Snapshot does.

Thank you for sharing the file, can you please also share a snippet of code that crashes when reading it?

Cheers,
Enrico

clementhelsens · March 11, 2021, 10:23am

It crashes just by doing events->Scan("jetconstituents2_kt").

And following up on storing RVec, could you confirm that I should change to std::vector?

Cheers,
Clement

eguiraud · March 11, 2021, 10:29am

From what you posted above, you are not storing RVecs, you are storing std::vectors (probably thanks to Snapshot performing this on-the-fly conversion I mentioned). I will check the actual file and get back to you.

clementhelsens · March 11, 2021, 10:33am

Yes, I understood that; sorry for not being clear enough. I’m wondering whether I should avoid the on-the-fly conversion by using std::vectors directly, or whether that would have a negligible impact on performance.

eguiraud · March 11, 2021, 10:51am

In 6.22, the conversion has zero overhead. In 6.24, the conversion will have a little overhead. In 6.26, there will be no conversion and I/O of RVecs will work just fine (even better – you will be able to write std::vectors and read them as RVecs and vice-versa with pure ROOT I/O).

Bottom line: sorry for the trouble, we are making things better, and I will let you know why TTree::Scan crashes.

eguiraud · March 12, 2021, 6:07pm

Hi @clementhelsens,
I checked your file and indeed the nested RVecs are causing issues: Snapshot is only converting the outer RVec to a std::vector, and the inner RVec is still being written to file. You then encounter precisely the kind of RVec I/O issues that we are fixing these days, e.g. in root-project/root pull request #7232, "[VecOps] Use collection proxies for RVec I/O".
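
To illustrate the "outer layer only" point, here is a minimal sketch in plain C++ of what a shallow conversion does to a nested type. MiniVec, shallow_to_std and shallow_example are hypothetical names: MiniVec stands in for RVec so that the sketch builds without ROOT, and shallow_to_std stands in for a conversion that only touches the outer container. This is not ROOT's actual Snapshot code.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for ROOT::VecOps::RVec so the sketch builds without ROOT.
template <typename T>
struct MiniVec {
    std::vector<T> data;
    typename std::vector<T>::const_iterator begin() const { return data.begin(); }
    typename std::vector<T>::const_iterator end() const { return data.end(); }
};

// Shallow conversion (an assumption modelled on the behaviour described above):
// only the outer container changes type, the elements are copied as they are.
template <typename T>
std::vector<T> shallow_to_std(const MiniVec<T>& v) {
    return std::vector<T>(v.begin(), v.end());
}

using Nested = MiniVec<MiniVec<int>>;
using ShallowResult = decltype(shallow_to_std(std::declval<const Nested&>()));

// The outer layer is now a std::vector, but the inner layer is still a MiniVec.
static_assert(std::is_same<ShallowResult, std::vector<MiniVec<int>>>::value,
              "outer layer converted, inner layer unchanged");
static_assert(!std::is_same<ShallowResult, std::vector<std::vector<int>>>::value,
              "a shallow conversion does not yield vector<vector<int>>");

// Runtime view of the same thing: the elements keep their MiniVec type.
inline void shallow_example() {
    MiniVec<int> a;
    a.data = {1, 2};
    Nested nested;
    nested.data = {a};
    const auto out = shallow_to_std(nested);
    assert(out.size() == 1 && out[0].data.size() == 2);
}
```

In this toy model the result type is vector<MiniVec<int>>, which mirrors the situation in the file: the inner collection type is what still has to go through the RVec I/O path.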

Indeed, in this case I would suggest using std::vectors instead, as Snapshot is not doing the conversion for you.
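
One way to follow this suggestion is to convert both layers by hand before the column is written. The helper below is a minimal sketch, not a ROOT API: to_std_nested and deep_example are hypothetical names, and the sketch assumes only that the outer and inner containers can be iterated with a range-based for loop (std::list is used here so it builds without ROOT; that RVec<RVec<T>> can be passed the same way is an assumption).

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Deep-copies a two-level container into std::vector<std::vector<T>>.
// Nested can be any type whose outer and inner layers support range-based
// iteration; that RVec<RVec<T>> qualifies is an assumption of this sketch.
template <typename T, typename Nested>
std::vector<std::vector<T>> to_std_nested(const Nested& nested) {
    std::vector<std::vector<T>> out;
    for (const auto& inner : nested) {
        // Build each inner std::vector from the inner container's elements.
        out.emplace_back(std::begin(inner), std::end(inner));
    }
    return out;
}

// Example with a non-vector nested container as input.
inline void deep_example() {
    const std::list<std::list<int>> constituents = {{1, 2}, {}, {3}};
    const std::vector<std::vector<int>> copy = to_std_nested<int>(constituents);
    assert(copy.size() == 3);
    assert(copy[0].size() == 2 && copy[1].empty() && copy[2][0] == 3);
}
```

With ROOT available, the function that produces the column could return to_std_nested<int>(x) instead of the nested RVec, so that the branch is written as vector<vector<int> >, like the jetconstituents_kt branch in the printout above.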

Sorry for the trouble, let me know if this “solves” it.

Cheers,
Enrico

clementhelsens · March 13, 2021, 7:53am

Thanks @eguiraud for looking into it; very good to hear that this is already in the pipeline.
I will be using std::vector of std::vectors in the meantime. Issue understood, so I will mark it as solved.
Cheers,
Clement

eguiraud · March 18, 2021, 11:24am

Hi @clementhelsens,
just double-checking: you are generating dictionaries for your nested collection types, right?

clementhelsens · March 18, 2021, 11:35am

Yes, I always add it to the LinkDef explicitly when it’s not standard:

github.com/HEP-FCC/FCCAnalyses/blob/master/analyzers/dataframe/LinkDef.h

#ifdef __CINT__

//Globals
#pragma link off all globals;
#pragma link off all classes;
#pragma link off all functions;
#pragma link C++ nestedclasses;

//Dictionaries for output objects
#pragma link C++ class std::vector<TLorentzVector>+;
#pragma link C++ class std::vector<std::string>+;

#pragma link C++ class ROOT::VecOps::RVec<TLorentzVector>+;
#pragma link C++ class std::vector<std::vector<int>>+;
#pragma link C++ class ROOT::VecOps::RVec<edm4hep::TrackState>+;
#pragma link C++ class ROOT::VecOps::RVec<edm4hep::VertexData>+;
#pragma link C++ class ROOT::VecOps::RVec<edm4hep::ReconstructedParticleData>+;
#pragma link C++ class ROOT::VecOps::RVec<edm4hep::Vector3d>+;
#pragma link C++ class ROOT::VecOps::RVec<edm4hep::MCParticleData>+;

This file has been truncated.

eguiraud · March 18, 2021, 11:40am

Alright, I will soon have a test in for I/O of RVec<RVec<T>> and RVec<RVec<RVec<T>>> (just to be sure), where T is a fundamental type. Your LinkDef reminds me that it’s better to also test the case in which T is a user-defined class.

Anyway, I will ping you here when ROOT’s master branch has support and test coverage for these cases, thank you for bringing this up.

system · April 1, 2021, 11:41am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

eguiraud · April 6, 2021, 3:29pm

Hi @clementhelsens,
FYI, I just merged a test for Snapshot + nested RVecs, i.e. it should work in current master and should not break in the future.

Cheers,
Enrico

clementhelsens · April 13, 2021, 6:39pm

Thanks @eguiraud, I will give it a try as soon as I can get it within the key4hep stack.
