Slow Read/Write of TFile

devju · January 4, 2016, 6:31pm

Dear everybody,

I am currently working on a project where we use Monte Carlo simulations. The simulation results are first stored in a TBranch of a TTree in a corresponding TFile, and later retrieved again to perform several computations on them. The data structures we use consist of encapsulated, nested STL containers (mostly vectors) holding pointers to an abstract base class, and the structures can be several layers deep.
However, so far we have experienced a rather slow write access of the TFile. When I turn compression off, the run of a simulation with 1e6 entries results in a write speed of ~40 MBytes/s.

I am just a ROOT novice, but I am guessing that the data structure is the limitation here, so I would like to know if it is better performance-wise to implement the data structure in terms of templates instead of a class hierarchy with an abstract base class. Also, is the read/write access faster if we use ROOT collection classes rather than STL containers? All this of course with respect to the read and write access of a TTree in a TFile.

Many thanks,

Simon

pcanal · January 4, 2016, 7:04pm

Hi Simon,

Without seeing the details of your data model, I can only give you generic advice. The associative collections (std::set and std::map) are significantly slower than std::vector and TClonesArray, as an associative collection needs to be rebuilt at read time (so you pay the insert cost again); an alternative is to use a sorted vector of pairs.
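
As a rough illustration of the sorted vector of pairs idea, here is a minimal sketch in plain C++ (no ROOT calls). The key and value types and the names `Entry`, `sortByKey` and `findValue` are assumptions made up for this example, not part of any existing data model:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical key/value types, chosen only for illustration.
using Entry = std::pair<int, double>;

// Sort once after filling; the lookup below relies on this ordering.
inline void sortByKey(std::vector<Entry>& entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
}

// Binary search by key; returns nullptr if the key is absent.
inline const double* findValue(const std::vector<Entry>& entries, int key) {
    auto it = std::lower_bound(entries.begin(), entries.end(), key,
                               [](const Entry& e, int k) { return e.first < k; });
    if (it != entries.end() && it->first == key) return &it->second;
    return nullptr;
}
```

The vector is filled and sorted once before it is written, so reading it back is a plain contiguous copy with no per-element insertion, while lookups remain logarithmic.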
Each level of inheritance does indeed cost processing time (and space on disk), unless the objects are stored in a split collection (i.e. a TClonesArray or a collection of objects).
Pointers also slow down the processing, as the pointee's type can (usually) change from entry to entry, so the pointee needs to be allocated and deallocated each time. Pointers also prevent splitting.
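
One possible way to act on the last two points, assuming the derived types can be folded into a single concrete type, is to store objects by value in a vector instead of through base-class pointers. The sketch below is plain C++ with hypothetical names (`Particle`, `ParticleKind`, `Event`, `totalEnergy`); the ROOT dictionary needed to actually write such a class to a TTree is not shown:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical tag replacing a small class hierarchy (illustration only).
enum class ParticleKind : std::uint8_t { Photon, Electron };

// One concrete type: every element has the same layout, so the
// element type cannot change from entry to entry.
struct Particle {
    ParticleKind kind = ParticleKind::Photon;
    double energy = 0.0;
    double charge = 0.0;
};

// Elements are held by value, so there is no per-element heap allocation.
struct Event {
    std::vector<Particle> particles;
};

// Example of code that previously might have used a virtual call.
inline double totalEnergy(const Event& ev) {
    double sum = 0.0;
    for (const Particle& p : ev.particles) sum += p.energy;
    return sum;
}
```

Whether this is faster for a given data model would have to be measured; it only removes the two costs named above (changing pointee types and the lack of splitting).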

Cheers,
Philippe.