A branch for a TObject-derived class that contains a very large array of another TObject-derived class

Hi,
I have two TObject-derived classes, A and B.
Class B contains a variable-length array of A objects (let's say A_arr).
The size of A_arr is of the order of 10^5 (the maximum being 5 million).
What is the I/O-efficient way of writing a TBranch that holds an object of class B?
Should I use a TClonesArray, a TList, or something else for A_arr?

I recommend you use a TClonesArray (or a std::vector) so that the branch can be split.

Cheers,
Philippe.

Hi,
So I used

class B : public TObject
{
  ....
  // The "//->" comment tells ROOT I/O that this pointer is never null.
  TClonesArray* tca; //->
  ....
};
The default constructor is then:
B::B()
{
  tca = new TClonesArray("A", 5000000);
}

Now I am reading data generated by some other simulation software and converting it to a TTree that has a branch of B. When I run the conversion code (i.e. reading the event data and filling it into the tree), the process is killed by the system after reading some events. The machine has 8 GB of RAM and 8 GB of swap.
I have set tree->SetMaxVirtualSize(1000000000) and tree->SetMaxTreeSize(1500000000).
I am using ROOT v5.34/14.

Sorry, I missed a point before. The simulation software has its own customized data format along with its own I/O software. The maximum size of a single event (assuming the whole TClonesArray is filled) is ~1.5 GB. The I/O software therefore creates its own buffer of ~2 GB in memory while reading (or writing) the events.

Hi,

Can you show how you created the TFile and the TTree? (The TTree may not be correctly attached to the TFile.)

Cheers,
Philippe.

OK. The code is quite big, but I can tell you in brief what it is doing.
The event data is read in I/O blocks, where the blocks have different "types" marked by an identifier at the start of each block. So I am giving pseudocode, with the explicit way in which the TFile and TTree are created.

// open input file..
// loop over i/o blocks
  {
    switch (i/o block type) {
           case <RUNHEADER-TYPE> :
                   ....
              rootFile = new TFile(name,"CREATE") ;
              dataTree = new TTree(name,title) ;
              dataTree->SetMaxTreeSize() ;
              dataTree->SetMaxVirtualSize(1GB) ;
                   
              dataTree->Branch("DataHeaderClass" ..., 32000) ;
              dataTree->Branch("DataClass",..., ) ;  // DataClass is class B as above
              break ;
          case <DATAHEADER-TYPE> :
              // Read into DataHeaderClass object
          case <DATA-TYPE> :
              // Update DataClass object (i.e. object of class B)
                 dataTree->Fill() ; 
           case <RUNE>      :
              
       }
    
  }
  rootFile = dataTree->GetCurrentFile() ;
  rootFile-> Write(0,TObject::kOverwrite) ;
  rootFile -> Close() ;

Hi,

Okay, that looks fine. Can you give the result of TTree::Print on the TTree in a state as full as possible (i.e. as shortly before the crash as possible)?

Thanks,
Philippe.

Hi,
Previously I was getting an error from the reading part (which got lost among a lot of screen output), which I have now removed. This also stops the crash. However, writing the ROOT file is now extremely slow, and the TClonesArray contents do not look sensible. I saved only 5000 events to the .root file, and it took more than a minute. The reading part takes only around 30 seconds for the complete file, which contains 10^6 events. Also, the input file is 2.1 GB (for 10^6 events) while the .root file is already 2.2 GB for 5000 events.
I am attaching the output log of tree->Print().
log.txt (13.9 KB)
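
To put the sizes quoted above in perspective, here is a rough per-event estimate. It is only back-of-the-envelope arithmetic, assuming 1 GB = 10^9 bytes; the function names are made up for illustration and are not part of ROOT or of the attached code.

```cpp
#include <cassert>

// Average on-disk size per event, in bytes.
double bytesPerEvent(double totalBytes, double nEvents) {
    assert(nEvents > 0.0);
    return totalBytes / nEvents;
}

// How many times larger one per-event size is than another.
double bloatFactor(double outBytesPerEvent, double inBytesPerEvent) {
    assert(inBytesPerEvent > 0.0);
    return outBytesPerEvent / inBytesPerEvent;
}
```

With the figures from the post, bytesPerEvent(2.1e9, 1e6) is about 2.1 kB per event for the input file and bytesPerEvent(2.2e9, 5000) is 440 kB per event for the .root file, so the output is roughly 210 times larger per event. That suggests far more array entries are being written per event than the event actually contains.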

Hi,
I am attaching a tarball of the code (since I could not create a smaller reproducer).
The problem comes from the files include/MSimIOCorsika.h, src/SimCorsika/MSimIOCorsika.cxx and src/Mains/Rootify.cxx.

Hi,
Solved the problem.
For the memory crash, the problem was that my object was not initialized after the file was opened.
For the slow writing and large files, I had to shrink the TClonesArray after reading every event (using ExpandCreateFast), since very large arrays are very rare (the event sizes follow a power-law distribution).
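
For anyone landing here later, the per-event resizing idea can be sketched as follows. This is not the code from the tarball: it uses a std::vector (the alternative Philippe mentioned) instead of a TClonesArray so that it compiles without ROOT, and HitA, EventB and loadEvent are hypothetical names standing in for classes A and B and for the event-reading step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for classes A and B from the thread (not ROOT classes).
struct HitA { double value = 0.0; };

struct EventB {
    std::vector<HitA> hits;  // plays the role of the TClonesArray member
};

// Size the container to exactly n entries for the current event.
// Shrinking with resize() keeps the allocated capacity, so storage left
// over from a rare huge event is reused instead of being reallocated.
void loadEvent(EventB& ev, std::size_t n) {
    ev.hits.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        ev.hits[i].value = static_cast<double>(i);  // placeholder payload
    }
    assert(ev.hits.size() == n);
}
```

Calling loadEvent(ev, 100000) followed by loadEvent(ev, 10) leaves ev.hits.size() at 10 while the capacity stays at 100000 or more, so only the current event's entries would be handed to the tree. With a TClonesArray, the corresponding per-event step in my case was the ExpandCreateFast call mentioned above.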