Huge output file from simple event model

Hello,

I’m having an issue where the ROOT files I’m writing from a simple set of event model classes are much, much larger than I would expect. About 50K events end up taking around 13 GB!

The generated StreamerInfo looks reasonable e.g.

StreamerInfo for class: event::SimTrackerHit, version=1, checksum=0xc8ac5521
  TObject        BASE            offset=  0 type=66 Basic ROOT object
  int            id_             offset= 16 type= 3
  int            layerID_        offset= 20 type= 3
  float          edep_           offset= 24 type= 5
  float          time_           offset= 28 type= 5
  float          px_             offset= 32 type= 5
  float          py_             offset= 36 type= 5
  float          pz_             offset= 40 type= 5
  float          x_              offset= 44 type= 5
  float          y_              offset= 48 type= 5
  float          z_              offset= 52 type= 5
  float          pathLength_     offset= 56 type= 5
  TRef           simParticle_    offset= 64 type=61
   i= 0, TObject         type= 66, offset=  0, len=1, method=0
   i= 1, id_             type= 23, offset= 16, len=2, method=0 [optimized]
   i= 2, edep_           type= 25, offset= 24, len=9, method=0 [optimized]
   i= 3, simParticle_    type= 61, offset= 64, len=1, method=0

So it should only need about 65 bytes to store an object of the SimTrackerHit class.
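As a rough cross-check of that estimate, the fixed-size members listed in the StreamerInfo can be tallied. This is a sketch under stated assumptions: the TObject base is counted as its two 4-byte members (fUniqueID, fBits), the TRef is left out because its on-disk size is not shown in the dump, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == 4, "assumes 4-byte floats, as ROOT writes them");

// Hypothetical helper: fixed-size payload of one event::SimTrackerHit,
// counting only the members listed in its StreamerInfo (TRef excluded).
constexpr std::size_t simTrackerHitPayloadBytes() {
    return 2 * sizeof(std::uint32_t)  // TObject base: fUniqueID, fBits
         + 2 * sizeof(std::int32_t)   // id_, layerID_
         + 9 * sizeof(float);         // edep_, time_, px_, py_, pz_, x_, y_, z_, pathLength_
}
```

This gives 52 bytes; the difference from the ~65 byte estimate is whatever the TRef costs, which the sketch does not model.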

The tree entry itself for the object also looks okay …

root [21] t->Show(0)
======> EVENT:0
 Event           = (event::SimEvent*)0x404e110
 fUniqueID       = 0
 fBits           = 50331648
 eventNumber_    = 1000
 run_            = 1
 timestamp_      = 1473280918
 weight_         = 1.23
 recoilSimHits_  = 1
 recoilSimHits_.fUniqueID = 0
 recoilSimHits_.fBits = 50331648
 recoilSimHits_.id_ = 22222222
 recoilSimHits_.layerID_ = 0
 recoilSimHits_.edep_ = 2.345000
 recoilSimHits_.time_ = 42.000000
 recoilSimHits_.px_ = 1.000000
 recoilSimHits_.py_ = 2.000000
 recoilSimHits_.pz_ = 3.000000
 recoilSimHits_.x_ = 50.000000
 recoilSimHits_.y_ = 40.000000
 recoilSimHits_.z_ = 2000.000000
 recoilSimHits_.pathLength_ = 0.000000
 recoilSimHits_.simParticle_ = TRef

But the actual size taken by each data member of this single object within the event is enormous …

*............................................................................*                                                                               
*Br   41 :recoilSimHits_ : Int_t recoilSimHits__                             *                                                                               
*Entries :        1 : Total  Size=      12892 bytes  File Size  =         91 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   42 :recoilSimHits_.fUniqueID : UInt_t fUniqueID[recoilSimHits__]       *                                                                               
*Entries :        1 : Total  Size=        983 bytes  File Size  =        113 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   43 :recoilSimHits_.fBits : UInt_t fBits[recoilSimHits__]               *                                                                               
*Entries :        1 : Total  Size=        955 bytes  File Size  =        109 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   44 :recoilSimHits_.id_ : Int_t id_[recoilSimHits__]                    *                                                                               
*Entries :        1 : Total  Size=        941 bytes  File Size  =        107 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   45 :recoilSimHits_.layerID_ : Int_t layerID_[recoilSimHits__]          *                                                                               
*Entries :        1 : Total  Size=        976 bytes  File Size  =        112 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   46 :recoilSimHits_.edep_ : Float_t edep_[recoilSimHits__]              *                                                                               
*Entries :        1 : Total  Size=        955 bytes  File Size  =        109 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   47 :recoilSimHits_.time_ : Float_t time_[recoilSimHits__]              *                                                                               
*Entries :        1 : Total  Size=        955 bytes  File Size  =        109 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   48 :recoilSimHits_.px_ : Float_t px_[recoilSimHits__]                  *                                                                               
*Entries :        1 : Total  Size=        941 bytes  File Size  =        107 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   49 :recoilSimHits_.py_ : Float_t py_[recoilSimHits__]                  *                                                                               
*Entries :        1 : Total  Size=        941 bytes  File Size  =        107 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   50 :recoilSimHits_.pz_ : Float_t pz_[recoilSimHits__]                  *                                                                               
*Entries :        1 : Total  Size=        941 bytes  File Size  =        107 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   51 :recoilSimHits_.x_ : Float_t x_[recoilSimHits__]                    *                                                                               
*Entries :        1 : Total  Size=        934 bytes  File Size  =        106 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   52 :recoilSimHits_.y_ : Float_t y_[recoilSimHits__]                    *                                                                               
*Entries :        1 : Total  Size=        934 bytes  File Size  =        106 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   53 :recoilSimHits_.z_ : Float_t z_[recoilSimHits__]                    *                                                                               
*Entries :        1 : Total  Size=        934 bytes  File Size  =        106 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   54 :recoilSimHits_.pathLength_ : Float_t pathLength_[recoilSimHits__]  *                                                                               
*Entries :        1 : Total  Size=        997 bytes  File Size  =        115 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*                                                                               
*Br   55 :recoilSimHits_.simParticle_ : TRef simParticle_[recoilSimHits__]   *                                                                               
*Entries :        1 : Total  Size=       1020 bytes  File Size  =        124 *                                                                               
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *                                                                               
*............................................................................*   

For instance, ROOT claims to be taking 955 bytes to store a single float value!

In my single event test, I use the typical pattern for writing the event out using a tree …

TFile* rootFile = new TFile("event_model_test.root", "RECREATE");
TTree* tree = new TTree("Event", "event tree");
SimEvent* event = new SimEvent();
tree->Branch("Event", "event::SimEvent", &event, 32000, 3);

SimTrackerHit* trackerHit = (SimTrackerHit*) event->addObject(event::RECOIL_SIM_HITS);
// set values on trackerHit 

tree->Fill();
rootFile->Write();
rootFile->Close();

I had suspected there was some issue in my code, perhaps objects not being cleared between events, but now I see that this is a problem even if I write an event with only one object in it!

Can anyone give me some hints as to how I would go about debugging this issue? Why would ROOT be taking up so much space to store primitive values like floats and ints?

Thanks.

–Jeremy

Just bumping up a bit because I posted on the weekend and I still can’t figure out how to get a reasonable file size using my event model …

8)

Hi,
The most likely cause is the initialization of the collection containing the event::SimTrackerHit (which does not seem to be shown in your code snippet).

You may have used

fMyCollection->resize(200);

rather than

fMyCollection->reserve(200);
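The distinction matters because resize() actually creates the elements, so all of them would be written out, whereas reserve() only allocates capacity. A minimal illustration with a plain std::vector (Hit is a hypothetical stand-in for event::SimTrackerHit, and the helper names are made up for the example):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a hit class.
struct Hit {
    float edep = 0.f;
};

// Number of elements present (and thus persisted) after resize(n).
std::size_t sizeAfterResize(std::size_t n) {
    std::vector<Hit> hits;
    hits.resize(n);   // creates n default-constructed Hits: size() == n
    return hits.size();
}

// Number of elements present (and thus persisted) after reserve(n).
std::size_t sizeAfterReserve(std::size_t n) {
    std::vector<Hit> hits;
    hits.reserve(n);  // only allocates storage: size() stays 0
    assert(hits.capacity() >= n);
    return hits.size();
}
```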

If this is not the case, we would need a running example reproducing the problem.

Cheers,
Philippe.

Thanks for the reply.

Basically, I’m just doing this in the constructor of my event class:

recoilSimHits_(new TClonesArray(event::SIM_TRACKER_HIT, event::DEFAULT_COLLECTION_SIZE))

The default collection size I have defined is 100.

So I don’t call reserve or resize myself.

The problem still seems to happen when I use a default size of 1 for these arrays instead.

If I still can’t figure this out after a bit of help, then I’ll make a small github project for reproducing the problem directly.

Thanks.

Hi,

The next suspect, then, is how the objects are added to the TClonesArray. Either way, if you manage to get a reproducer, that would be the easiest way forward.

Cheers,
Philippe.

Hi,

Since you use a TClonesArray, instead of:

SimParticle* particle1 = new SimParticle();
....
event->getCollection("SimParticles")->Add(particle1);

try

int nSimParticle = 0;
SimParticle* particle1 = (SimParticle*)event->getCollection("SimParticles")->ConstructedAt(nSimParticle++);
...

Cheers,
Philippe.

The simplest test program I can come up with which shows this is the following …

#include "Event/SimEvent.h"
#include "Event/SimTrackerHit.h"

#include "TClonesArray.h"
#include "TFile.h"
#include "TTree.h"
#include "TBranch.h"

using event::SimEvent;
using event::SimTrackerHit;

int main() {

    TFile* rootFile = new TFile("ldmx_simple_event_test.root", "RECREATE");
    TTree *tree = new TTree("LDMX_Event", "LDMX event tree");
    SimEvent* event = new SimEvent();
    tree->Branch("Event", "event::SimEvent", &event, 32000, 3);

    TClonesArray* coll = event->getCollection(event::RECOIL_SIM_HITS);
    SimTrackerHit* trackerHit = (SimTrackerHit*) coll->ConstructedAt(0);
    trackerHit->setEdep(2.345);
    trackerHit->setPosition(50., 40., 2000.);
    trackerHit->setID(11111111L);
    trackerHit->setMomentum(1.0, 2.0, 3.0);
    trackerHit->setTime(42.);
    trackerHit->setPathLength(5.678);

    tree->Fill();
    rootFile->Write();
    rootFile->Close();

    return 0;
}

The output file ends up being 25K even though it has basically only one event, a single TClonesArray and one object.

If I use the constructor of TClonesArray that only takes the class name, there is no difference. It also does not seem to matter if I pre-allocate with a size of 1.

[quote="pcanal"]Hi,

Since you use a TClonesArray, instead of:

SimParticle* particle1 = new SimParticle();
....
event->getCollection("SimParticles")->Add(particle1);

try

int nSimParticle = 0;
SimParticle* particle1 = (SimParticle*)event->getCollection("SimParticles")->ConstructedAt(nSimParticle++);
...

Cheers,
Philippe.[/quote]

I don’t think that the Add method even works with a TClonesArray, so I am using ConstructedAt in my code.

Here is a clean github project that shows this problem:

github.com/JeremyMcCormick/ldmx-event-test.git

It only requires ROOT to compile, so to build and run it …

git clone https://github.com/JeremyMcCormick/ldmx-event-test.git
cd ldmx-event-test; mkdir build; cd build
cmake -DROOT_DIR=/path/to/root ..
make
./ldmx-simple-event-model-test

You can see the output file is 15K even though it has only one event and a single persisted object.

Thanks for the attention to this!

Hi,

I checked, and the numbers you are seeing are normal/expected, but they need a bit of explanation.

Since you are storing a single event with a single element in a collection, what you are actually seeing/measuring is the cost of the meta-data rather than the data.

For example, for one event we have:

*Br   35 :recoilSimHits_.pz_ : Float_t pz_[recoilSimHits__]                  *
*Entries :        1 : Total  Size=        731 bytes  File Size  =        107 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *

and then for two events (still with only one entry in the collection):

*Br   35 :recoilSimHits_.pz_ : Float_t pz_[recoilSimHits__]                  *
*Entries :        2 : Total  Size=        739 bytes  File Size  =        115 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
From this you can infer that, for that branch, the TBranch information takes 624 bytes, the basket meta-data takes 99 bytes, and each entry in the basket takes 8 bytes (4 for the float and 4 for the record of where each entry starts in the basket, which is needed because the entries are variable-size).
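The arithmetic behind these three numbers can be written out explicitly. This is a sketch: the struct, field and function names are hypothetical, and it assumes both readings come from a single uncompressed basket (Compression = 1.00), as in the output above.

```cpp
#include <cassert>

// Costs inferred from two TTree::Print() readings of the same branch.
struct BranchCost {
    long perEntry;    // bytes added by each extra entry
    long basketMeta;  // basket header: File Size minus the entry data
    long branchMeta;  // TBranch bookkeeping: Total Size minus File Size
};

// n1/n2: entry counts; total/file: the "Total Size" and "File Size" columns.
// Assumes one uncompressed basket (Compression = 1.00) for both readings.
BranchCost inferCost(long n1, long total1, long file1,
                     long n2, long total2, long file2) {
    assert(n2 > n1);
    // Without compression, both columns grow by the same amount per entry.
    assert(file2 - file1 == total2 - total1);
    const long perEntry = (total2 - total1) / (n2 - n1);
    return BranchCost{perEntry, file1 - n1 * perEntry, total1 - file1};
}
```

Calling inferCost(1, 731, 107, 2, 739, 115) reproduces the 8, 99 and 624 byte figures.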

The relative cost of the meta-data will decrease as the number of entries and the number of elements in the collections grow.
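To see how the fixed cost is amortized, here is a rough model built only from the numbers quoted for the pz_ branch: 624 bytes per branch, 99 bytes per basket, and 8 bytes per entry for one float per entry. Starting a new basket every 32000 bytes (the Basket Size in the printout) is a simplifying assumption; ROOT's real basket handling and compression will change the actual figures, and the function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical model of the Total Size of one single-float branch after n entries.
long modelTotalBytes(long n) {
    assert(n > 0);
    const long branchMeta = 624;   // TBranch bookkeeping, paid once
    const long basketMeta = 99;    // per-basket header
    const long perEntry = 8;       // 4-byte float + 4-byte entry offset
    const long basketSize = 32000; // Basket Size from TTree::Print()
    const long data = perEntry * n;
    const long baskets = (data + basketSize - 1) / basketSize; // ceiling division
    return branchMeta + basketMeta * baskets + data;
}

// Average bytes per entry: approaches perEntry as n grows.
double modelBytesPerEntry(long n) {
    return static_cast<double>(modelTotalBytes(n)) / static_cast<double>(n);
}
```

The model reproduces the 731 and 739 byte readings for one and two entries, and for 50,000 entries the average falls to just over 8 bytes per entry.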

Cheers,
Philippe.

I accept there is overhead in saving the meta data information to the output file.

But the per-event sizes still seem much too big.

Here is the branch information for just my cal hits …

*............................................................................*
*Br   56 :ecalSimHits_ : Int_t ecalSimHits__                                 *
*Entries :      100 : Total  Size=       8182 bytes  File Size  =        329 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.47     *
*............................................................................*
*Br   57 :ecalSimHits_.fUniqueID : UInt_t fUniqueID[ecalSimHits__]           *
*Entries :      100 : Total  Size=     358680 bytes  File Size  =       3573 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression= 100.19     *
*............................................................................*
*Br   58 :ecalSimHits_.fBits : UInt_t fBits[ecalSimHits__]                   *
*Entries :      100 : Total  Size=     358616 bytes  File Size  =       3779 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=  94.72     *
*............................................................................*
*Br   59 :ecalSimHits_.id_ : Int_t id_[ecalSimHits__]                        *
*Entries :      100 : Total  Size=     358584 bytes  File Size  =      38971 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   9.18     *
*............................................................................*
*Br   60 :ecalSimHits_.edep_ : Float_t edep_[ecalSimHits__]                  *
*Entries :      100 : Total  Size=     358616 bytes  File Size  =     328356 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   1.09     *
*............................................................................*
*Br   61 :ecalSimHits_.x_ : Float_t x_[ecalSimHits__]                        *
*Entries :      100 : Total  Size=     358568 bytes  File Size  =     332340 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   1.08     *
*............................................................................*
*Br   62 :ecalSimHits_.y_ : Float_t y_[ecalSimHits__]                        *
*Entries :      100 : Total  Size=     358568 bytes  File Size  =     327743 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   1.09     *
*............................................................................*
*Br   63 :ecalSimHits_.z_ : Float_t z_[ecalSimHits__]                        *
*Entries :      100 : Total  Size=     358568 bytes  File Size  =      37760 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   9.48     *
*............................................................................*
*Br   64 :ecalSimHits_.time_ : Float_t time_[ecalSimHits__]                  *
*Entries :      100 : Total  Size=     358616 bytes  File Size  =     287729 *
*Baskets :       12 : Basket Size=      32000 bytes  Compression=   1.24     *
*............................................................................*
*Br   65 :ecalSimHits_.simParticle_ : TRef simParticle_[ecalSimHits__]       *
*Entries :      100 : Total  Size=    1075078 bytes  File Size  =      16050 *
*Baskets :       41 : Basket Size=      32000 bytes  Compression=  66.90     *
*............................................................................*

This comes out to 17K per event, so it seems like a lot of extra information is being stored in each event. Is the meta-data somehow adding a lot of overhead to every event in the file?

How many items are there in each collection in each entry? For example, look at the output of

Cheers,
Philippe.

Doing the math, the events seem bigger than I expected, but not unreasonable given the size of the objects. One of my cal hits takes 41 bytes, and I get about 890 hits per event, so that’s about 36 KB per event just in cal hits (uncompressed).

So I guess I need to look at other ways to make these events smaller, like reducing the number of calorimeter hits.

Thanks for looking.

[quote="pcanal"]How many items is there in each collection in each entries? For example look at the output of

Cheers,
Philippe.[/quote]

Here is a sample from scanning the entry sizes …

************************
*    Row   * ecalSimHi *
************************
*        0 *       854 *
*        1 *       835 *
*        2 *       972 *
*        3 *       953 *
*        4 *       963 *
*        5 *       916 *
*        6 *      1031 *
*        7 *       948 *
*        8 *      1064 *
*        9 *       917 *
*       10 *       964 *
*       11 *       751 *
*       12 *       773 *
*       13 *       941 *
*       14 *       874 *
*       15 *       928 *
*       16 *       695 *
*       17 *       951 *
*       18 *       884 *
*       19 *       650 *
*       20 *       762 *
*       21 *      1028 *
*       22 *       924 *
*       23 *      1029 *
*       24 *       772 *

By my rough calculation, that is about 890 cal hits per event on average.

I’m thinking, though, that something still doesn’t add up here. If I’m using 41 bytes per cal hit with 890 hits per event, that should be about 36 KB per event. With 50k events that would be about 1.8 GB, but I see 13 GB, so the file is ~7 times bigger than I would expect. (This is only an approximation, but the data size is dominated by the number of cal hits.)
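The estimate can be checked with a few lines of arithmetic. The inputs (41 bytes per hit, 890 hits per event, 50,000 events, 13 GB observed) are the figures quoted above; decimal units (1 GB = 10^9 bytes) are an assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Expected uncompressed size, in bytes, if cal hits dominate the event.
constexpr std::int64_t expectedBytes(std::int64_t bytesPerHit,
                                     std::int64_t hitsPerEvent,
                                     std::int64_t events) {
    return bytesPerHit * hitsPerEvent * events;
}

// Ratio of the observed file size to the expectation.
constexpr double excessFactor(double observedBytes, std::int64_t expected) {
    return observedBytes / static_cast<double>(expected);
}
```

With these inputs, expectedBytes(41, 890, 50000) is 1,824,500,000 bytes (about 1.8 GB), and 13 GB is roughly 7.1 times that.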

To be sure, you could loop over the 50k events and compute the total number of cal hits in the whole file. There may be some events with many more cal hits than the expected mean.
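Once the per-event hit counts have been read out of the tree, summarizing them takes only a few lines. This sketch takes the counts as a plain std::vector so it stays independent of ROOT; HitStats and summarize are hypothetical names. Applied to the 25 rows of the Scan above, it gives a total of 22379 hits, a mean of about 895 hits per event and a maximum of 1064.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct HitStats {
    std::int64_t total = 0; // sum of hits over all events
    double mean = 0.0;      // average hits per event
    int maxHits = 0;        // largest single event
};

// Summarize per-event hit counts (e.g. the ecalSimHits_ size, one value per entry).
HitStats summarize(const std::vector<int>& hitsPerEvent) {
    assert(!hitsPerEvent.empty());
    HitStats s;
    for (int n : hitsPerEvent) {
        s.total += n;
    }
    s.mean = static_cast<double>(s.total) / static_cast<double>(hitsPerEvent.size());
    s.maxHits = *std::max_element(hitsPerEvent.begin(), hitsPerEvent.end());
    return s;
}
```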

[quote]One of my cal hits takes 41 bytes[/quote]

How did you derive this number? From the TTree::Print output I see:

*Br 56 :ecalSimHits_ : Int_t ecalSimHits__ *
*Br 57 :ecalSimHits_.fUniqueID : UInt_t fUniqueID[ecalSimHits__] *
*Br 58 :ecalSimHits_.fBits : UInt_t fBits[ecalSimHits__] *
*Br 59 :ecalSimHits_.id_ : Int_t id_[ecalSimHits__] *
*Br 60 :ecalSimHits_.edep_ : Float_t edep_[ecalSimHits__] *
*Br 61 :ecalSimHits_.x_ : Float_t x_[ecalSimHits__] *
*Br 62 :ecalSimHits_.y_ : Float_t y_[ecalSimHits__] *
*Br 63 :ecalSimHits_.z_ : Float_t z_[ecalSimHits__] *
*Br 64 :ecalSimHits_.time_ : Float_t time_[ecalSimHits__] *
*Br 65 :ecalSimHits_.simParticle_ : TRef simParticle_[ecalSimHits__] *

With 5 floats, 4 ints and 1 TRef, uncompressed this is 5*4 + 4*8 = 52 bytes per element, plus the TRef, which seems to take 1200 bytes per element! So the TRef dominates the output by far.

Rather than one TRef per element, you might want to consider using a TRefArray containing one entry per element (simHits).

Alternatively, you might be able to replace the TRef with a direct index value (i.e. an int or long) into the collection holding the referenced objects.
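A minimal sketch of the index-based alternative, written with plain structs rather than ROOT classes so that it stands alone. SimParticleLite, CalHitLite and particleOf are hypothetical names; in the real event model the particles would live in their own TClonesArray, and the stored int would be the particle's position in it. A negative index marks a hit with no associated particle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified particle record.
struct SimParticleLite {
    int trackID = 0;
    float energy = 0.f;
};

// Hypothetical simplified cal hit: a plain int replaces the TRef.
struct CalHitLite {
    float edep = 0.f;
    int simParticleIndex = -1; // position in the particle collection, -1 if none
};

// Resolve a hit's particle; returns nullptr when the index is unset or out of range.
const SimParticleLite* particleOf(const CalHitLite& hit,
                                  const std::vector<SimParticleLite>& particles) {
    if (hit.simParticleIndex < 0 ||
        static_cast<std::size_t>(hit.simParticleIndex) >= particles.size()) {
        return nullptr;
    }
    return &particles[static_cast<std::size_t>(hit.simParticleIndex)];
}
```

Storing a 4-byte int per hit keeps the reference cost fixed and small, at the price of the writer having to keep the particle collection order stable between filling the hits and writing the event.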

Cheers,
Philippe.