TChain->GetEntry() Slow

Dear Root-Users,
I am currently working with GATE for a project, and therefore with ROOT 6.14/06. My OS is Ubuntu 16.04 LTS.
I have a problem using TChain. I think I am probably using it wrong, and I was hoping that some of you could shed some light on it.

I want to read in multiple files, do some calculations with some of the data (keeping the rest), and then write everything into a new .root file. This works fine for one file, but when I use two files the TChain->GetEntry() call becomes really slow.

I thinned out the code a little because I know where the problem lies. I compile with:
g++ -std=c++1y Name.cpp NameofClass.cpp -o NameOut `root-config --cflags --glibs` -Wall
It compiles fine, without any warnings.

// Type aliases to make accessing nested type easier
using clock_t = std::chrono::high_resolution_clock;
using second_t = std::chrono::duration<double, std::ratio<1> >;

std::chrono::time_point<clock_t> m_beg;
std::chrono::time_point<clock_t> m_temptime;

// I want to get the events of the "Singles" TTree in all the files
TChain* InputTree = new TChain("Singles");

// Adding multiple files to TChain, m_infilenames is a vector
for(unsigned int f=0; f<m_infilenames.size(); f++)
{
    InputTree->Add(m_infilenames[f].c_str());
    InputTree->LoadTree(f);

    if (InputTree->LoadTree(f) < 0)
    {
        cout << "Could not read file: " << m_infilenames[f]
             << "\nAbort" <<  endl;
        exit(1);
    }
}
// Open Outputfile
TFile *newfile = new TFile(m_outfile.c_str(), "recreate");

// // check if both TFiles are open
if (!newfile->IsOpen())
{
    cout << "Could not read file " << m_outfile << "\n";
    exit(1);
}

InputTree->SetBranchAddress("time", &m_Stime);
InputTree->SetBranchAddress("layerName", &m_Slayername);

TObjArray *fileElements= InputTree->GetListOfFiles();
// TIter next(fileElements);
TChainElement *chEl=0;
// TObjArray *mylist = (TObjArray*)InputTree->GetListOfBranches();


// Set some variables needed for the Loop
m_TotalEntries = InputTree->GetEntries();  // All entries
int NumOfFiles = fileElements->GetEntries();  // Get Number of file in TChain
Long64_t DynIdxBegin = 0;  // Index where new File begins
Long64_t DynIdxEnd = 0;  // Index where new File ends
Long64_t EntriesCurrentFile = 0; // contains entries of current file

// Resize all vectors according to files
m_TimeContainer.resize(m_TotalEntries);
m_TimeSorted.resize(m_TotalEntries);
m_LayerChar.resize(m_TotalEntries);

m_beg = clock_t::now();

// loop over files via chEL
for (int FileIndex=0; FileIndex<NumOfFiles; FileIndex++)
{
    // make chEL point to current file
    chEl = (TChainElement*)fileElements->At(FileIndex);

    // Get the amount of entries of current File
    EntriesCurrentFile = chEl->GetEntries();
    DynIdxBegin = DynIdxEnd;
    DynIdxEnd = DynIdxBegin + EntriesCurrentFile;

    for(Long64_t i=DynIdxBegin; i<DynIdxEnd; i++)
    {

        InputTree->GetEntry(i);
       [...] Do some calculations and file the 
    }
}

m_beg = clock_t::now();
vector<Long64_t> idx = sort_indexes(m_TimeContainer);

for (Long64_t i=0; i<m_TotalEntries; i++)
{
   [...] Fill the m_TimeSorted vector
}

m_beg = clock_t::now();

for(Long64_t i=0; i<NewIndex; i++)
{
    m_temptime = clock_t::now();

    // HERE IS THE PROBLEM
    InputTree->GetEntry(m_TimeSorted[i].Index);
    cout << "i: " << i << "\tTemp: " << std::chrono::duration_cast<second_t>(clock_t::now() - m_temptime).count() << endl;

    m_CoincID = m_TimeSorted[i].Id;

    // Save Time in new root file in seconds
    m_Stime = m_TimeSorted[i].Time;

    m_NewCoincidences->Fill();

    cout << "i: " << i << "\tTime: " << std::chrono::duration_cast<second_t>(clock_t::now() -m_beg).count() << endl;
}


m_beg = clock_t::now();

m_NewCoincidences->Write();
newfile->Close();

The output if I use two files is:

i: 0 Temp: 1.4478942770
i: 0 Time: 1.4481196320
i: 1 Temp: 2.9032056620
i: 1 Time: 4.3513756090
i: 2 Temp: 2.9014497060
i: 2 Time: 7.2528713680
i: 3 Temp: 0.0002660290
i: 3 Time: 7.2531582510
i: 4 Temp: 0.0000029560
i: 4 Time: 7.2531717200
[…]

When I use only one file, the program takes about 0.16 seconds. When I use two files (the same file twice, just renamed), it takes around 7 seconds until it reaches i = 2. The “Temp” value shows how long the InputTree->GetEntry(m_TimeSorted[i].Index) call takes.

What am I doing wrong?

Greetings
Andi

// HERE IS THE PROBLEM
InputTree->GetEntry(m_TimeSorted[i].Index);

Indeed, reading a TTree (and, even worse, a TChain) out of order can be very expensive. A TChain has only one file open at a time. If the code requests an entry that is not in the current TTree, the following happens:

  • The current TTree object is deleted.
  • The current file is closed.
  • Another file is opened.
  • Another TTree is read (a somewhat slow operation).

In addition, whenever you read entries out of order (even within a single TTree), the following might happen:

  • The current set of baskets is discarded.
  • Another set of baskets is read into memory.
  • This new set of baskets is decompressed (a somewhat slow operation).
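
To make the cost of an access order more tangible, here is a minimal plain C++ sketch (no ROOT involved) that counts how often a chain-like reader would have to change files. The names fileOfEntry and countFileSwitches and the "offsets" layout are assumptions made up for this illustration; they are not part of the TChain API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a ROOT function.
// 'offsets' holds the first global entry number of each file, followed by the
// total number of entries, e.g. {0, 100, 200} for two files of 100 entries.
std::size_t fileOfEntry(const std::vector<long long>& offsets, long long entry)
{
    assert(offsets.size() >= 2 && entry >= offsets.front() && entry < offsets.back());
    // upper_bound finds the first offset strictly greater than 'entry';
    // the file containing 'entry' is the one just before it.
    auto it = std::upper_bound(offsets.begin(), offsets.end(), entry);
    return static_cast<std::size_t>(it - offsets.begin()) - 1;
}

// Counts how many times the "current file" changes while visiting the
// entries in the given order (the very first load is not counted).
std::size_t countFileSwitches(const std::vector<long long>& offsets,
                              const std::vector<long long>& order)
{
    std::size_t switches = 0;
    bool haveCurrent = false;
    std::size_t current = 0;
    for (long long entry : order) {
        const std::size_t file = fileOfEntry(offsets, entry);
        if (haveCurrent && file != current) ++switches;
        current = file;
        haveCurrent = true;
    }
    return switches;
}
```

With two files of 100 entries each, visiting the entries in the order 0, 100, 1, 101 gives three switches, while 0, 1, 100, 101 gives only one. In a real TChain each switch would correspond to the close/open/read steps listed above.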

In addition, there are a couple of misunderstandings in the code:

for(unsigned int f=0; f<m_infilenames.size(); f++)
{
   InputTree->Add(m_infilenames[f].c_str());
   InputTree->LoadTree(f);

Actually, the valid range of the argument to LoadTree is from 0 up to (but not including) the total number of entries accumulated over all the TTrees in the TChain; it is a global entry number, not a file index. What it does is make sure that the TTree containing the given entry is loaded in memory (discarding any previously loaded TTree).
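
As an illustration of why LoadTree(f) with f = 0, 1 does not check the second file, here is a small plain C++ sketch of the mapping from a global entry number to a tree and a local entry. EntryLocation and locateEntry are hypothetical names for this example only; the sketch just mimics the bookkeeping and is not ROOT code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the chain's bookkeeping; not a ROOT type.
struct EntryLocation {
    int treeIndex;         // which file/tree in the chain holds the entry, -1 if none
    long long localEntry;  // entry number within that tree, -1 if out of range
};

// 'entriesPerFile' lists the number of entries of each file, in chain order.
EntryLocation locateEntry(const std::vector<long long>& entriesPerFile,
                          long long globalEntry)
{
    assert(globalEntry >= 0);
    long long firstOfTree = 0;  // global number of the first entry of tree t
    for (int t = 0; t < static_cast<int>(entriesPerFile.size()); ++t) {
        if (globalEntry < firstOfTree + entriesPerFile[t]) {
            return {t, globalEntry - firstOfTree};
        }
        firstOfTree += entriesPerFile[t];
    }
    return {-1, -1};  // past the end of the chain
}
```

For two files with 100 entries each, global entry 1 is still located in the first tree (index 0, local entry 1); the second tree is only reached from global entry 100 onwards.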

m_TotalEntries = InputTree->GetEntries();  // All entries

Note that this is a relatively slow operation: to get this number, the TChain needs to open each file and read its TTree (and later discard it) to find the number of entries in that TTree. The resulting information is then cached, so the cost is paid only once.

// loop over files via chEL
....

Most of the calculations in that block are (also) done inside TChain, i.e. the usual way of writing this is:

// GetEntriesFast returns a very large number until the last file has been
// read (for the first time), at which point it returns the real total number
// of entries in the chain.
for (Long64_t i = 0; i < InputTree->GetEntriesFast(); i++)
{
    // The result can be used with TBranch::GetEntry.
    Long64_t entryNumberWithinCurrentTree = InputTree->LoadTree(i);
    if (entryNumberWithinCurrentTree < 0) {
        // Something went wrong, or the end of the chain was reached.
        break;  // or 'continue', depending on how you want to handle it
    }
    // Use this only if you need the data of all the branches; otherwise use TBranch::GetEntry.
    InputTree->GetEntry(i);
    [...] Do some calculations and file the 
}

Cheers,
Philippe.

Dear Philippe,
thank you very much for your quick and thorough reply. I decided to drop the TChain approach. Instead, I first read all the data and store it in a std::vector, do my post-processing, and then open a new .root file with a TTree and write the data from the std::vector to that file. This approach is quite similar to the ROOT tutorial:
https://root.cern.ch/root/html/tutorials/tree/hvector.C.html

So far this approach works, but I have not tested it on very large files yet. I thought about different ways to be more efficient, such as processing a small amount of data, writing it, and repeating those steps until all the data has been processed. This gave rise to a simple question, whose answer could solve the problem that made me use TChain in the first place.
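
The "process a small amount, write it, repeat" idea could be sketched roughly as follows in plain C++ (no ROOT). processInChunks, chunkRanges and the callback signature are hypothetical names chosen for this illustration; what happens inside each chunk (reading, processing, writing) is left to the callback.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: calls 'handleChunk(begin, end)' for consecutive
// half-open ranges [begin, end) that together cover [0, totalEntries).
void processInChunks(long long totalEntries, long long chunkSize,
                     const std::function<void(long long, long long)>& handleChunk)
{
    assert(totalEntries >= 0 && chunkSize > 0);
    for (long long begin = 0; begin < totalEntries; begin += chunkSize) {
        const long long end = std::min(begin + chunkSize, totalEntries);
        handleChunk(begin, end);  // e.g. read, process and write these entries
    }
}

// Convenience wrapper that only records the ranges; handy for checking.
std::vector<std::pair<long long, long long>> chunkRanges(long long totalEntries,
                                                         long long chunkSize)
{
    std::vector<std::pair<long long, long long>> ranges;
    processInChunks(totalEntries, chunkSize,
                    [&ranges](long long b, long long e) { ranges.emplace_back(b, e); });
    return ranges;
}
```

This only bounds the amount of data held in memory per step. A sort over the whole data set, as in the time-sorted loop above, would still need an additional merge step across chunks, which is not shown here.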

Can I write data at the end of an existing TTree that is already written to a TFile? And if so, how would I do that?

Best regards,
Andi

You may encounter problems where the vector does not fit in memory.

Can I write data at the end of an existing TTree that is already written to a TFile? And if so, how would I do that?

That depends on whether you want to add a new branch or new entries.

Yes, this was my thought as well, and the reason why I tried to avoid it.

I would like to add new entries.

To add new entries, simply open the file in update mode:

TFile::Open(filename, "UPDATE");

Then grab the TTree as usual, call SetBranchAddress for all the top-level branches, and call Fill as usual.

Cheers,

Philippe
