Options for speeding up Tree Read

megooden · December 18, 2018, 10:55pm

Hi All,

I have data from 16 detector channels that I have been saving as pure binary files. The data in each file/channel is time sorted as it is acquired. We have been wanting to move to a ROOT Tree/File format, expecting to have all the data contained in a nice, easily readable format, but the data processing is taking several orders of magnitude longer than simply reading from the binary files directly. I am curious whether there is a better way to process the tree that I have created (certainly there is, and I have not been smart enough to figure it out from the documentation).

I have made a tree with 3 branches (Channel, Energy, Timestamp) called DAQTree. The goal would eventually be to have the DAQ save the data directly to a tree, but for now I have created it from existing data to test the analysis. There are 16 channels (0-15), and while the data for each individual channel is time ordered, all 16 channels need to be time ordered together, using the Timestamp branch, to process coincidences. This is the test code that I have at the moment for a tree that contains 100 million entries (700 MB):

    uint16_t Energy;
    uint64_t Timestamp;
    int CH;

    sprintf(buf, "%s/%s.root", MAIN_FILE_PATH, ifile);
    TFile *FF = TFile::Open(buf, "UPDATE");
    TTree *T1 = (TTree *) FF->Get("DAQTree");
    T1->SetBranchAddress("Energy", &Energy);
    T1->SetBranchAddress("Channel", &CH);
    T1->SetBranchAddress("Timestamp", &Timestamp);

//Sort data along Timestamp branch
    auto start2 = std::chrono::high_resolution_clock::now(); // try to time how long sorting takes
    T1->BuildIndex("Timestamp", "0");
    TTreeIndex *I = (TTreeIndex *) T1->GetTreeIndex();
    Long64_t *index = I->GetIndex();
    auto finish2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed2 = finish2 - start2;
    cout << "Elapsed time for sorting: " << elapsed2.count() << " s\n";

    TH1F *h = new TH1F("h", "h", 16384, 0, 16384);
    TH1F *h1 = new TH1F("h1", "h1", 16384, 0, 16384);
    uint64_t orig_T, diff_T;
    auto entries = T1->GetEntries();
    T1->GetEntry(index[0]);
    orig_T = Timestamp;
    for (Long64_t i = 1; i < entries; i++) { // index[] has `entries` elements, so the last valid position is entries - 1
        T1->GetEntry(index[i]);
        //diff_T = Timestamp - orig_T;
        if (CH < 8) {
            //h1->Fill(Energy); //Singles Spectrum
            //            if (diff_T <= 1000) {
            //                h->Fill(Energy); //Coincidence Singles
            //            } else {
            //                orig_T = Timestamp;
            //            }
        }
    }

    h->Write();
    h1->Write();
    
    auto finish3 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed3 = finish3 - start2;
    cout << "Elapsed time: " << elapsed3.count() << " s\n";

The sorting step takes ~36 seconds, which, while slower than the sorting I already do on the binary files (using non-ROOT C++), is reasonable. I have not established how long the for-loop over the entries in time order takes, because I have killed the process each time after about 30 minutes.

My other code, which simply reads all 16 channels as binary using standard C++ IO, is able to read, sort and fill ~2 dozen TH1F histograms in about 2 minutes. I was hoping for a more ROOT-centric way to do this analysis that makes use of the nice Tree structure. I assumed the HEP community had much larger data sets than I do, so this type of looping over a tree would be quite quick, or at least a better method would exist.
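For reference, the two steps in the snippet above (building a time-ordered index, then applying the commented-out coincidence window) can be sketched on plain in-memory vectors, without ROOT. This is only an illustration under assumptions: the `Event` struct and the function names `sortedIndex` and `countCoincidences` are hypothetical and not part of the original code, and the window logic mirrors the commented-out lines as written (only events with channel < 8 are tested, and an event outside the window becomes the new reference time).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical in-memory event mirroring the three branches of DAQTree.
struct Event {
    int channel;
    std::uint16_t energy;
    std::uint64_t timestamp;
};

// Return entry numbers ordered by timestamp (the role the TTreeIndex plays).
std::vector<std::size_t> sortedIndex(const std::vector<Event>& events) {
    std::vector<std::size_t> index(events.size());
    std::iota(index.begin(), index.end(), 0);
    std::stable_sort(index.begin(), index.end(),
                     [&events](std::size_t a, std::size_t b) {
                         return events[a].timestamp < events[b].timestamp;
                     });
    return index;
}

// Count events with channel < 8 that lie within `window` ticks of the
// current reference time; an event outside the window resets the reference.
std::size_t countCoincidences(const std::vector<Event>& events,
                              const std::vector<std::size_t>& index,
                              std::uint64_t window) {
    assert(index.size() == events.size());
    if (index.empty()) return 0;
    std::size_t count = 0;
    std::uint64_t origT = events[index[0]].timestamp;
    // Loop bound is i < size: the index has exactly `size` valid positions.
    for (std::size_t i = 1; i < index.size(); ++i) {
        const Event& e = events[index[i]];
        const std::uint64_t diffT = e.timestamp - origT;
        if (e.channel < 8) {
            if (diffT <= window) {
                ++count;
            } else {
                origT = e.timestamp;
            }
        }
    }
    return count;
}
```

Calling `countCoincidences(events, sortedIndex(events), 1000)` corresponds to the 1000-tick window in the commented-out code; in the real analysis the counter would be replaced by a histogram fill.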

If there is something I could do better, have done wrong or should stick to what I have, I would very much appreciate help.

Thank you,

Matthew

ROOT Version: 6.14.06
Platform: Mac OSX Mojave
Compiler: gcc (GCC) 7.1.0

bellenot · December 19, 2018, 8:03am

Maybe using RDataFrame could help. @Danilo and @eguiraud may give more details.

pcanal · December 19, 2018, 6:34pm

TTrees are optimized for reading entries in sequential order. The entries are bunched in baskets (which are bunched in clusters), and each basket is compressed and stored individually. When reading out of sequence, for each entry read (in first approximation) you end up reading and decompressing a basket (that spans several entries) and then discarding it. This results in reading and decompressing the data many times (in first approximation, as many times as there are entries in a basket).
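The first-approximation cost described above can be illustrated with a toy model. The sketch below is not ROOT code: it assumes a fixed number of entries per basket and that only the most recently decompressed basket is kept, and the names `countDecompressions` and `shuffledOrder` are made up for the illustration. A real time-ordered read over 16 interleaved channels is not a uniform shuffle, so the numbers are only indicative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Count how many basket decompressions a given read order triggers when
// only the most recently decompressed basket is kept in memory.
std::size_t countDecompressions(const std::vector<std::size_t>& readOrder,
                                std::size_t entriesPerBasket) {
    assert(entriesPerBasket > 0);
    std::size_t decompressions = 0;
    bool haveBasket = false;
    std::size_t current = 0;
    for (std::size_t entry : readOrder) {
        const std::size_t basket = entry / entriesPerBasket;
        if (!haveBasket || basket != current) {
            ++decompressions;  // previous basket is discarded, new one decompressed
            current = basket;
            haveBasket = true;
        }
    }
    return decompressions;
}

// Build a shuffled read order over nEntries entries (fixed seed for repeatability).
std::vector<std::size_t> shuffledOrder(std::size_t nEntries, unsigned seed) {
    std::vector<std::size_t> order(nEntries);
    std::iota(order.begin(), order.end(), 0);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}
```

With 10000 entries and 100 entries per basket, a sequential read decompresses each of the 100 baskets once, while a shuffled read decompresses a basket for nearly every entry, which is the "cardinality of a basket" slowdown in the model.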

If your TTree fits in memory you can significantly improve performance by using:

    T1->SetMaxVirtualSize( some_amount_of_memory_larger_than_the_TTree );

This will make sure that once loaded and decompressed the baskets are kept in memory. You can even preload them via

    T1->LoadBaskets( some_amount_of_memory_larger_than_the_TTree );

Or, given that

> while the data for each individual channel is time ordered,

you could put each individual channel in a different TTree (or at least in a separate branch) and then you would be able to read the entries of each TTree (or branches) in order.
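If each channel is stored separately and already time ordered, the global time order can be recovered with a k-way merge that only ever reads each channel front to back. The following is a minimal sketch on in-memory vectors rather than on TTrees; the `Hit` struct and `mergeChannels` are hypothetical names, and in a real analysis each inner vector would be replaced by sequential `GetEntry` calls on the corresponding per-channel tree or branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical record for one detector hit.
struct Hit {
    int channel;
    std::uint16_t energy;
    std::uint64_t timestamp;
};

// Merge per-channel streams, each already time ordered, into a single
// time-ordered sequence. Every stream is read strictly in order.
std::vector<Hit> mergeChannels(const std::vector<std::vector<Hit>>& channels) {
    // Heap of (timestamp of the next unread hit, stream number), smallest first.
    using Head = std::pair<std::uint64_t, std::size_t>;
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heads;
    std::vector<std::size_t> next(channels.size(), 0);
    std::size_t total = 0;
    for (std::size_t c = 0; c < channels.size(); ++c) {
        total += channels[c].size();
        if (!channels[c].empty()) heads.emplace(channels[c][0].timestamp, c);
    }
    std::vector<Hit> merged;
    merged.reserve(total);
    while (!heads.empty()) {
        const std::size_t c = heads.top().second;
        heads.pop();
        merged.push_back(channels[c][next[c]]);
        ++next[c];
        if (next[c] < channels[c].size()) {
            // Precondition: each stream is sorted by timestamp.
            assert(channels[c][next[c]].timestamp >= merged.back().timestamp);
            heads.emplace(channels[c][next[c]].timestamp, c);
        }
    }
    return merged;
}
```

With 16 streams the heap never holds more than 16 elements, so the merge costs a few comparisons per hit and avoids building a global index altogether.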

megooden · December 19, 2018, 6:58pm

Thank you for the suggestion. Since the file is 700 MB, I can fit it entirely in memory and process it. Following the earlier RDataFrame suggestion, I had already tried changing the cache size. Does this have a similar effect? I changed it to 1 MB from the default 32 kB but could make it as large as the file. Would changing the basket size as well in the creation of the tree make any difference?

I will look into/consider the multiple trees but having a single tree would be the most preferable route.

pcanal · December 19, 2018, 7:21pm

The TTreeCache stores the baskets in their compressed form, so each out-of-order read still has to decompress the basket again, and thus increasing its size would not really help you.

> Would changing the basket size as well in the creation of the tree make any difference?

It would make a difference, but not a significant one in your case, unless you push it to a point where it starts harming performance in other ways (for example by having too many baskets).

Wile_E_Coyote · December 19, 2018, 7:29pm

Note that ROOT files / trees are usually compressed. So, you should not look at the “file size”, which will usually be smaller than the required “some_amount_of_memory_larger_than_the_TTree”.
Execute DAQTree->Print(); and see the “Total =” value in the third printed line, which will give you the real uncompressed size of your tree (and compare to the “File Size =” and the “Tree compression factor =” values therein).

megooden · December 19, 2018, 7:40pm

Wile_E: Thanks for the comment, it is deceptive to look at the .root file size. Luckily I know the total size of the original data (~1 GB in 16 binary files), and I think the compression factor was 2, so the full tree is something like ~1.4 GB.
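The ~1.4 GB figure can be cross-checked with a back-of-envelope estimate from the entry count and the branch types. The sketch below assumes the types used in the test code earlier in the thread (a 2-byte Energy, an 8-byte Timestamp and a 4-byte Channel) and ignores basket headers and any other overhead, so it is a lower bound rather than what `DAQTree->Print()` would report. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Payload bytes per entry, assuming Energy is uint16_t, Timestamp is
// uint64_t and Channel is a 4-byte integer (assumptions from the snippet).
inline std::uint64_t bytesPerEntry() {
    return sizeof(std::uint16_t) + sizeof(std::uint64_t) + sizeof(std::int32_t);
}

// Rough uncompressed payload size in bytes; ignores basket headers and
// other bookkeeping overhead.
inline std::uint64_t uncompressedBytes(std::uint64_t nEntries) {
    assert(nEntries <= std::numeric_limits<std::uint64_t>::max() / bytesPerEntry());
    return nEntries * bytesPerEntry();
}

// Memory budget in bytes covering the payload plus a safety margin in percent.
inline std::uint64_t memoryBudget(std::uint64_t nEntries, std::uint64_t marginPercent) {
    return uncompressedBytes(nEntries) / 100 * (100 + marginPercent);
}
```

For 100 million entries this gives 14 bytes per entry and 1.4e9 bytes in total, consistent with the estimate above and larger than a 1 GB limit.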

I tried SetMaxVirtualSize with 1 GB (before I thought about the full uncompressed data size), and it sped things up considerably when just looping through the tree based on my ordered index. Once I tried to fill a histogram, things slowed down to a crawl and I killed the process. T1->LoadBaskets() seems to have done the trick though: looping through the new index and filling a single histogram is down to 35 seconds from the 30+ minutes it took before.
