Problem sorting a tree

Hi,

I’m trying to sort a tree according to a number stored in the tree. This has been discussed here before and I have adopted the script to my needs. However, something is seriously wrong, since my sorted tree is not quite sorted after the operation. Once every few hundred entries there is one entry that is out of order as illustrated in the output of TTree::Scan

*     2996 *     32950 *
*     2997 *     32950 *
*     2998 *     38461 *
*     2999 *     32950 *
*     3000 *     32950 *

I really have no idea what I could be doing wrong to cause a behavior like this and I would appreciate any help.

If you want to reproduce the problem and look at the code I use, please save the attached code files and a sample tree to a directory and just execute the following command in root:

root [0] .L TreeClass2.C+
root [1] .L treesort2.C+
root [2] treesort2()
Reading Run numbers
 ------ PROCESSING ev NO. 499000 of 500000 ------
Sorting Run numbers
processing branch: runNumber
processing branch: gamma_ndigis
processing branch: gamma_digiList_phi
processing branch: gamma_digiList_theta
******************************************************************************
*Tree    :ntp1      : Isr Gammas                                             *
*Entries :   500000 : Total =       109669905 bytes  File  Size =   24727439 *
*        :          : Tree compression factor =   4.44                       *
******************************************************************************
*Br    0 :runNumber : runNumber/I                                            *
*Entries :   500000 : Total  Size=    2025670 bytes  File Size  =      51838 *
*Baskets :      252 : Basket Size=       8000 bytes  Compression=  38.87     *
*............................................................................*
*Br    1 :gamma_ndigis : gamma_ndigis/I                                      *
*Entries :   500000 : Total  Size=    2026444 bytes  File Size  =     565662 *
*Baskets :      252 : Basket Size=       8000 bytes  Compression=   3.56     *
*............................................................................*
*Br    2 :gamma_digiList_phi : gamma_digiList_phi[gamma_ndigis]/I            *
*Entries :   500000 : Total  Size=   52819033 bytes  File Size  =   13292716 *
*Baskets :     6849 : Basket Size=       8000 bytes  Compression=   3.96     *
*............................................................................*
*Br    3 :gamma_digiList_theta : gamma_digiList_theta[gamma_ndigis]/I        *
*Entries :   500000 : Total  Size=   52832743 bytes  File Size  =   10712073 *
*Baskets :     6849 : Basket Size=       8000 bytes  Compression=   4.91     *
*............................................................................*
root [3]

Please let me know if more info is needed to track down this issue.
Thanks in advance for your help.

Cheers,

Andreas
treesort2.C (2.27 KB)
TreeClass2.h (3.98 KB)
TreeClass2.C (1.43 KB)

Andreas,

It looks like you have a bug in treesort2. Replace

looper.GetEntry(index+1); by

looper.GetEntry(index);

Rene

Hi Rene,

thanks for catching this off-by-one error. Stupid me not catching it myself :confused:

I’d like to ask another question. If I want to sort a large tree (>5GB in one file), the script I use seems to keep everything in memory. Eventually the job grows to 9GB (not a problem on a 64bit machine but not very efficient either). Is there an easy way to reduce the memory footprint of my script?

Cheers,

Andreas

Andreas,

Sorting very large arrays that do not fit in memory is an interesting challenge.
If I had a working solution, it would be in ROOT ::slight_smile:
I already tried the combination of our “quicksort” algorithm with another algorithm “mergesort”, but I did not achieve the expected performance.
You can see the discussion at: en.wikipedia.org/wiki/Sorting_algorithm
and look at section “Memory usage patterns and index sorting”

If somebody comes with an interesting algorithm, let me know.

Rene

Hi Rene,

while both my unsorted and the sorted tree together don’t fit into memory at the same time, each branch should easily fit into memory. Is there a way to load a branch, sort it, and push it out to disk before going to the next branch?

What my script does now is:

  //loop branch by branch on the input Tree and fill the output Tree
  TIter next(Tin->GetListOfBranches());
  TBranch *bin, *bout;
  while ((bin = (TBranch*)next())) {
    bout = Tout->GetBranch(bin->GetName());
    printf("processing branch: %s\n",bin->GetName());
    //load all baskets of this branch in memory (for performance)
    bin->LoadBaskets();
    //loop on entries and fill output Tree
    for (index=0;index<N;index++) {
      bin->GetEntry(sorted[index]);
      bout->Fill();
    }
    bin->DropBaskets();
    Tout->AutoSave();
  }

but neither TBranch::DropBaskets() nor TTree::AutoSave() nor TTree:SetMaxVirtualSize() seem to have any effect on memory usage.

Cheers,

Andreas

Andreas,

Thanks to your example, I have found the problem and fixed it in CVS head.
When calling TBranch::LoadBaskets, a class member was not incremented.
As a result, TBranch::DropBuffers was not releasing the baskets buffers.

Rene

Hi Rene,

thanks for the fix. Indeed, where my job was using >8GB, now it is down to less than 300MB. :smiley:

Cheers,

Andreas

Hi again,

hmm, something is still not quite right with my sorted tree. :open_mouth:

The code that I use to sort my original tree (see attached script) is messing up the entries in the sorted tree. As you can see in the attached plots are clearly not what they should be. Something most be wrong with the handling of arrays when copying the branches, or the original tree is messed up in some hidden way that manifests itself only when copying the branches.

Again I’d appreciate any help in the matter.

Cheers,

Andreas
treesort2.C (2.27 KB)




Hi again,

I seem to be stuck with this problem. I can’t trust the sorted tree if the distributions are messed up afterward. :frowning:

Is there anyone out there who can reproduce the problem?

Cheers,

Andreas

Andreas,

It took me some time to understand this problem. In my original posting
of the example showing how to sort a Tree, I forgot that branches may have a branch counter, in your case the branch “gamma_ndigis” contains the number of entries in two other branches: “gamma_digiList_phi”, “gamma_digiList_theta”. When reading one of these two branches, the counter
was not set correctly.

It would be possible to modify my example script (your treesort2.C).
However, no changes are required if you update to CVS head where I do this automatically in TBranch::GetEntry.

Rene