I am trying to sort a Tree and write it to a new ROOT file, but my code crashes.
[code]TFile *in = TFile::Open("unsorted.root");
TTree *SinglesTree = (TTree*)gDirectory->Get("Singles");
SinglesTree->Draw("time","","goff");            // fill GetV1() with the "time" values
Long64_t nentries = SinglesTree->GetEntries();
Long64_t *index = new Long64_t[nentries];
TMath::Sort(nentries,SinglesTree->GetV1(),index,0);   // ascending order
TTree *SinglesTree_Out = SinglesTree->CloneTree(0);   // empty clone with the same branches
for (Long64_t IndexNo = 0; IndexNo < nentries; IndexNo++) {
   SinglesTree->GetEntry(index[IndexNo]);
   SinglesTree_Out->Fill();
}
delete in;
TFile *out = TFile::Open("sorted.root","recreate");
SinglesTree_Out->Write();
delete out;[/code]
When I compile this, it seems ok and there are no compilation errors, but when running it I get the following error:
[code]Error in <TTree::Fill>: Failed filling branch:Singles.comptVolName, nbytes=-1
 This error is symptomatic of a Tree created as a memory-resident Tree
 Instead of doing:
    TTree *T = new TTree(...)
    TFile *f = new TFile(...)
 you should do:
    TFile *f = new TFile(...)
    TTree *T = new TTree(...)
Error in <TTree::Fill>: Failed filling branch:Singles.RayleighVolName, nbytes=-1
 This error is symptomatic of a Tree created as a memory-resident Tree
 Instead of doing:
    TTree *T = new TTree(...)
    TFile *f = new TFile(...)
 you should do:
    TFile *f = new TFile(...)
    TTree *T = new TTree(...)
...
...[/code]
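As the error message says, the output file has to be opened before the (empty) clone is created, so that the new tree is attached to a writable file. A minimal reordered sketch (same file names and branch layout as above, only the order changed):
[code]TFile *in = TFile::Open("unsorted.root");
TTree *SinglesTree = (TTree*)in->Get("Singles");
SinglesTree->Draw("time","","goff");
Long64_t nentries = SinglesTree->GetEntries();
Long64_t *index = new Long64_t[nentries];
TMath::Sort(nentries, SinglesTree->GetV1(), index, kFALSE);   // ascending in time

// open the output file BEFORE creating the clone, so the clone lives in sorted.root
TFile *out = TFile::Open("sorted.root", "recreate");
TTree *SinglesTree_Out = SinglesTree->CloneTree(0);
for (Long64_t i = 0; i < nentries; i++) {
   SinglesTree->GetEntry(index[i]);
   SinglesTree_Out->Fill();
}
SinglesTree_Out->Write();
delete out;          // closes sorted.root
delete in;
delete [] index;[/code]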
I have a question on the same topic. I tried the working code on a small ROOT file (8000 entries) and it sorted successfully, but when I tried it on a larger ROOT file (5001281 entries) it just hangs.
Is there something I need to be aware of when running this code on a large data set?
Sorting 5M elements takes a lot longer than sorting 8k elements: TMath::Sort() is O(N log N), so going from 8×10^3 to 5×10^6 entries should take roughly 1000 times longer (a factor of ~625 in N times ~1.7 in log N). If that isn’t it, we’ll need the actual code to be able to reproduce it. Or you attach gdb to root.exe (yes, “.exe” even on non-Windows) yourself and see where it’s stuck (“bt” will show the current backtrace).
I am a little perplexed by my ROOT sorting code. On one computer it keeps running, and the same code on another computer crashes. The ROOT version is 5.26 on both of them.
[code]nentries = SinglesTree->GetEntries();
SinglesTree->Draw("time","","goff");
.....
Int_t *index = new Int_t[nentries];
TMath::Sort(nentries,SinglesTree->GetV1(),index,down);[/code]
By default, TTree::Draw keeps only 1000000 values accessible via GetV1(). If nentries is greater than this value, you are asking TMath::Sort to read beyond the end of the array. To change the default value, use:
[code]nentries = SinglesTree->GetEntries();
SinglesTree->SetEstimate(nentries);
SinglesTree->Draw("time","","goff");
.....
Int_t *index = new Int_t[nentries];
TMath::Sort(nentries,SinglesTree->GetV1(),index,down);[/code]
It prints “V1 Created” but not “Sorted”. In VS Code it says the command failed with “Exit Code 9”; I’m not sure why that is, but I think it’s OOM. Is there any way around this? For context, the ROOT file being sorted is 7.1 GB, with 3 branches and 795,806,682 entries. Testing with the BuildIndex method also leads to the same result.
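The BuildIndex attempt was roughly along these lines (a sketch rather than the exact code; it assumes the sort variable is the "time" branch, and note that BuildIndex truncates the index values to 64-bit integers internally):
[code]#include "TFile.h"
#include "TTree.h"
#include "TTreeIndex.h"

void SortWithBuildIndex()
{
   TFile *in = TFile::Open("unsorted.root");
   TTree *tree = (TTree*)in->Get("Singles");

   // build an index ordered by "time"; the index itself costs ~16 bytes per entry
   tree->BuildIndex("time");
   TTreeIndex *ti = (TTreeIndex*)tree->GetTreeIndex();
   Long64_t *order = ti->GetIndex();            // entry numbers in time order

   TFile *out = TFile::Open("sorted.root", "recreate");
   TTree *tree_out = tree->CloneTree(0);
   for (Long64_t i = 0; i < tree->GetEntries(); ++i) {
      tree->GetEntry(order[i]);
      tree_out->Fill();
   }
   tree_out->Write();
   delete out;
   delete in;
}[/code]
With ~8×10^8 entries that index alone is on the order of 10 GB, so it hits the same memory wall as the GetV1()/TMath::Sort approach.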
Thank you! It was a simulation of a million high-energy decays in Geant4. The sorting was to make the coincidence-finding algorithm quicker. It turns out that for this case it's probably not necessary to sort at that scale.
Just out of curiosity, would there be a simple way to perform that sort without needing as much memory, or is that a hard limit?
However, the general idea is straightforward. Go through the file once (possibly in parallel on different nodes) to load and sort chunks that fit in memory. Write the sorted chunks as trees into K new files (K is approximately the data size divided by the usable memory size). Then open all K files at the same time and, reading small parts at a time from each, do a K-way sorting merge into the final file (see the sketch below).
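Sketched in code, the merge step could use a min-heap over the chunks; the tree and branch names here (Singles, time, energy, crystalID) are placeholders standing in for the real layout:
[code]#include <functional>
#include <queue>
#include <utility>
#include <vector>
#include "TFile.h"
#include "TTree.h"
#include "TString.h"

// Illustrative K-way merge of chunk files that are already sorted in time.
void MergeSortedChunks(const std::vector<TString> &chunkNames)
{
   const size_t K = chunkNames.size();
   std::vector<TFile*>   files(K);
   std::vector<TTree*>   trees(K);
   std::vector<Long64_t> next(K, 0);                 // next entry to read from each chunk
   std::vector<Double_t> tval(K), eval(K);
   std::vector<Int_t>    idval(K);

   for (size_t k = 0; k < K; ++k) {
      files[k] = TFile::Open(chunkNames[k]);
      trees[k] = (TTree*)files[k]->Get("Singles");
      trees[k]->SetBranchAddress("time",      &tval[k]);
      trees[k]->SetBranchAddress("energy",    &eval[k]);
      trees[k]->SetBranchAddress("crystalID", &idval[k]);
   }

   TFile out("sorted.root", "recreate");
   Double_t oTime, oEnergy; Int_t oID;
   TTree *merged = new TTree("Singles", "time-sorted singles");
   merged->Branch("time",      &oTime,   "time/D");
   merged->Branch("energy",    &oEnergy, "energy/D");
   merged->Branch("crystalID", &oID,     "crystalID/I");

   // min-heap of (time, chunk): always pop the chunk whose next entry is earliest
   typedef std::pair<Double_t, size_t> Item;
   std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;
   for (size_t k = 0; k < K; ++k)
      if (trees[k]->GetEntry(next[k]) > 0) heap.push(Item(tval[k], k));

   while (!heap.empty()) {
      const size_t k = heap.top().second;
      heap.pop();
      oTime = tval[k]; oEnergy = eval[k]; oID = idval[k];
      merged->Fill();
      // advance chunk k; if it still has entries, put it back on the heap
      if (++next[k] < trees[k]->GetEntries() && trees[k]->GetEntry(next[k]) > 0)
         heap.push(Item(tval[k], k));
   }

   merged->Write();
   out.Close();                                      // also deletes "merged"
   for (size_t k = 0; k < K; ++k) delete files[k];   // closes the chunk files
}[/code]
Only one entry per chunk is held at a time, so memory use is a handful of buffers per chunk instead of the full data set; ROOT's basket-by-basket reading takes care of the "small parts at a time" part.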
[quote]The sorting was to make the coincidence-finding algorithm quicker.[/quote]
In this case, due to the cost of reading the data from disk at least twice, it might indeed not actually help.
When simulating an in-beam experiment, including beta-decay of reaction products, over a “beam time” of about a week with roughly 10^4 particles per second in Geant4, with the objective of evaluating the background (so looking for coincidences), I solved the problem by using many output files, each file storing the events happening at times in (t_i, t_{i+1}]. I recall having used some 20000 files or so, but that really depends on the case. Then one sorts each file and merges.
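Schematically the routing looks like the sketch below (names and the window width are placeholders; the real choice depends on the rates and on how many files the system lets you keep open):
[code]#include <map>
#include "TFile.h"
#include "TTree.h"
#include "TString.h"

// Route each event into the file covering its time window (t_i, t_{i+1}].
// ~30 s windows over a week of beam time gives ~2e4 files.
const Double_t windowWidth = 30.0;                  // seconds of beam time per file
Double_t eventTime = 0;                             // shared branch buffer

std::map<Int_t, TFile*> gFiles;
std::map<Int_t, TTree*> gTrees;

void FillEvent(Double_t t /*, other branch variables */)
{
   eventTime = t;
   const Int_t w = (Int_t)(eventTime / windowWidth);    // which time window?
   if (!gTrees.count(w)) {                              // first event in this window: create its file
      gFiles[w] = TFile::Open(Form("window_%05d.root", w), "recreate");
      gTrees[w] = new TTree("Singles", "events in one time window");
      gTrees[w]->Branch("time", &eventTime, "time/D");  // plus the other branches
   }
   gTrees[w]->Fill();
}

// At the end of the run: Write() each tree and close its file.
// In practice one also closes windows that can no longer receive events,
// to stay under the open-file-descriptor limit.[/code]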