I am trying to sort a Tree and write it to a new ROOT file, but my code crashes.
[code]TFile *in = TFile::Open("unsorted.root");
TTree *SinglesTree = (TTree*)gDirectory->Get("Singles");
SinglesTree->Draw("time","","goff");            // fill GetV1() with the "time" values
Long64_t nentries = SinglesTree->GetEntries();
Long64_t *index = new Long64_t[nentries];
TMath::Sort(nentries,SinglesTree->GetV1(),index,0);   // ascending order
TTree *SinglesTree_Out = SinglesTree->CloneTree(0);   // empty clone with the same branches
for (Long64_t IndexNo = 0; IndexNo < nentries; IndexNo++) {
   SinglesTree->GetEntry(index[IndexNo]);
   SinglesTree_Out->Fill();
}
delete in;
TFile *out = TFile::Open("sorted.root","recreate");
SinglesTree_Out->Write();
delete out;[/code]
When I compile this, it seems ok and there are no compilation errors, but when running it I get the following error:
[code]Error in <TTree::Fill>: Failed filling branch:Singles.comptVolName, nbytes=-1
 This error is symptomatic of a Tree created as a memory-resident Tree
 Instead of doing:
    TTree *T = new TTree(...)
    TFile *f = new TFile(...)
 you should do:
    TFile *f = new TFile(...)
    TTree *T = new TTree(...)
Error in <TTree::Fill>: Failed filling branch:Singles.RayleighVolName, nbytes=-1
 This error is symptomatic of a Tree created as a memory-resident Tree
 Instead of doing:
    TTree *T = new TTree(...)
    TFile *f = new TFile(...)
 you should do:
    TFile *f = new TFile(...)
    TTree *T = new TTree(...)
...
...[/code]
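As the error message says, the output file has to be opened before the (empty) clone is created, so that the new tree is attached to a writable file. A minimal reordered sketch (same file names and branch layout as above, only the order changed):
[code]TFile *in = TFile::Open("unsorted.root");
TTree *SinglesTree = (TTree*)in->Get("Singles");
SinglesTree->Draw("time","","goff");
Long64_t nentries = SinglesTree->GetEntries();
Long64_t *index = new Long64_t[nentries];
TMath::Sort(nentries, SinglesTree->GetV1(), index, kFALSE);   // ascending in time

// open the output file BEFORE creating the clone, so the clone lives in sorted.root
TFile *out = TFile::Open("sorted.root", "recreate");
TTree *SinglesTree_Out = SinglesTree->CloneTree(0);
for (Long64_t i = 0; i < nentries; i++) {
   SinglesTree->GetEntry(index[i]);
   SinglesTree_Out->Fill();
}
SinglesTree_Out->Write();
delete out;          // closes sorted.root
delete in;
delete [] index;[/code]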
I have a question on the same topic. I tried the working code on a small ROOT file (8000 entries) and it sorted successfully, but when I tried it on a larger ROOT file (5001281 entries) it just hangs.
Is there something I need to be aware of when running this code on a large data set?
Sorting 5M elements takes a lot longer than sorting 8k elements: TMath::Sort() is O(N log N), so going from 8×10^3 to 5×10^6 entries should take roughly 1000 times longer (a factor of ~625 in N times ~1.7 in log N). If that isn’t it, we’ll need the actual code to be able to reproduce it. Or you attach gdb to root.exe (yes, “.exe” even on non-Windows) yourself and see where it’s stuck (“bt” will show the current backtrace).
I am a little perplexed by my ROOT sorting code. On one computer it keeps running, and the same code on another computer crashes. The ROOT version is 5.26 on both of them.
[code]nentries = SinglesTree->GetEntries();
SinglesTree->Draw("time","","goff");
.....
Int_t *index = new Int_t[nentries];
TMath::Sort(nentries,SinglesTree->GetV1(),index,down);[/code]
By default, TTree::Draw keeps only 1000000 values accessible via GetV1(). If nentries is greater than this value, you are asking TMath::Sort to read beyond the end of the array. To change the default value, use:
[code]nentries = SinglesTree->GetEntries();
SinglesTree->SetEstimate(nentries);
SinglesTree->Draw("time","","goff");
.....
Int_t *index = new Int_t[nentries];
TMath::Sort(nentries,SinglesTree->GetV1(),index,down);[/code]
It prints “V1 Created” but not “Sorted”. In VS Code it says the command failed with “Exit Code 9”; I’m not sure why that is, but I think it’s OOM. Is there any way around this? For context, the ROOT file being sorted is 7.1 GB, with 3 branches and 795,806,682 entries. Testing with the BuildIndex method also leads to the same result.
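The BuildIndex attempt was roughly along these lines (a sketch rather than the exact code; it assumes the sort variable is the "time" branch, and note that BuildIndex truncates the index values to 64-bit integers internally):
[code]#include "TFile.h"
#include "TTree.h"
#include "TTreeIndex.h"

void SortWithBuildIndex()
{
   TFile *in = TFile::Open("unsorted.root");
   TTree *tree = (TTree*)in->Get("Singles");

   // build an index ordered by "time"; the index itself costs ~16 bytes per entry
   tree->BuildIndex("time");
   TTreeIndex *ti = (TTreeIndex*)tree->GetTreeIndex();
   Long64_t *order = ti->GetIndex();            // entry numbers in time order

   TFile *out = TFile::Open("sorted.root", "recreate");
   TTree *tree_out = tree->CloneTree(0);
   for (Long64_t i = 0; i < tree->GetEntries(); ++i) {
      tree->GetEntry(order[i]);
      tree_out->Fill();
   }
   tree_out->Write();
   delete out;
   delete in;
}[/code]
With ~8×10^8 entries that index alone is on the order of 10 GB, so it hits the same memory wall as the GetV1()/TMath::Sort approach.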
Thank you! It was a simulation of a million high-energy decays in Geant4. The sorting was to make the coincidence-finding algorithm quicker. It turns out that for this case it's probably not necessary to sort at that scale.
Just out of curiosity, would there be a simple way to perform that sort without needing as much memory, or is that a hard limit?
However, the general idea is straightforward. Go through the file once (possibly in parallel on different nodes) to load and sort chunks that fit in memory. Write the sorted chunks as trees into K new files (K is approximately the data size divided by the usable memory size). Then open all K files at the same time and, reading small parts at a time from each, do a K-way sorting merge into the final file (see the sketch below).
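Sketched in code, the merge step could use a min-heap over the chunks; the tree and branch names here (Singles, time, energy, crystalID) are placeholders standing in for the real layout:
[code]#include <functional>
#include <queue>
#include <utility>
#include <vector>
#include "TFile.h"
#include "TTree.h"
#include "TString.h"

// Illustrative K-way merge of chunk files that are already sorted in time.
void MergeSortedChunks(const std::vector<TString> &chunkNames)
{
   const size_t K = chunkNames.size();
   std::vector<TFile*>   files(K);
   std::vector<TTree*>   trees(K);
   std::vector<Long64_t> next(K, 0);                 // next entry to read from each chunk
   std::vector<Double_t> tval(K), eval(K);
   std::vector<Int_t>    idval(K);

   for (size_t k = 0; k < K; ++k) {
      files[k] = TFile::Open(chunkNames[k]);
      trees[k] = (TTree*)files[k]->Get("Singles");
      trees[k]->SetBranchAddress("time",      &tval[k]);
      trees[k]->SetBranchAddress("energy",    &eval[k]);
      trees[k]->SetBranchAddress("crystalID", &idval[k]);
   }

   TFile out("sorted.root", "recreate");
   Double_t oTime, oEnergy; Int_t oID;
   TTree *merged = new TTree("Singles", "time-sorted singles");
   merged->Branch("time",      &oTime,   "time/D");
   merged->Branch("energy",    &oEnergy, "energy/D");
   merged->Branch("crystalID", &oID,     "crystalID/I");

   // min-heap of (time, chunk): always pop the chunk whose next entry is earliest
   typedef std::pair<Double_t, size_t> Item;
   std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;
   for (size_t k = 0; k < K; ++k)
      if (trees[k]->GetEntry(next[k]) > 0) heap.push(Item(tval[k], k));

   while (!heap.empty()) {
      const size_t k = heap.top().second;
      heap.pop();
      oTime = tval[k]; oEnergy = eval[k]; oID = idval[k];
      merged->Fill();
      // advance chunk k; if it still has entries, put it back on the heap
      if (++next[k] < trees[k]->GetEntries() && trees[k]->GetEntry(next[k]) > 0)
         heap.push(Item(tval[k], k));
   }

   merged->Write();
   out.Close();                                      // also deletes "merged"
   for (size_t k = 0; k < K; ++k) delete files[k];   // closes the chunk files
}[/code]
Only one entry per chunk is held at a time, so memory use is a handful of buffers per chunk instead of the full data set; ROOT's basket-by-basket reading takes care of the "small parts at a time" part.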
[quote]The sorting was to make the coincidence-finding algorithm quicker.[/quote]
In this case, due to the cost of reading the data from disk at least twice, it might indeed not actually help.
When simulating an in-beam experiment, including beta-decay of reaction products, over a “beam time” of about a week with roughly 10^4 particles per second in Geant4, with the objective of evaluating the background (so looking for coincidences), I solved the problem by using many output files, each file storing the events happening at times in (t_i, t_{i+1}]. I recall having used some 20000 files or so, but that really depends on the case. Then one sorts each file and merges.
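Schematically the routing looks like the sketch below (names and the window width are placeholders; the real choice depends on the rates and on how many files the system lets you keep open):
[code]#include <map>
#include "TFile.h"
#include "TTree.h"
#include "TString.h"

// Route each event into the file covering its time window (t_i, t_{i+1}].
// ~30 s windows over a week of beam time gives ~2e4 files.
const Double_t windowWidth = 30.0;                  // seconds of beam time per file
Double_t eventTime = 0;                             // shared branch buffer

std::map<Int_t, TFile*> gFiles;
std::map<Int_t, TTree*> gTrees;

void FillEvent(Double_t t /*, other branch variables */)
{
   eventTime = t;
   const Int_t w = (Int_t)(eventTime / windowWidth);    // which time window?
   if (!gTrees.count(w)) {                              // first event in this window: create its file
      gFiles[w] = TFile::Open(Form("window_%05d.root", w), "recreate");
      gTrees[w] = new TTree("Singles", "events in one time window");
      gTrees[w]->Branch("time", &eventTime, "time/D");  // plus the other branches
   }
   gTrees[w]->Fill();
}

// At the end of the run: Write() each tree and close its file.
// In practice one also closes windows that can no longer receive events,
// to stay under the open-file-descriptor limit.[/code]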