ROOT Version: 6.26
Platform: Windows 10
Compiler: Visual Studio
I’m trying to loop through the entries in a TTree from a .root file, but I’m getting bad_alloc errors. The files are 80 - 300 MB in size. I believe I’m running out of memory, as I don’t get the error on smaller files (~ 1 MB).
I have tried both Python and C++, and get the same behaviour. Is there a way of looping through the entries in a TTree that doesn’t load the whole thing into memory?
Alternatively, what I’m trying to do is take a ROOT file containing data from several hours of an experiment and split it into several ROOT files, based on the timestamp of each event, to check for changes in the detector response over time. For example, I want a separate ROOT file for each hour of data. Is there a way to do this that doesn’t involve looping through the whole TTree?
I have also tried adding delete tf at the end of the loop, but I’m still getting the same bad_alloc error during the loop. The error happens after around 200,000 iterations, but not always at exactly the same one.
My code that produces the error now reads:
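#include <iostream>
#include <string>
#include "TFile.h"
#include "TTree.h"

void testSnippet() {
    std::string filename = "C:\\run277_lf.root";
    TFile* tf = TFile::Open(filename.c_str(), "READ");
    if (!tf || tf->IsZombie()) {
        std::cerr << "Cannot open " << filename << std::endl;
        return;
    }
    TTree* tree = dynamic_cast<TTree*>(tf->Get("Board 0"));
    if (!tree) {
        std::cerr << "TTree \"Board 0\" not found" << std::endl;
        delete tf;
        return;
    }
    tree->Print();
    ULong64_t timeStamp = 0;
    tree->SetBranchAddress("timeStamp", &timeStamp);
    ULong64_t ts = 0;  // same type as the branch: a float would lose precision
    int nEntries = tree->GetEntriesFast();
    for (int i = 0; i < nEntries; i++) {
        if (tree->GetEntry(i) <= 0) {
            break;
        }
        if (timeStamp > ts) {
            ts = timeStamp;
        }
    }
    delete tf;
    std::cout << ts << std::endl;
}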
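Keeping a 64-bit integer timestamp in a 32-bit float loses precision: a C++ float (IEEE-754 single precision on common platforms, including MSVC) has a 24-bit significand, so above 2**24 = 16,777,216 not every integer is representable, and distinct timestamps can round to the same value. This is unrelated to the bad_alloc, but it makes a running maximum stored in a float unreliable. A minimal check using Python's struct module, which round-trips a value through a 32-bit float (as_float32 is a hypothetical helper name):

```python
import struct


def as_float32(value):
    """Round-trip a number through a 32-bit IEEE-754 float, like storing it in a C++ float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


print(as_float32(16_777_216))  # 16777216.0: 2**24 is still exact
print(as_float32(16_777_217))  # 16777216.0: 2**24 + 1 is not representable and gets rounded
```

For timestamps of order 10**12 the spacing between representable float values is already 65,536 ticks, so nearby events become indistinguishable.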
That’s a good point about the type for nEntries and i; I’m just so used to typing int for these things.
Is there a particular reason to use “\n” over std::endl? The code hits the bad_alloc error before completing the for loop, so it makes no difference on this occasion, but I generally prefer to use the standard library if I can.
Another thing to try … work in a directory whose name does not contain any “special” characters (e.g., “O16_p_y” instead of “16O(p,y)”, and maybe “O16Analysis” instead of “16OAnalysis”).
As Wile mentioned, ROOT already loads the data one chunk at a time rather than all at once. However, even if it were loading all the data at once (300 MB), you should not run out of memory, assuming your machine has several GB of RAM.
So something else is going on. A straightforward way to investigate would be (if you can) to run your failing example on Linux and use valgrind to pinpoint the problem. Alternatively, you can cut portions of your code until it stops failing, which might give you an indication of the issue. Another option is to build your code in debug mode and use the debugger to find out where it fails.
Something along these lines with RDataFrame should help:
import datetime
import ROOT

df = ROOT.RDataFrame("TTreeNameHere", "FileNameHere")
# Placeholder slice edges: replace with the run's real start time, in the units of your timestamp branch
hours = [(datetime.datetime.now() + datetime.timedelta(hours=x)).timestamp() for x in range(5)]
opts = ROOT.RDF.RSnapshotOptions()
opts.fLazy = True  # This avoids that Snapshot calls trigger the execution right away
# Book all different Snapshot calls in advance, each time slice to its own (illustratively named) file
snapshots = [
    df.Filter(f"timeStamp >= {hour_begin} && timeStamp < {hour_end}")
      .Snapshot("TTreeNameHere", f"slice_{i}.root", "", opts)  # "" = keep all columns
    for i, (hour_begin, hour_end) in enumerate(zip(hours[:-1], hours[1:]))
]
# Trigger execution of one of the Snapshots; all others are executed in the same event loop
snap_df = snapshots[0].GetValue()
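The hours list in the example above is built from datetime.now(), so it is only a placeholder. A minimal sketch of deriving the slice edges from the data instead, assuming you know the first and last timestamp in the tree and how many timestamp ticks make up one hour (this depends on how your DAQ records timestamps, so treat ticks_per_hour as an assumption). The helper names hour_edges and slice_filters are hypothetical:

```python
def hour_edges(t_first, t_last, ticks_per_hour):
    """Edges of consecutive one-hour slices starting at t_first and covering t_last.

    ticks_per_hour is an assumption: it depends on the units of the timestamp branch.
    """
    n_slices = (t_last - t_first) // ticks_per_hour + 1  # t_last must fall inside the last slice
    return [t_first + k * ticks_per_hour for k in range(n_slices + 1)]


def slice_filters(edges, column="timeStamp"):
    """One half-open [begin, end) Filter expression per slice, so no event lands in two files."""
    return [f"{column} >= {begin} && {column} < {end}"
            for begin, end in zip(edges[:-1], edges[1:])]


# Usage with made-up numbers (timestamps in seconds, so 3600 ticks per hour)
edges = hour_edges(0, 7199, 3600)
print(edges)  # [0, 3600, 7200]
print(slice_filters(edges))
```

With RDataFrame, t_first and t_last could come from df.Min("timeStamp").GetValue() and df.Max("timeStamp").GetValue() (wrapped in int()), and each string would go into its own df.Filter(...) before the lazy Snapshot. Keeping the edges as Python integers means they are written into the filter expression exactly.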
Yes, I have 8 GB of RAM, so I wouldn’t expect it to actually have run out of memory.
I don’t have easy access to a Linux version of ROOT, so I would like to leave that as a last resort. However, I based the code on a colleague’s Python script using ROOT, which runs fine on Linux using:
tree = f1.Get("Board 0")
for evt in tree:
    ...  # loop body not shown in the post
I’m confident that the tree->GetEntry(i) call is the problem. If I remove it, the code runs without error, and running it in the debugger shows it failing at that line.