Bad Allocation Error looping through TTree

DistantOstrich · March 7, 2023, 5:34pm

Thanks, I’ve uploaded my code. The root file is too large to upload directly, so here is a link to it on my OneDrive:
RootFiles

TestSnippet.cpp (713 Bytes)

Wile_E_Coyote · March 7, 2023, 5:58pm

On my Linux, it returns: 3784193462396

It seems strange to me that you explicitly set: #define _HAS_CXX17 1
I think it may lead to severe problems (on Windows; it doesn’t seem to be used on Linux).

BTW. “nEntries” and “i” should be “Long64_t”.
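
To illustrate why the counter type matters: TTree::GetEntries() returns a Long64_t, and a signed 32-bit int cannot hold values above 2147483647. The sketch below uses Python's ctypes purely as an illustration, with the number reported above as a sample value. The helper name fits_in_int32 is made up for this example, and the thread does not say whether the entry count of this particular file actually exceeds the 32-bit range.

```python
import ctypes

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_in_int32(n):
    """Return True if n can be stored in a signed 32-bit integer."""
    return INT32_MIN <= n <= INT32_MAX


value = 3784193462396  # sample value taken from the post above

# ctypes does not check for overflow, so the value is silently truncated,
# similar to what typically happens when a 64-bit count is stored in a
# 32-bit int in C++.
as_int32 = ctypes.c_int32(value).value
as_int64 = ctypes.c_int64(value).value

print(fits_in_int32(value))  # does the original value fit in 32 bits?
print(as_int32 == value)     # did the 32-bit copy keep the value?
print(as_int64 == value)     # did the 64-bit copy keep the value?
```

A 64-bit type such as Long64_t avoids the truncation for any entry count a TTree can report.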

vpadulan · March 7, 2023, 7:47pm

You’re welcome! Note that my snippet above is not 100% working code. I also just updated it to make it more realistic, but you may still need to adjust it. I also added a link to the RDataFrame docs in case you need more information about the different parts of the API shown.

Cheers,
Vincenzo

vpadulan · March 7, 2023, 7:50pm

Hi, just adding another note here: this is literally the worst thing you can do in terms of performance when writing a Python script that processes a TTree, as described here. Please avoid this pattern at all costs.

DistantOstrich · March 8, 2023, 8:26am

Thanks for testing it, that’s the result I would expect. It looks like it’s something specific to either Windows or my machine then.

The #define _HAS_CXX17 1 allows me to use C++17 features when I compile it as a normal C++ application, which I often do in Visual Studio to speed up testing. Without it, my full code does not run in ROOT. Removing that line from TestSnippet.cpp doesn’t change the behaviour.

Wile_E_Coyote · March 8, 2023, 8:32am

I don’t think this is a proper approach.
You should pass appropriate (Windows specific) flags to the compiler. I guess @bellenot can help with this.

An example (note that the “debug” and C++ standard flags must correspond to your ROOT binary distribution):

bellenot · March 8, 2023, 8:42am

You can check the flags used with ROOT with root-config --cflags:

C:\Users\bellenot>root-config --cflags
 -nologo -Zc:__cplusplus -std:c++17 -MD -GR -EHsc- -W3 -D_WIN32 -O2 -IC:\Users\bellenot\build\x64\release\include

I.e. if you want to rely on the __cplusplus macro in your code, you need the -Zc:__cplusplus compiler flag (without it, MSVC does not report the actual language standard through that macro).
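
As a rough way to compare your own build flags against the ones ROOT was built with, the output of root-config --cflags can be split and checked for the flags that matter. This is only a sketch: the function name missing_flags and the default list of required flags are assumptions made for this example (the two flags are taken from the output above), and the sample string is simply pasted from that output rather than obtained by running root-config.

```python
def missing_flags(cflags_output, required=("-Zc:__cplusplus", "-std:c++17")):
    """Return the required flags that do not appear in the root-config output."""
    present = set(cflags_output.split())
    return [flag for flag in required if flag not in present]


# Output of `root-config --cflags` as posted above (raw string because of
# the backslashes in the Windows include path).
sample = (
    r"-nologo -Zc:__cplusplus -std:c++17 -MD -GR -EHsc- -W3 -D_WIN32 -O2 "
    r"-IC:\Users\bellenot\build\x64\release\include"
)

# An empty list means every required flag is present.
print(missing_flags(sample))
```

The same check can be run against the flags of your own Visual Studio project to spot a mismatch in the C++ standard.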

bellenot · March 8, 2023, 9:02am

FYI I managed to reproduce the crash. I will investigate.
P.S. Even a simple tree->Draw("timeStamp") doesn’t work:

root [0] TFile* tf = TFile::Open("run277_lf.root", "READ");
root [1] TTree* tree = dynamic_cast<TTree*>(tf->Get("Board 0"));
root [2] tree->Draw("timeStamp");
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1
Error in <TRint::HandleTermInput()>: std::bad_alloc caught: bad allocation
root [3]

But works with the energy branch:

C:\Users\bellenot\Downloads>root -l
root [0] TFile* tf = TFile::Open("run277_lf.root", "READ");
root [1] TTree* tree = dynamic_cast<TTree*>(tf->Get("Board 0"));
root [2] tree->Draw("energy");
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1
root [3]

Wile_E_Coyote · March 8, 2023, 9:33am

“energy” is a “UShort_t” while “timeStamp” is a “ULong64_t”

bellenot · March 8, 2023, 9:34am

Yes, I know…

DistantOstrich · March 8, 2023, 11:27am

A variation of this RDataFrame method worked for me, thank you!

bellenot · March 8, 2023, 12:56pm

Can you post the code here, for the record?

DistantOstrich · March 8, 2023, 1:59pm

Here is the code that worked:

import sys
import os
import ROOT


print("Enter the run number")

runNumber = input()

fileName = "C:/Path/To/Root/Files/run" + runNumber + ".root"
outFilePath = "C:/Path/To/Output/Files/run" + runNumber
if not os.path.exists(outFilePath):
    os.makedirs(outFilePath)

outFileBase = outFilePath + "/Hour"

df = ROOT.RDataFrame("Board 0", fileName)

print("How many whole hours did the run last?")

nHours = int(input())

oneHour = 3600 * 10000000
hours = [oneHour*x for x in range(nHours)]

for i in range(nHours):
    startTime = int(hours[i])
    endTime = int(hours[i]) + oneHour
    outFile = outFileBase + str(i) + ".root"
    # Columns are passed as a list so that their order is fixed (a set is unordered).
    thisHour = df.Filter(f"timeStamp >= {startTime} && timeStamp < {endTime}").Snapshot("Board 0", outFile, ["channel", "timeStamp", "energy"])

vpadulan · March 8, 2023, 4:00pm

Hi @DistantOstrich ,

I am glad that RDF could work for you! But pay attention to one very important detail: it is not by chance that I added opts = ROOT.RDF.RSnapshotOptions() in my code example above. In your latest snippet you are doing

for i in range(nHours):
    ...
    df.Filter().Snapshot()

This means that you are creating the new files one at a time, one per iteration, because by default the Snapshot method of RDataFrame is an “instant” method: it is executed as soon as you call it. One of the best features of RDF is its laziness: you can book all the operations you want to run on the dataset, then execute them all in the same event loop just once. You can make Snapshot lazy too, with the RSnapshotOptions I showed in my example above.

If you do that, you will fill all the files together in a single event loop, instead of running one event loop per iteration of your for loop. It can potentially save you a lot of time.
Cheers,
Vincenzo
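
For the record, the lazy Snapshot approach described above could look roughly like this. It is a minimal sketch, not vpadulan's original snippet: the helper names hour_windows, hour_filter and book_hourly_snapshots are made up for this example, and the tree name, column names and hour length are taken from DistantOstrich's code. ROOT is imported inside the booking function so that the two small helpers can be tried without a ROOT installation.

```python
def hour_windows(n_hours, ticks_per_hour):
    """Return (start, end) timestamp pairs, one per whole hour."""
    return [(h * ticks_per_hour, (h + 1) * ticks_per_hour) for h in range(n_hours)]


def hour_filter(start, end, column="timeStamp"):
    """Build the RDataFrame filter expression for one half-open time window."""
    return f"{column} >= {start} && {column} < {end}"


def book_hourly_snapshots(df, out_base, n_hours, ticks_per_hour,
                          tree_name="Board 0",
                          columns=("channel", "timeStamp", "energy")):
    """Book one lazy Snapshot per hour; no event loop runs here."""
    import ROOT  # requires a working PyROOT installation

    opts = ROOT.RDF.RSnapshotOptions()
    opts.fLazy = True  # book the Snapshot instead of running it immediately

    handles = []
    for i, (start, end) in enumerate(hour_windows(n_hours, ticks_per_hour)):
        out_file = f"{out_base}{i}.root"
        handles.append(
            df.Filter(hour_filter(start, end))
              .Snapshot(tree_name, out_file, list(columns), opts)
        )
    return handles  # keep these alive until the event loop has run


# Usage sketch (needs ROOT and the input file, so it is left commented out):
# df = ROOT.RDataFrame("Board 0", fileName)
# handles = book_hourly_snapshots(df, outFileBase, nHours, 3600 * 10000000)
# handles[0].GetValue()  # triggers one event loop that fills all booked files
```

Because every Snapshot is booked on the same RDataFrame, accessing any one result runs the event loop once for all of them.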
