RAM blows up while using Dataframes/Snapshot

Hi,

I’m using
----- ROOT Version: 6.26/06
----- Python 3.8.13

I’m trying to merge two ROOT files with different branches (but the same number of entries) in PyROOT using AddFriend, RDataFrame and Snapshot. The file sizes are 50 MB and 200 kB. I’m using the following code:
import ROOT

f1 = ROOT.TFile.Open("file_1.root", "READ")  # 50 MB
f2 = ROOT.TFile.Open("file_2.root", "READ")  # 200 kB
t1 = f1.Get("nominal")
t2 = f2.Get("nominal")
t1.AddFriend(t2)  # branches of t2 become readable through t1
df = ROOT.RDataFrame(t1)
df.Snapshot("nominal", "new_file.root")  # produces a file of size ~50 MB, branches added etc., all good
f2.Close()
f1.Close()

The code works as expected but uses ~1.2 GB of RAM, which is crazy! I mapped the RAM usage over time: everything up to the RDataFrame creation is executed in the first 0.5 seconds. The df.Snapshot call then takes ~100 seconds, and that’s the part where the memory blows up.
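
For reference, here is a minimal sketch of one way to watch peak resident memory around a single step from within Python. It assumes a Unix-like system where the standard `resource` module is available; `peak_rss_mb` and `run_and_report` are hypothetical helper names, and the `bytearray` allocation is only a stand-in for the `df.Snapshot(...)` call, not the measurement actually used above.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process so far, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(label, step):
    """Run step() and print the peak RSS before and after it."""
    before = peak_rss_mb()
    result = step()
    after = peak_rss_mb()
    print(f"{label}: peak RSS {before:.1f} MB -> {after:.1f} MB")
    return result, after - before


# Stand-in workload; in the script above this would wrap the Snapshot call,
# e.g. run_and_report("Snapshot", lambda: df.Snapshot("nominal", "new_file.root"))
_, growth = run_and_report("allocate", lambda: bytearray(50 * 1024 * 1024))
```

Since this reads the peak value, it only shows how much the high-water mark moved during the step; it does not say which allocations are responsible.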

Any pointers on why this might be happening? Or a better workaround? :)
While this is a test file, I’m dealing with files up to ~15 GB in size, and such a blowup in RAM is not affordable on the system I use.

Cheers and thanks in advance!

Hello @ShaliniEpari ,

and welcome to the ROOT forum!

If I understand correctly, the assumption behind the question is that larger files will cause memory to grow even more, but that’s not usually the case: it could well stay around that level. ROOT and RDataFrame generally try not to cache anything in memory that scales with the number of entries (there are exceptions, but I don’t think they apply here).

As a first step I would suggest switching to a more recent ROOT version, possibly v6.28, and seeing whether the situation is better there.

If that’s not the case, and 1.2 GB of resident memory is a deal breaker for your use case, we need to check where the allocations come from. You should run the program under valgrind --tool=massif and then post the output of ms_print here (see for example the instructions in "5.3.3. Profiling Heap and Stack Space with Massif" in the Red Hat Enterprise Linux 6 documentation on the Red Hat Customer Portal; the discussion at "Optimize memory usage while using RDataFrame - #3 by eguiraud" might also interest you).
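
As a rough sketch of what that looks like in practice, the snippet below assembles the two command lines and runs them only if the tools are installed. The script name `merge_friends.py` and the helper names are hypothetical; `--massif-out-file` is used so that the output file name is known in advance instead of the default `massif.out.<pid>`.

```python
import shutil
import subprocess
import sys


def massif_commands(script, out_file="massif.out"):
    """Build the valgrind/massif and ms_print command lines for a Python script."""
    profile = [
        "valgrind",
        "--tool=massif",
        f"--massif-out-file={out_file}",  # fixed name instead of massif.out.<pid>
        sys.executable,
        script,
    ]
    report = ["ms_print", out_file]
    return profile, report


def profile_with_massif(script, out_file="massif.out"):
    """Run the script under massif and return the ms_print text, or None if tools are missing."""
    if shutil.which("valgrind") is None or shutil.which("ms_print") is None:
        return None
    profile, report = massif_commands(script, out_file)
    subprocess.run(profile, check=True)
    done = subprocess.run(report, check=True, capture_output=True, text=True)
    return done.stdout


# "merge_friends.py" is a placeholder name for the script from the first post.
profile_cmd, report_cmd = massif_commands("merge_friends.py")
print(" ".join(profile_cmd))
print(" ".join(report_cmd))
```

The same two commands can of course be typed directly in a shell. Expect the run under valgrind to be much slower than the ~100 seconds of the normal run.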

All the best,
Enrico
