Dear experts,
I am running RDataFrame and doing some tests on merged vs. unmerged files. We have a lot of files of only a few MB, with very few events stored in the ntuples. I first tried merging them into bigger files of about 0.5-1 GB. This led to a speed improvement of about a factor of 5, which is of course great news, although not completely unexpected I suppose. What worries me a bit, however, is that running the code on the merged files also increased the memory consumption by about 25%: from 20 GB virtual memory (unmerged) to about 28 GB (merged), and from 15 GB RES memory to 19 GB. Is this a known issue? Naively I always thought this was quite a strength of ROOT, i.e. that the size of the ntuple would not increase the memory consumption, given that only one event is in memory at a time (or a few, now that RDataFrame is multithreaded).
To give a bit more of an idea of what I am doing exactly: I have multiple RDatasetSpecs (~25), each with their own RDF graph, which I then run with RunGraphs. I only make histograms, about 100 per RDF.
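Schematically, the structure looks roughly like the sketch below (this is not my actual code; the sample, tree, file and branch names, the binnings, and the loop counts are just placeholders to show the shape of the setup):

```cpp
// Minimal sketch only: names, binnings and counts are placeholders.
#include <ROOT/RDataFrame.hxx>
#include <ROOT/RDFHelpers.hxx>        // ROOT::RDF::RunGraphs
#include <ROOT/RDF/RDatasetSpec.hxx>  // ROOT::RDF::Experimental::RDatasetSpec
#include <TH1D.h>

#include <string>
#include <vector>

void run_all()
{
   ROOT::EnableImplicitMT();

   std::vector<ROOT::RDF::RResultPtr<TH1D>> histos; // keeps the results (and graphs) alive
   std::vector<ROOT::RDF::RResultHandle> handles;   // passed to RunGraphs

   for (int i = 0; i < 25; ++i) {
      // one RDatasetSpec and one RDF computation graph per sample
      ROOT::RDF::Experimental::RDatasetSpec spec;
      spec.AddSample(ROOT::RDF::Experimental::RSample(
         "sample" + std::to_string(i), "ntuple",
         std::vector<std::string>{"merged_" + std::to_string(i) + ".root"}));
      ROOT::RDataFrame df(spec);

      // ~100 histograms booked lazily on this graph
      for (int h = 0; h < 100; ++h) {
         auto hist = df.Histo1D({("h" + std::to_string(h)).c_str(), "", 100, 0., 1.},
                                "someBranch");
         histos.push_back(hist);
         handles.emplace_back(hist);
      }
   }

   // trigger all event loops concurrently
   ROOT::RDF::RunGraphs(handles);
}
```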
Personally it's not such a big problem, as the PC on my local cluster can deal with this increase in memory; however, lxplus usually kills processes with large memory consumption…
Anyway, my question is whether this is expected behavior or not.
Cheers,
Jordy
My ROOT is set up from CVMFS via:
which root
/cvmfs/sft.cern.ch/lcg/views/LCG_106b/x86_64-el9-gcc11-opt/bin/root
ROOT Version: 6.32.08
Built for linuxx8664gcc on Dec 03 2024, 17:12:25
From tags/6-32-08@6-32-08