I’m trying to implement a few cuts using RDataFrame and then save the tree resulting from these cuts in a new file. The problem is that both the Python and C++ versions keep increasing memory usage until they hit the cap and the code crashes with a segmentation violation.
The original ntuple in question is about 500 MB.
Is there any bug in my code, or is this tool not supposed to be used like this?
I’m attaching the files that I wrote trying to do the same thing.
Hi,
These kinds of issues are usually due to a multi-threaded Snapshot of a ROOT file with highly suboptimal clustering (e.g. each entry of the input TTree is compressed by itself).
What happens in those cases is that reading entries takes much less time than writing them, so (uselessly large) buffers of unwritten data start accumulating in memory. See ROOT-9133.
So let’s check if you are indeed seeing ROOT-9133.
@Eddy_Offermann do you see elevated RAM usage when executing the macro with root -l -b -q, or only when executing with EnableImplicitMT (i.e. root -l -b -q -t)? (I will run it myself ASAP)
@vfranco can you confirm that removing EnableImplicitMT() from your macro “fixes” the issue?
If yes, you can either check the clustering of your file with
TFile file("yourfile.root");
TTree *t = nullptr;
file.GetObject("treename", t);
auto it = t->GetClusterIterator(0);
Long64_t start;
while ((start = it()) < t->GetEntries())
   std::cout << start << std::endl;
or something similar (I have not tested the code, but it should give you an idea)
or as @dpiparo suggests you can share your file with us.
Hi Eddy,
in your case I see roughly 10 MB/sec being allocated and never released.
It decreases to roughly 1 MB/sec if I change Snapshot(...) to Snapshot<int, float>(..., {"b1", "b2"}).
It’s the jitting – every dataframe computation graph that you allocate in that while(true) infinite loop just-in-time compiles a few things (the biggest of which is the call to Snapshot, if you don’t pass the template parameters), and that memory is never released (which is expected).
If this is the amount of memory hogging that you see too, I don’t think this is a bug; in general it should not be a problem that instantiating an RDF computation graph with just-in-time compiled components takes a few MB.
As for the amount of RAM hogging that @vfranco talks about (enough to use up all available RAM in a few seconds), I can’t reproduce it here; I still think that @vfranco is hitting ROOT-9133.
Thanks for the test data, we’ll check what’s going on asap.
The cluster iterator loops over TTree cluster boundaries. A cluster is a batch of entries that are compressed together. A normal cluster iteration jumps over many entries at a time, so you should see output like 0 2140 5899 .... If the iteration takes very small steps, e.g. 0 1 2 ..., it means that you have bad clustering (e.g. each entry is compressed by itself). That not only causes bad reading performance per se (no matter how you read the ROOT file), it also happens to trigger this issue of Snapshot where (abnormally large) buffers of data waiting to be written to disk queue up in your RAM.
@vfranco ok, I know what the problem is. Your ntuple has 564 branches, which means that your Snapshot(...) call is just-in-time compiled into a function call with 564 template parameters (think WriteBranches<type1, type2, type3, ...>(...)).
Now, that takes a long while and a lot of RAM to compile. You never even reach the event loop; you just spend all your time jitting code.
This is issue ROOT-9468, and luckily I have a PR open that mitigates this problem considerably.
With the patch, it takes 20 seconds and a few hundred megabytes to run your macro – most of the time is still spent in just-in-time compilation, but the situation is much, much better than before.
I don’t know if anything else can be done for such large Snapshots.
Would this fix the problem for you?
First of all, my apologies to @vfranco for hijacking his topic!
Enrico, I want to come back to your observation that if you spell out the types to Snapshot, you significantly reduce the memory increase. You do not seem to be worried by the fact that it is not reduced to zero, why?
I observed that in RDFActionHelpers.hxx, in the routine Finalize, you delete the TTree but do not seem to do anything with the branch data elements:
TTree *t = new TTree(....);
TMyClass *data = new TMyClass();
t->Branch("name", "TMyClass", &data);
Hi Eddy,
There is still a tiny bit of jitting left in every dataframe instantiation (removing it is on the to-do list), so every time you instantiate the computation graph you pay a bit of RAM to the interpreter that will only be released at application teardown.
If you think we have a memory leak, could you please open a Jira ticket with the exact ROOT version and line number? I can’t find the pattern you mention.
The RDF…hxx is very abstract code and I do not dare to claim that the branch data is not destroyed. I just mentioned the pattern and hope that you know whether it was implemented.
@vfranco the patch that speeds up snapshots of a large number of branches was just merged.
If you have the possibility to try out ROOT’s master branch (you can also pick up the nightly builds from cvmfs if recompiling is too annoying) it would be great if you could let me know if this is a reasonable solution for you.
(personally, I’m also curious what kind of analysis requires an ntuple with so many variables)
First of all, thanks for the help with this.
I’m trying to find the build in the nightlies folder of CVMFS to test it out; as long as the RAM usage doesn’t explode, it should be fine.
Regarding the number of branches in the tuple: I agree that the analyses won’t actually use all of these variables. But, in order to avoid having to re-submit jobs to the grid to get variables that were not present in the first place, I think it is fairly common to generate a big tuple with everything possible and then trim it down to a more manageable size and set of variables afterwards.
Also, the tuple generation uses tools that calculate values for variables in blocks such as “kinematics”, so you get all of them even though one might use only 2 variables out of 10.
Hi,
memory does not explode – if it does, let us know.
The patch was merged today, so you will have to wait for tomorrow’s nightlies.
You can find them on cvmfs at /cvmfs/sft.cern.ch/lcg/views/dev3/latest/.