I have not yet done any detailed study of the problem I’m facing, but it seems that when I run RDF on lxplus machines, my job is killed by the machine because of excessive memory consumption.
For example, when I run with
ROOT.ROOT.EnableImplicitMT(10) on 11 input files, each containing 100k events (5 GB in total), the job is killed after a few minutes.
So before doing more detailed profiling, are there any recommendations for running over very large datasets? I’d like to process up to 100 million events. Should we, for example, use more input files with fewer events per file? Or is nothing like this supposed to be necessary, and I most likely have a memory leak somewhere?
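In case it helps the discussion, here is a rough sketch of how I could log the peak memory of the process around the event loop as a first check, before any real profiling. It only uses the Python standard library. The helper names (peak_rss_mb, run_and_report) are hypothetical and the workload is a stand-in, not my actual RDF code; I am also assuming a Unix-like machine such as lxplus where the resource module is available.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(job, label="job"):
    """Run job() and print the peak RSS before and after it (hypothetical helper)."""
    before = peak_rss_mb()
    result = job()
    after = peak_rss_mb()
    print(f"{label}: peak RSS {after:.1f} MB (grew by {after - before:.1f} MB)")
    return result, after


if __name__ == "__main__":
    # Stand-in workload; in practice `job` would be a function that
    # triggers the RDF event loop.
    run_and_report(lambda: len(bytearray(50_000_000)), "dummy workload")
```

Running the same job on 1, 2, 4, ... input files and comparing the reported peaks should at least show whether memory grows with the number of events (which would point to a leak or accumulating results) or stays roughly flat per thread.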