Optimization of filter/define workflow

Dear ROOT experts,
I have a question about caching Filter results and the Defines that follow them.
I have written some code that stores, in a map<Myslice, vector>, the outcome of a Take operation on some branches, where each slice has a prior cut applied.
Sometimes a Myslice uses the exact same cut as other slices, so I was tempted to make a

map<UInt_t, RNode> where I store

CutString.Hash() as the key and the dataframe.Filter(CutString) result as the value, so that I don't have to bookkeep the same filter, and the same Define expressions, several times over.
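
Something like this sketch (`GetOrMakeFilter` is just an illustrative name, and hash collisions between different cut strings are ignored for simplicity):

```cpp
#include <map>
#include <ROOT/RDataFrame.hxx>
#include <TString.h>

// Cache of filtered nodes, keyed by the hash of the cut string,
// so that an identical cut is booked in the graph only once.
// RNode has no default constructor, hence find/emplace instead of operator[].
std::map<UInt_t, ROOT::RDF::RNode> gFilterCache;

// Return the node filtered with `cut`, creating and caching it on first use.
// A production version might key on the full cut string to rule out collisions.
ROOT::RDF::RNode GetOrMakeFilter(ROOT::RDF::RNode df, const TString &cut)
{
   const UInt_t key = cut.Hash();
   auto it = gFilterCache.find(key);
   if (it == gFilterCache.end())
      it = gFilterCache.emplace(key, df.Filter(cut.Data())).first;
   return it->second;
}
```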

My question boils down to:

Is the result of a Filter convertible to an RNode?
I looked at the documentation but I am not sure.
Thanks for the feedback.
Renato

Hi Renato,
yes, any dataframe object is convertible to a ROOT::RDF::RNode (it’s easy to verify from the ROOT prompt).
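
For instance (a quick check with a toy dataframe; `rdfentry_` is RDataFrame's built-in entry-number column):

```cpp
ROOT::RDataFrame df(10);                          // 10-entry toy dataframe
ROOT::RDF::RNode n1 = df;                         // the dataframe itself converts
ROOT::RDF::RNode n2 = df.Filter("rdfentry_ > 4"); // ...and so does a Filter result
ROOT::RDF::RNode n3 = n2.Define("x", "42.");      // ...and a Define result
```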

Cheers,
Enrico

Thanks a lot,
I think this will help a lot in my current workflow.
I am using a huge ntuple and bookkeeping around 5000 operations (nodes) on it. The virtual memory usage is O(180 GB). With caching I can reduce the number of node definitions by an order of magnitude, which I suppose will help: my jobs indeed get killed by the system due to the large memory consumption. I will give it a try and report back. Thanks

Making that change and recycling the previously filtered dataframes helped a lot (roughly a factor 10 speedup and much less memory pressure), thanks a lot. The pattern I ended up with is sketched below.
I was wondering whether automatic filter caching is foreseen for the future: hash the cut-string expression and, if the same expression already exists in a given processing graph at the same stage, just reuse that node and book the other expressions on top of it. Thanks a lot for the help anyway; indeed the ROOT shell was enough to find it out 🙂
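
For reference, a minimal picture of the recycling pattern, using the `GetOrMakeFilter` helper sketched above (tree, file, and branch names here are made up):

```cpp
#include <ROOT/RDataFrame.hxx>

ROOT::RDataFrame df("DecayTree", "ntuple.root"); // hypothetical input

// Two slices sharing the same cut end up on the same cached Filter node,
// and all Takes are booked before the single event loop runs.
auto nodeA = GetOrMakeFilter(df, "pt > 2000 && abs(eta) < 4.5");
auto nodeB = GetOrMakeFilter(df, "pt > 2000 && abs(eta) < 4.5"); // reuses nodeA's node
auto ptVals  = nodeA.Take<double>("pt");
auto etaVals = nodeB.Take<double>("eta"); // the shared cut is evaluated only once per event
```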

