I made a first attempt at “filling” a weight branch by reading a histogram and a couple of variables in my ntuple…
The code I came up with, and got working, is something like this:
Everything runs without breaking anything, and apparently works fine in MT mode, which I would not have expected…
The worry I have with this kind of workflow is that the only way to get the histogram into the lambda function is by capturing it. This means the histogram has to exist before I declare the lambda itself, and capturing it like this does not guarantee that the histogram remains immutable in all the threads run by the DataFrame.
My worries are:
In my setup the histogram may or may not exist depending on the input I have, so I may or may not add this extra “branch” (column) depending on the type of input. For this purpose I have to use pointers and dereference them in calls where I otherwise enforce no-pointer semantics. So I don’t really like capturing the pointer to the histogram in the Define; I would rather capture the histogram object itself, ideally const-qualified.
I find it a bit strange that the only arguments I can pass to the Define call have to be columns of the DataFrame. As I use a lambda function I would have expected Define( & h1, "pt") to work, although I understand there is something big I am missing here.
The only reason why I would prefer to have Define("ALIAS", function, {myHisto,"pt"})
is that in my function I can enforce const TH1D & myHisto, which implies that whatever I do internally the method remains thread-safe, and that I can use a single function to “read” an input histogram and apply it for “any” ALIAS I want to add, depending on the content of the histogram and on the axes it is defined on.
I don’t know if what I am saying makes sense at all, but I would really like to be able to add a new column to the DataFrame by reading an external “histogram” that ends up being shared by all threads, and be sure this always works. I think this is what people typically do to attach “corrections” to simulation ntuples by reading some data-driven correction histogram.
Thanks in advance
Renato
Hi Renato,
I’m not sure what your question is, exactly. I’ll try to clarify a few things.
Yes, when using a lambda this seems like a sane requirement. You don’t have to use a lambda though (see below).
You can make the captured type a pointer-to-const or a const reference to prevent the lambda from modifying the variable. Example snippet.
In principle you can absolutely do that.
I’m not sure how that would work in C++. Even if we could make it work, I doubt the resulting syntax and semantics would be simpler and saner than lambda captures.
With lambdas:
const TH1D *_histo = new TH1D(...);
auto GetHistogramVal = [&_histo] (double _variable) { ..._histo->GetXAxis()...};
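To make the pointer-to-const capture concrete without depending on ROOT, here is a minimal sketch. `Histo1D`, `ValueAt`, `Scale` and `MakeWeightLambda` are hypothetical stand-ins invented for illustration (they are not ROOT classes or methods), and returning a neutral weight of 1.0 outside the axis range is an assumption. The point is only that a lambda holding a `const Histo1D *` can call const methods and nothing else.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a 1D histogram such as TH1D (not a ROOT class).
struct Histo1D {
    std::vector<double> edges;     // n+1 ascending bin edges
    std::vector<double> contents;  // n bin contents

    Histo1D(std::vector<double> e, std::vector<double> c)
        : edges(std::move(e)), contents(std::move(c)) {
        assert(edges.size() == contents.size() + 1);
    }

    // Const lookup: callable through a pointer-to-const.
    double ValueAt(double x) const {
        // Assumption: out-of-range values get a neutral weight of 1.0.
        if (x < edges.front() || x >= edges.back()) return 1.0;
        std::size_t bin = 0;
        while (x >= edges[bin + 1]) ++bin;
        return contents[bin];
    }

    // Non-const method: not callable through a pointer-to-const.
    void Scale(double f) { for (auto &c : contents) c *= f; }
};

// Builds the callable; the histogram must outlive the returned lambda.
inline auto MakeWeightLambda(const Histo1D *histo) {
    return [histo](double pt) {
        // histo->Scale(2.0);  // would not compile: histo points to const
        return histo->ValueAt(pt);
    };
}
```

A callable like this takes one `double` and returns one `double`, which is the shape of function one would pass to Define together with the name of the input column. Constness only stops the lambda from calling non-const methods; it does not by itself make a const method thread-safe.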
Hi @eguiraud, I think the procedure I want to follow goes in the direction of using a functor.
The case is rather simple: say that in an analysis one has efficiency corrections as a function of some observables, and one wants to reweight the dataframe entry by entry. You want a single functor capable of capturing any histogram as input (maybe one functor for a TH1D, one for a TH2D, one for a TH2Poly). Then you can declare your input histogram at any point in the code and add all the columns in sequence, just picking the correct input variables used as axes of the reweighting histogram.
With lambda capture you need to define one lambda per type of correction, as you can only capture what already exists and has been defined, and I would like to avoid that. I will give the functor a try and eventually post the snippet here so it can serve as an example for others. I guess the functor is the only way to achieve what I have in mind.
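A rough sketch of what such functors could look like, written against plain C++ so it does not depend on ROOT. `Table1D`, `Table2D`, `FindBin`, `Lookup`, `Reweight1D` and `Reweight2D` are all hypothetical names made up for this illustration, standing in for TH1D/TH2D and their bin lookup; the neutral weight of 1.0 outside the axis range is also an assumption. Each functor holds a shared pointer to a const table and has a non-template, const `operator()`, so the argument types are fixed by its signature.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Bin index of x in ascending edges, or -1 when x is out of range.
inline int FindBin(const std::vector<double> &edges, double x) {
    if (edges.size() < 2 || x < edges.front() || x >= edges.back()) return -1;
    int bin = 0;
    while (x >= edges[bin + 1]) ++bin;
    return bin;
}

// Hypothetical stand-ins for TH1D and TH2D (not ROOT classes).
struct Table1D {
    std::vector<double> edges, contents;
    double Lookup(double x) const {
        const int b = FindBin(edges, x);
        return b < 0 ? 1.0 : contents[b];  // assumption: neutral weight outside range
    }
};

struct Table2D {
    std::vector<double> xedges, yedges;
    std::vector<std::vector<double>> contents;  // contents[xbin][ybin]
    double Lookup(double x, double y) const {
        const int bx = FindBin(xedges, x), by = FindBin(yedges, y);
        return (bx < 0 || by < 0) ? 1.0 : contents[bx][by];
    }
};

// One functor per dimensionality; copies share the same const table.
class Reweight1D {
    std::shared_ptr<const Table1D> fTable;
public:
    explicit Reweight1D(std::shared_ptr<const Table1D> t) : fTable(std::move(t)) {
        assert(fTable);
    }
    double operator()(double x) const { return fTable->Lookup(x); }
};

class Reweight2D {
    std::shared_ptr<const Table2D> fTable;
public:
    explicit Reweight2D(std::shared_ptr<const Table2D> t) : fTable(std::move(t)) {
        assert(fTable);
    }
    double operator()(double x, double y) const { return fTable->Lookup(x, y); }
};
```

With this layout the correction table can be created at any point in the code and handed to as many functor instances as needed. The functors rely on the compiler-generated copy and move constructors, so no hand-written ones are required.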
Thanks a lot,
Renato
Unfortunately I cannot make the TH1D const, as some of its methods are not marked const.
Also, I had to add a move constructor to the functors, otherwise the code was breaking.
A side comment: I noticed that when filling the initial TTree with MT enabled I get duplicated entry values, while I don’t when I disable it.
It’s probable that if a method is not marked const, it’s not in general thread-safe. Be careful!
I think it’s because your copy-constructor is wrong: it should take a const reference, not a reference. The way it is, it can’t bind to temporaries. This works, for example:
root [0] ROOT::RDataFrame df(10)
(ROOT::RDataFrame &) An empty data frame that will create 10 entries
root [1] struct Foo { int operator()() { return 42; } };
root [2] *df.Define("x", Foo()).Take<int>("x")
(std::vector<int, std::allocator<int> > &) { 42, 42, 42, 42, 42, 42, 42, 42, 42, 42 }
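The copy-constructor point can be checked in isolation with standard type traits. In this sketch `BadFunctor`, `GoodFunctor` and `StoreAndCall` are hypothetical names; `StoreAndCall` only imitates a framework that takes a callable by value and moves it into storage, and is an assumption for illustration, not a description of RDataFrame internals.

```cpp
#include <type_traits>
#include <utility>

// Copy constructor takes a non-const reference: it cannot bind to temporaries,
// and declaring it suppresses the implicit move constructor.
struct BadFunctor {
    BadFunctor() = default;
    BadFunctor(BadFunctor &) {}
    int operator()() const { return 42; }
};

// Copy constructor takes a const reference: temporaries bind to it.
struct GoodFunctor {
    GoodFunctor() = default;
    GoodFunctor(const GoodFunctor &) {}
    int operator()() const { return 42; }
};

// Imitates code that receives a callable by value and moves it into storage.
template <typename F>
int StoreAndCall(F f) {
    F stored(std::move(f));  // needs a constructor that accepts an rvalue
    return stored();
}

static_assert(!std::is_move_constructible<BadFunctor>::value,
              "an rvalue cannot bind to BadFunctor&");
static_assert(std::is_move_constructible<GoodFunctor>::value,
              "an rvalue binds to const GoodFunctor&");
```

`StoreAndCall(GoodFunctor())` compiles, while `StoreAndCall(BadFunctor())` does not unless a move constructor is added. That matches the observation that adding a move constructor made the code work.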
Your Define lambdas for “i” and “j” in fill_tree are not thread-safe, so that could very well happen.
A thread-safe version: