I am trying to update my code to use the latest and greatest RDataFrame functionality, in particular the Vary function.
Let's say I want to obtain 100 varied 1D histograms from 100 variations of a weight.
Namely, I have an RVec column filled with random Poissonian weights representing 100 bootstrapping slices of my data, and a baseline weight attached from an existing Map.
What I want to achieve, without having to define 100 new columns (one for each weight[i]), are 100 histograms of a given variable, where each of the histograms is obtained with
If I understood correctly, Vary and VariationsFor have been designed exactly to achieve this goal, but I failed to understand the example in the doxygen. In the past I created a custom Book function filling a 2D histogram with 100 bins in y, but I feel I should rather use Vary.
Yep, that’s it: you need to have the nominal value already in a column, and then return an RVec of varied values from the Vary expression.
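To make the replacement semantics concrete, here is a minimal sketch in plain Python (no ROOT involved) of what happens conceptually: the histogram is filled once with the nominal weight column, and once per variation with the i-th element of the per-event vector of varied weights. The names `fill_hist`, `variations_for`, `x` and `varied_weights` are hypothetical and only serve the illustration, and the `"weight:<tag>"` key format reflects my understanding of how the result map is keyed (variation name, colon, tag) rather than something to rely on blindly.

```python
# Conceptual model only: plain Python, not ROOT code. All names are hypothetical.

def fill_hist(values, weights, nbins, lo, hi):
    """Weighted 1D histogram as a list of bin contents (under/overflow dropped)."""
    hist = [0.0] * nbins
    width = (hi - lo) / nbins
    for v, w in zip(values, weights):
        if lo <= v < hi:
            hist[int((v - lo) / width)] += w
    return hist


def variations_for(events, tags, nbins, lo, hi):
    """Return {"nominal": hist, "weight:<tag>": hist, ...} for the given events."""
    xs = [e["x"] for e in events]
    # Nominal result: uses the scalar value stored in the "weight" column.
    result = {"nominal": fill_hist(xs, [e["weight"] for e in events], nbins, lo, hi)}
    # Varied results: "weight" is replaced by the i-th element of the varied vector.
    for i, tag in enumerate(tags):
        varied = [e["varied_weights"][i] for e in events]
        result[f"weight:{tag}"] = fill_hist(xs, varied, nbins, lo, hi)
    return result


# Three toy events, two variations each (stand-ins for the 100 bootstrap weights).
events = [
    {"x": 0.5, "weight": 1.0, "varied_weights": [0.0, 2.0]},
    {"x": 1.5, "weight": 2.0, "varied_weights": [2.0, 4.0]},
    {"x": 1.7, "weight": 1.0, "varied_weights": [1.0, 0.0]},
]
hists = variations_for(events, ["BS_0", "BS_1"], nbins=2, lo=0.0, hi=2.0)
print(hists)
```

In this model the nominal histogram never looks at the varied vector, and each varied histogram never looks at the nominal weight, which is the sense in which the column value gets replaced.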
Let us know if you have any more questions or if you think anything specific in the docs should be expanded/clarified – we’ll be adding a Vary tutorial soon.
The only bit that confused me is the role of the Maps[“nominal”] content.
In practice, is that a pointer to the main result pointer which invokes the Vary call?
In other words:
Does Vary expect the variation setup to re-use the “scalar” input?
I.e., does
nominal = node.Vary("weight", "weight_BS * RndPoisson", [f"BS_{i}" for i in range(100)]) \
    .Histo1D(r.RDF.TH1DModel("", "", 100, 5100, 5450), "massB", "weight")
histsVaried = r.RDF.Experimental.VariationsFor(nominal)
perform a replacement of weight with weight_BS[i] * RndPoisson[i] for each map value?
In other words, must the second argument of the Vary function re-use the baseline column name of the first one, or can it be replaced by anything one wants, as long as it’s already defined in the node?