Thanks for the help @eguiraud,
I’d like to ask a follow-up question (or a few). How should I implement this with a dictionary? Meaning that I have code such as
# fits is a dictionary with floats as keys and no values yet,
# for example {1.0: None, 2.0: None, ..., 300.0: None}
for idx, func in enumerate(fits):
    # MC and DT are TH2D
    h1 = MC.ProjectionY(f"MC{idx}", idx, idx)
    h2 = DT.ProjectionY(f"DT{idx}", idx, idx)
    h1.Divide(h2)
    h1.Fit("chebyshev4", "S")
    fits[func] = h1.GetFunction("chebyshev4")
and now I would like to make the contents of the dictionary available on the C++ side. How would this be done? The end goal is to evaluate a different fitted function depending on the value in a column of the RDF.
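Locally (before worrying about Spark) I imagine it would be something along these lines, though I don't know if this is the intended way. In the sketch, fits_map, add_fit and eval_fit are just names I made up, and I assume the fits dictionary from above already holds the TF1 objects:

```python
import ROOT

# Declare a C++ map of fitted functions plus small helpers to fill and evaluate it
ROOT.gInterpreter.Declare("""
#ifndef FITS_MAP
#define FITS_MAP
#include "TF1.h"
#include <map>
std::map<double, TF1> fits_map;                                // key -> fitted function
void add_fit(double key, const TF1 &f) { fits_map[key] = f; }  // copy one fit into the map
double eval_fit(double key, double y) { return fits_map.at(key).Eval(y); }
#endif
""")

# Copy every fitted TF1 from the Python dictionary into the C++ map
for key, func in fits.items():
    ROOT.add_fit(key, func)
```

Is something along these lines reasonable, or is there a better mechanism for exposing Python-side objects to Define strings?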
And then, to go a bit deeper, I would also like to use Spark with the RDFs, so the variables need to be spread among the workers. I’ve previously used ROOT.RDF.Experimental.Distributed.initialize to declare custom C++ functions on the workers, for example (you’ve probably seen/made this):
initialize = ROOT.RDF.Experimental.Distributed.initialize

def my_initialize():
    ROOT.gInterpreter.Declare("""
    #ifndef MYFUN
    #define MYFUN
    int myfun(){ return 42; }
    #endif
    """)

initialize(my_initialize)
But how would I spread the variables instead of the functions?
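The only thing I’ve come up with so far is to ship the fitted parameters as plain numbers inside the declared C++ code, so that every worker rebuilds the functions itself. A rough sketch of what I mean (fit_low, fit_high and init_fits are made-up names, and I assume here the fits were done with pol2/pol4 over a known range):

```python
import ROOT

initialize = ROOT.RDF.Experimental.Distributed.initialize

# Extract the fitted parameters as plain floats on the client
par_low = [fits[1.0].GetParameter(i) for i in range(fits[1.0].GetNpar())]
par_high = [fits[2.0].GetParameter(i) for i in range(fits[2.0].GetNpar())]
p_low_str = ", ".join(str(p) for p in par_low)
p_high_str = ", ".join(str(p) for p in par_high)

# Bake the parameter values into the C++ code that every worker will declare
cpp_code = f"""
#ifndef MYFITS
#define MYFITS
#include "TF1.h"
TF1 fit_low("fit_low", "pol2", 0.0, 10.0);    // range is just a placeholder
TF1 fit_high("fit_high", "pol4", 0.0, 10.0);
void init_fits() {{
    double p_low[]  = {{ {p_low_str} }};
    double p_high[] = {{ {p_high_str} }};
    fit_low.SetParameters(p_low);
    fit_high.SetParameters(p_high);
}}
#endif
"""

def my_fit_initialize():
    # Runs on every worker: declare the functions and set their parameters
    ROOT.gInterpreter.Declare(cpp_code)
    ROOT.init_fits()

initialize(my_fit_initialize)
```

But this feels clumsy once the dictionary holds ~300 functions, so I’d be happy to hear about the proper way of broadcasting such variables to the workers.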
All in all, I would like to do something like this (in a simple case with two functions):
# dictionary with two fitted functions
myHist.Fit("pol2")
fits[1.0] = myHist.GetFunction("pol2")
myHist.Fit("pol4")
fits[2.0] = myHist.GetFunction("pol4")

# Do something so that the Spark workers "know" about the fits
# ...

# Define a new column of a ROOT.RDF.Experimental.Distributed.Spark.RDataFrame using the fitted functions:
# if the value of column x is less than 1, use the first function, otherwise the second one
df1 = df.Define("y", "x < 1.0 ? fits[1.0].Eval(y) : fits[2.0].Eval(y)")
The last line naturally wouldn’t work, since I’m using Python syntax and a Python dictionary inside the Define string, which comes back to my first question. Is this possible to do somehow?
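With something like the fits_map/eval_fit helper sketched above declared on every worker, I imagine the Define could instead read (y_corr is just a made-up name for the new column):

```python
df1 = df.Define("y_corr", "x < 1.0 ? eval_fit(1.0, y) : eval_fit(2.0, y)")
```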
Now, I’ve used floats as keys in all the examples, but it doesn’t have to be like that if that makes it particularly difficult.
Thanks for your time!