How to pass fcn func defined on python side to ROOT.Fitter?

jprochaz · October 16, 2010, 8:27pm

Hello,

I would like to fit some data. I have defined a model function and also the chi-square (“FCN”; I want to customize the chi-square a little bit) on the C++ side. If I run this short script:

import ROOT 
from array import array

ROOT.gROOT.ProcessLine(".L mymodel.C+g")
ROOT.gROOT.ProcessLine(".L myfcn.C+g")

x = array('d',[1.])
p = array('d',[1,2])

fitter = ROOT.Fit.Fitter()
fitdata = ROOT.Fit.BinData(1000,1,ROOT.Fit.BinData.kNoError)
datasize = 15 
ci = array('d',[0]*datasize)
for i in range(datasize):
  fitdata.Add(i,i)

fitter.Config().MinimizerOptions().SetMinimizerType("Minuit2")
fitter.Config().MinimizerOptions().SetMinimizerAlgorithm("Minimize")
fitter.Config().MinimizerOptions().SetPrintLevel(2)            

mymodel = ROOT.MyModel()
myfcn = ROOT.MyFCN(fitdata,mymodel)

# Is it possible to define myfcn on python side and not on C++ side?
# Something like 
#
# def myfcn(params,...):
#    return ...
#    
fitter.FitFCN(myfcn,p,datasize)

fitter.Result().Print(ROOT.cout,True)

then everything works without any problem. I just wonder whether there is a way to define “myfcn” purely on the python side and then pass it somehow (magic goes here) to fitter.FitFCN(myfcn,p,datasize). How can I achieve that? By using a wrapper defined on the C++ side for the python myfcn function/class? How should it be implemented?
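To make "customized chi-square" concrete, here is a minimal pure-Python sketch of such a function. It does not use ROOT at all: `linear_model`, `make_custom_chi2` and the `penalty` term are hypothetical names chosen for illustration, and unit errors on the points are assumed. How a callable like this could be handed to `fitter.FitFCN` is exactly the open question.

```python
def linear_model(x, params):
    """Hypothetical model: straight line a + b*x."""
    a, b = params
    return a + b * x


def make_custom_chi2(xs, ys, model, penalty=0.0):
    """Return fcn(params): chi-square with unit errors (assumed) plus an
    optional, purely illustrative quadratic penalty on the parameters."""
    def fcn(params):
        chi2 = sum((y - model(x, params)) ** 2 for x, y in zip(xs, ys))
        return chi2 + penalty * sum(p * p for p in params)
    return fcn


# Same toy data as in the script: 15 points with y = x.
xs = [float(i) for i in range(15)]
ys = list(xs)
fcn = make_custom_chi2(xs, ys, linear_model)

print(fcn([0.0, 1.0]))  # parameters that reproduce the data exactly
print(fcn([1.0, 1.0]))  # constant offset of 1 at each of the 15 points
```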

Thank you in advance for your help,
Jiri
test.py (990 Bytes)
mymodel.C (1.08 KB)
myfcn.C (455 Bytes)

wlav · October 20, 2010, 8:31pm

Jiri,

the trick that lets code such as TF1 call python objects is to register the python function with an ID and a callback in CINT. It isn’t particularly pretty (have a look at bindings/pyroot/src/Pythonize.cxx if you’re really interested).
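The "ID plus callback" idea can be illustrated in plain Python. This is only a sketch of the concept, not the code in Pythonize.cxx: the registry, `register` and `callback` are hypothetical names, and the real mechanism lives on the C++/CINT side.

```python
import itertools

_registry = {}                    # handle (int) -> Python callable
_next_handle = itertools.count(1)


def register(func):
    """Store a Python callable and return an integer ID for it."""
    if not callable(func):
        raise TypeError(f"{func!r} is not a valid python callable")
    handle = next(_next_handle)
    _registry[handle] = func
    return handle


def callback(handle, x, params):
    """Single entry point a C++ caller could use: look up the ID and forward."""
    return _registry[handle](x, params)


def line(x, params):
    """Hypothetical user function: params[0] + params[1]*x[0]."""
    return params[0] + params[1] * x[0]


handle = register(line)
print(callback(handle, [2.0], [1.0, 3.0]))
```

The point of the indirection is that the compiled side only ever needs one fixed callback signature plus an integer, regardless of which Python function the user registered.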

I’ll see whether I can pythonize FitFCN as well.

Cheers,
Wim

jprochaz · October 24, 2010, 5:44pm

Hi Wim,

thanks for your explanation. I’ve looked at Pythonize.cxx. I was afraid of something like that - I can hardly say that I fully understand it…

That would be great! The problem is that I would really like to customize the chi2 (FCN) on the python side and not on the C++ side (it would allow me to solve many problems…)

Many thanks,
Jiri

jprochaz · November 17, 2010, 1:25pm

Hi Wim,

I still have trouble fully understanding the pythonization just from the code, but if you still plan to pythonize the FitFCN function I can help with testing (if needed).

Cheers,
Jiri

wlav · November 17, 2010, 6:17pm

Jiri,

yes, this is still on my short-list. Too many guns against my head right now, though. If you want to help, maybe you can determine which would be best to pythonize: AFAICS the function itself is just a special case, and having a derivable IFitter might be better (but there are several subclasses already, and so that may mean making all of them derivable)?

Cheers,
Wim

jprochaz · November 18, 2010, 8:26pm

Hi Wim,

Super, thanks!

I’m not really sure what you mean by “derivable IFitter” and which subclasses (of ROOT::Fit::Fitter?) you have in mind. I hope I didn’t misunderstand you completely, but as I understand it the FitFCN function is quite generic, since it can minimize an arbitrary function. Or were you also thinking about Fitter::{Fit,LikelihoodFit,SetFunction,…}, which fit a model function to data? Then you went further than me. It would definitely also be very nice to define a model function on the python side and then call one of those functions too! But the reason why I want to use FitFCN is that I need to customize the chi2 (so it is not enough just to pass data and a model function to the Fitter and then minimize) and still benefit from the common Fitter interface to different minimizers. If this function were pythonized, one could also use a model function defined on the python side and not on the C++ side (if needed).

So, personally I would vote for pythonization of FitFCN, as I do not see any better solution: it is the more generic option, allowing data + model + customized chi2 to be defined on the python side while the Fitter interface is used for minimization.

Cheers,
Jiri

wlav · November 25, 2010, 6:07am

Jiri,

not quite what I meant (then again, I don’t think I was clear, so …). Anyway, I do want to get this in before the November 30 cut-off, so that it makes it into 5.28 and that makes it kind of urgent now (tomorrow is Turkey Day, and I’m traveling to CERN over the weekend).

To go back to the several things that I don’t understand …

You derive MyFCN from ROOT::Fit::Chi2Function, and there are several classes such as Chi2Function. The unfortunate bit is that that would mean that all of them need to be pythonized (and any that come along in the future). That’d not be pretty (besides the other obvious draw-back of the Math dependency in PyROOT). It’d be great if a better solution is available (I’m thinking of deriving only from the two explicit IMultiXYZ objects, and making the user forward the calls if a derived object is needed).
You override DoEval(const double* x), but I don’t find it actually called when running the test.py example? Also, in DoEval() the base class DoEval() is called, but that’s a private function? How did that work out?

So, basically I’m a bit confused as to how this is actually used in practice …

What I did for now, is that I pythonized the minuit version of FitFCN. There are a few problems with that as well. For starters, it only takes a void* (more or less), and there is no equivalent of taking a CINT interpreted function, as there is for TMinuit. I worked around this for now by only allowing a single fit function (which gets stored in a global variable instead of with CINT). Let me know if that is too limiting.

To use any of the existing fitters then, instead of deriving from them, the calls would have to be forwarded from the minuit function. This function can be the call function of a callable object, making it possible for the object to carry state (i.e. the object to be forwarded to).
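A callable object that carries state and forwards to another object might look like the sketch below. It assumes the TMinuit-style FCN convention `(npar, gin, f, par, iflag)` with the result written into `f[0]`; whether the pythonized FitFCN passes its arguments in exactly this form is an assumption, and `Paraboloid.evaluate` is a hypothetical stand-in for whatever object the calls are forwarded to. Plain lists replace the C++ buffers so the sketch runs without ROOT.

```python
class ForwardingFCN:
    """Callable with a Minuit-style signature that forwards to a target object."""

    def __init__(self, target):
        self.target = target      # state carried by the callable
        self.ncalls = 0

    def __call__(self, npar, gin, f, par, iflag):
        self.ncalls += 1
        values = [par[i] for i in range(npar)]
        f[0] = self.target.evaluate(values)   # result is returned through f[0]


class Paraboloid:
    """Hypothetical stand-in for an existing fit-function object."""

    def evaluate(self, values):
        return sum(v * v for v in values)


fcn = ForwardingFCN(Paraboloid())
f = [0.0]
fcn(2, None, f, [3.0, 4.0], 0)
print(f[0], fcn.ncalls)
```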

Let me know if this works for you, or otherwise let me know how ROOT::Fit::Chi2Function should work (or point me to some documentation if that’s available … I didn’t see anything in the User’s Guide?).

Cheers,
Wim

jprochaz · November 25, 2010, 3:53pm

Hi Wim,

I see. I agree that pythonizing 100 functions plus others coming in the future is not a good idea. I see 3 overloaded Fitter::FitFCN functions:

bool	FitFCN(const ROOT::Math::IMultiGenFunction& fcn, const double* params = 0, unsigned int dataSize = 0, bool chi2fit = false)
bool	FitFCN(const ROOT::Math::IMultiGradFunction& fcn, const double* params = 0, unsigned int dataSize = 0, bool chi2fit = false)
bool	FitFCN(ROOT::Fit::Fitter::MinuitFCN_t fcn, int npar = 0, const double* params = 0, unsigned int dataSize = 0, bool chi2fit = false)

As I understand it, you pythonized the 3rd one. The second one is more for time optimization (the user provides derivatives …), but the first one is IMHO fundamental for fitting a generic function and probably easier to pythonize than the 2nd one.

I added an example of how to use the first FitFCN (see attachment) when a ROOT::Math::IMultiGenFunction is properly defined on the C++ side (see class MyFCN2). Note that one also has to redefine the virtual Clone() function. This was not done in my previous example, but it is fixed now. This is the reason why DoEval of the BASE class was called.

For those who wonder how DoEval can be private (and virtual) and still be overridden in a derived class, I can recommend parashift.com/c++-faq-lite/s … tance.html - [23.4] When should someone use private virtuals?

You will find 2 examples in the attachment: the old one using Chi2Function and a new one using just IMultiGenFunction. Both examples work for me, so let me know in case of any problem. In the second example, on the python side one just does:

print 'EXAMPLE 2 - MINIMIZE A MULTI GENERIC FUNCTION'
myfun2  = ROOT.MyFCN2()
params2 = array('d',[1.])
fitter.FitFCN(myfun2,params2)
fitter.Result().Print(ROOT.cout,True)

And the question is:

[code]
Q: Is it possible to define myfun on python side and not on the C++ side?

Something like:

def myfun2(params):
    return …

and then call fitter.FitFCN(myfun2,params2)?[/code]

Maybe it is possible to use (as you suggested) the forwarding from the minuit function + some tricks with the call function (you know better than me).

Hope that helps at least a little bit…

Cheers,
Jiri
mymodel.C (1.08 KB)
myfcn.C (1.24 KB)
test.py (1.52 KB)

wlav · November 26, 2010, 4:30am

Jiri,

thanks, those were very helpful.

So I added a TPyMultiGenFunction that can be used as a base class, with the express intention of adding a TPyMultiGradFunction (which would follow the exact same lines), if this works for you … That should cover all cases in one way or another.

Basically, for creating your own function, do something like:

[code]class PyMyFCN2( ROOT.TPyMultiGenFunction ):
    def __init__( self ):
        ROOT.TPyMultiGenFunction.__init__( self, self )

    def NDim( self ):
        print 'PYTHON NDim called'
        return 1

    def DoEval( self, x ):
        ret = x[0]*x[0]
        print 'PYTHON MyFCN2::DoEval val=', ret
        return ret

[/code]
And for the case where you’d otherwise derive from a more elaborate function class such as Chi2Function, make it a data member of the python object and forward all calls to that data member.
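The "data member plus forwarding" pattern can be sketched without ROOT as follows. `StandInChi2` is a hypothetical stand-in for a C++ object such as Chi2Function (assumed here to offer `NDim()` and to be callable with a parameter array), and `CustomChi2` would derive from ROOT.TPyMultiGenFunction in the real setup; the extra `weight` term is just an example of a customization.

```python
class StandInChi2:
    """Hypothetical stand-in for a C++ chi-square object (unit errors assumed)."""

    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def NDim(self):
        return 2

    def __call__(self, p):
        return sum((y - (p[0] + p[1] * x)) ** 2 for x, y in zip(self.xs, self.ys))


class CustomChi2:
    """In the real setup this would derive from ROOT.TPyMultiGenFunction."""

    def __init__(self, chi2, weight):
        self.chi2 = chi2          # the elaborate function is a data member
        self.weight = weight

    def NDim(self):
        return self.chi2.NDim()   # forwarded unchanged

    def DoEval(self, x):
        # Forward to the wrapped object, then add the customization.
        return self.chi2(x) + self.weight * x[0] ** 2


fcn = CustomChi2(StandInChi2([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]), weight=10.0)
print(fcn.NDim(), fcn.DoEval([0.0, 1.0]), fcn.DoEval([1.0, 1.0]))
```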

Is in trunk. Let me know what you think …

Cheers,
Wim

jprochaz · November 26, 2010, 4:57pm

Hi Wim,

it looks really nice and it works for me!

Some comments/ideas:

If I comment out PyMyFCN2.DoEval and/or PyMyFCN2.NDim

class PyMyFCN2( ROOT.TPyMultiGenFunction ):
    def __init__( self ):
        ROOT.TPyMultiGenFunction.__init__( self, self )

#    def NDim( self ):
#       print 'PYTHON NDim called'
#        return 1

    def DoEval( self, x ):
        ret = x[0]*x[0]
        print 'PYTHON MyFCN2::DoEval val=', ret
        return ret

(e.g. a user forgets to define these methods) then I get:

(Bool_t)1
EXAMPLE - MINIMIZE A PYTHON MULTI GENERIC FUNCTION
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/jiri/Desktop/rootProblem/test8/testpyfit.py in <module>()
     26 myfun3  = PyMyFCN2()
     27 params3 = array('d',[1.])
---> 28 fitter.FitFCN(myfun3,params3)
     29 fitter.Result().Print(ROOT.cout,True)
     30 

TypeError: none of the 4 overloaded methods succeeded. Full details:
  std::bad_alloc (C++ exception)
  bool Fitter::FitFCN(const ROOT::Math::IMultiGradFunction& fcn, const double* params = 0, unsigned int dataSize = 0, bool chi2fit = false) =>
    could not convert argument 1
  bool Fitter::FitFCN(ROOT::Fit::Fitter::MinuitFCN_t fcn, int npar = 0, const double* params = 0, unsigned int dataSize = 0, bool chi2fit = false) =>
    could not convert argument 2
  none of the 4 overloaded methods succeeded. Full details:
  bool Fitter::FitFCN(const ROOT::Math::IMultiGenFunction& fcn, const double* params = 0, unsigned int dataSize = 0, bool chi2fit = false) =>
    could not convert argument 1
  bool Fitter::FitFCN(const ROOT::Math::IMultiGradFunction& fcn, const double* params = 0, unsigned int dataSize = 0, bool chi2fit = false) =>
    could not convert argument 1
  bool Fitter::FitFCN(ROOT::Fit::Fitter::MinuitFCN_t fcn, int npar = 0, const double* params = 0, unsigned int dataSize = 0, bool chi2fit = false) =>
    could not convert argument 2
  "<PyCObject object at 0x1d05a58>" is not a valid python callable
WARNING: Failure executing file: <testpyfit.py>
Python 2.7 (r27:82500, Oct 20 2010, 03:28:42) 
Type "copyright", "credits" or "license" for more information.

IPython 0.10.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

which may be confusing, and it is not clear that it is just NDim and/or DoEval that have not been defined. Would it be possible to generate an error message that makes it clear that NDim (DoEval) needs to be defined?

I looked at bindings/pyroot/src/TPyFitFunction.cxx (root.cern.ch/viewcvs/trunk/bindi … ortby=date) and found these lines in NDim and DoEval methods:

   if ( ! pyresult )
      return 1;    // probably reasonable default

I don’t understand the pythonization (the surrounding code) 100%, but if something goes wrong and one of these functions returns 1 instead of a reasonable value, it can be quite difficult to find out that there is such a default behaviour. I think that throwing an exception with a descriptive message would be better.

I have realised that there is probably no way to specify the names of the fitted parameters (there are just default names such as Par_0), but this is not a PyROOT-related problem but rather a ROOT.Fitter problem…
Yes, having the 2nd FitFCN(const ROOT::Math::IMultiGradFunction& fcn,…) pythonized would be useful too.
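The difference between a silent default and a descriptive failure can be shown in a few lines of plain Python. This is only an illustration of the suggestion, not the actual code in TPyFitFunction.cxx: `dispatch_call` is a hypothetical helper written for this sketch.

```python
def dispatch_call(pyself, method, *args):
    """Call pyself.<method>(*args), failing loudly if the method is missing
    instead of silently returning a default value."""
    impl = getattr(type(pyself), method, None)
    if impl is None:
        raise AttributeError(f"method {method} needs implementing in derived class")
    return impl(pyself, *args)


class Incomplete:
    """User class that defines DoEval but forgot NDim."""

    def DoEval(self, x):
        return x[0] * x[0]


obj = Incomplete()
print(dispatch_call(obj, "DoEval", [3.0]))
try:
    dispatch_call(obj, "NDim")
except AttributeError as err:
    print(err)
```

With the exception, the name of the missing method shows up directly in the traceback, whereas a returned default of 1 would let the fit run with a wrong dimension.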

Anyway, it looks definitely VERY promising

Cheers,
Jiri
testpyfit.py (949 Bytes)

wlav · November 26, 2010, 9:45pm

Jiri,

some code with improved error reporting is in (is still a bit ugly in the whole traceback, but the relevant error message is in there, so hopefully enough).

I also implemented the TPyMultiGradFunction, but I’d be surprised if it simply worked, as the overload order will make that the MultiGenFunction will match first (so the disp() method would have to be used to get to the explicit overload).

Cheers,
Wim

jprochaz · November 29, 2010, 3:00pm

Hi Wim,

the error handling is definitely much more helpful, thanks.

Concerning the TPyMultiGradFunction - if I run this code:

print 'EXAMPLE 2 - minimization using ROOT.TPyMultiGradFunction'

class PyMyMultiGradFCN( ROOT.TPyMultiGradFunction ):
    def __init__( self ):
        ROOT.TPyMultiGradFunction.__init__( self, self )

    def NDim( self ):
        print 'PYTHON PyMyMultiGradFCN::NDim called'
        return 2

    def DoEval( self, x ):
        ret = x[0]*x[0]+x[1]*x[1]
        print 'PYTHON MyMultiGradFCN::DoEval val=', ret
        return ret

    def DoDerivative( self, x,icoord):
        if icoord == 0: 
          ret = 2*x[0]
        elif icoord ==1:
          ret = 2*x[1]

        print 'PYTHON MyMultiGradFCN::DoDerivative val=', ret
        return ret

myMultiGradFCN = PyMyMultiGradFCN()
params2 = array('d',[1.,1.])
fitter.FitFCN(myMultiGradFCN,params2)
fitter.Result().Print(ROOT.cout,True)

x = array('d',[0.,0.])
g = array('d',[100.,100.])
myMultiGradFCN.Gradient(x,g)
print ' gradient = ', g 
print ' der0 = ', myMultiGradFCN.Derivative(x,0)
print ' der1 = ', myMultiGradFCN.Derivative(x,1)

then the fitting looks OK, but when the gradient/derivatives are calculated I get this error:

TypeError                                 Traceback (most recent call last)
TypeError: DoEval() takes exactly 2 arguments (3 given)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
...testpyfit.py in <module>()
     65 x = array('d',[0.,0.])
     66 g = array('d',[100.,100.])
---> 67 myMultiGradFCN.Gradient(x,g)
     68 print ' gradient = ', g 
     69 print ' der0 = ', myMultiGradFCN.Derivative(x,0)

Exception: Failure in TPyMultiGradFunction::Gradient (C++ exception)

Looking at bindings/pyroot/src/TPyFitFunction.cxx (root.cern.ch/viewcvs/trunk/bindi … ortby=date) I guess that in the functions TPyMultiGradFunction::{Gradient,Fdf,DoDerivative} there is one wrong line with “DoEval”

PyObject* pyresult = DispatchCall( fPySelf, "DoEval", xbuf, pycoord );

but “Gradient”, “Fdf” and “DoDerivative” should be there instead.

Cheers,
Jiri
testpyfit.py (2.22 KB)

wlav · November 29, 2010, 3:17pm

Jiri,

ah yes, the toxic mix of copy & paste and too little sleep. Fixed now. Thanks for the test code.

Cheers,
Wim

jprochaz · November 29, 2010, 3:55pm

Wim,

if the functions FdF and Gradient are not defined on the python side, then an exception is thrown:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)


AttributeError: method Gradient needs implementing in derived class
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)

...testpyfit.py in <module>()
     65 x = array('d',[0.,0.])
     66 g = array('d',[100.,100.])
---> 67 myMultiGradFCN.Gradient(x,g)
     68 print ' gradient = ', g 
     69 print ' der0 = ', myMultiGradFCN.Derivative(x,0)

Exception: Failure in TPyMultiGradFunction::Gradient (C++ exception)

but it would be better to call ROOT::Math::IMultiGradFunction::{Gradient,FdF} instead (they are already defined in that class and may just be changed in a derived class…).
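What such a fallback amounts to can be sketched in plain Python, assuming the C++ defaults simply build the gradient one coordinate at a time from the per-coordinate derivative. `GradFunctionSketch` is a hypothetical stand-in, not a ROOT class, and `FdF` returns a tuple here rather than filling output arguments as the C++ signature does.

```python
class GradFunctionSketch:
    """Stand-in base class: Gradient and FdF have defaults built on DoDerivative."""

    def NDim(self):
        raise NotImplementedError("NDim needs implementing in derived class")

    def DoEval(self, x):
        raise NotImplementedError("DoEval needs implementing in derived class")

    def DoDerivative(self, x, icoord):
        raise NotImplementedError("DoDerivative needs implementing in derived class")

    def Derivative(self, x, icoord):
        return self.DoDerivative(x, icoord)

    def Gradient(self, x, grad):
        # Default: fill grad coordinate by coordinate.
        for i in range(self.NDim()):
            grad[i] = self.Derivative(x, i)

    def FdF(self, x):
        # Default: function value and gradient together.
        grad = [0.0] * self.NDim()
        self.Gradient(x, grad)
        return self.DoEval(x), grad


class Paraboloid(GradFunctionSketch):
    """Same function as in the test script: x0^2 + x1^2."""

    def NDim(self):
        return 2

    def DoEval(self, x):
        return x[0] * x[0] + x[1] * x[1]

    def DoDerivative(self, x, icoord):
        return 2 * x[icoord]


fcn = Paraboloid()
grad = [100.0, 100.0]
fcn.Gradient([1.0, 2.0], grad)
print(grad)
print(fcn.FdF([1.0, 2.0]))
```

A derived class then only has to supply NDim, DoEval and DoDerivative, and may override Gradient or FdF when a faster combined computation is available.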

Cheers,
Jiri

wlav · November 29, 2010, 4:44pm

Jiri,

yes, missed that (all the other functions are virtual = 0 …). Now in. Hope we’re really converging now.

Cheers,
Wim

jprochaz · November 29, 2010, 5:28pm

Wim,

it works but if I want to define e.g. the Gradient method like:

    def Gradient( self, x,grad):
        print 'PYTHON MyMultiGradFCN::Gradient'
        ROOT.TPyMultiGradFunction.Gradient(x,grad)

i.e. just to call the default gradient then I get an error:

TypeError                                 Traceback (most recent call last)

./testpyfit.py in <module>()
     74 x = array('d',[0.,0.])
     75 grad = array('d',[100.,100.])
---> 76 myMultiGradFCN.Gradient(x,grad)
     77 print ' gradient = ', grad 
     78 print ' der0 = ', myMultiGradFCN.Derivative(x,0)

/home/jiri/Desktop/rootProblem/test8/testpyfit.py in Gradient(self, x, grad)
     51         print 'PYTHON MyMultiGradFCN::Gradient'
     52         #super(PyMyMultiGradFCN,self).Gradient(x,grad)
---> 53         ROOT.TPyMultiGradFunction.Gradient(x,grad)
     54 
     55     def FdF( self, x,f,df):

TypeError: void TPyMultiGradFunction::Gradient(const double* x, double* grad) =>
    unbound method TPyMultiGradFunction::Gradient must be called with a TPyMultiGradFunction instance as first argument

It looks like a problem with conversion of python array to const double* …

A few more iterations and we are hopefully there

Cheers,
Jiri
testpyfit.py (2.59 KB)

wlav · November 29, 2010, 5:32pm

Jiri,

that error is correct. The proper python syntax requires self to be passed as the first argument. But note that if you do call this method, it will dispatch right back to the overridden method, and you’ll eventually get a stack overflow.
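The dispatch loop can be reproduced in plain Python. In this sketch `Forwarder` is a hypothetical stand-in for a forwarding class like TPyMultiGradFunction: its `Gradient` does nothing except call the Python-side method of the same name, so an override that calls it explicitly ends up calling itself.

```python
class Forwarder:
    """Stand-in for a forwarding class: Gradient only dispatches to the Python object."""

    def __init__(self, pyself):
        self._pyself = pyself

    def Gradient(self, x, grad):
        # Always goes back to the Python-side method of the same name.
        return self._pyself.Gradient(x, grad)


class MyGrad(Forwarder):
    def __init__(self):
        Forwarder.__init__(self, self)   # same pattern as __init__( self, self )

    def Gradient(self, x, grad):
        # Meant as "call the default", but the forwarder calls this method again.
        return Forwarder.Gradient(self, x, grad)


try:
    MyGrad().Gradient([0.0, 0.0], [100.0, 100.0])
    outcome = "returned"
except RecursionError:
    outcome = "RecursionError"
print(outcome)
```

In pure Python the interpreter stops the loop with a RecursionError; when the round trip goes through C++ there is no such guard, which is why a stack overflow is the expected result.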

Cheers,
Wim

jprochaz · November 29, 2010, 6:31pm

Wim,

got it! Thanks for the explanation of the reason. Does that mean that there is no way at all to call a base method in situations such as calling the default gradient on the python side? I mean a “simple” way. It could sometimes be quite useful…

Personally, I’m quite happy with what was done up to now (for this moment).

Thanks,
Jiri

wlav · November 29, 2010, 6:42pm

Jiri,

you can call the base class method, but you’d have to call the one of the C++ base class (i.e. ROOT.Math.IMultiGradFunction), not the one of TPyMultiGradFunction as that class’ existence is purely to enable the forwarding (at some point real inheritance will be possible, but that requires some technology changes … maybe when all of this is pypy and llvm).

And if you’re happy for now, then this is what will end up in 5.28.

Thanks,
Wim

jprochaz · November 30, 2010, 10:10am

Hi Wim,

I don’t know why but even if I define the Gradient function like you suggested:

    def Gradient( self, x,grad):
        print 'PYTHON MyMultiGradFCN::Gradient'
        ROOT.Math.IMultiGradFunction.Gradient(self,x,grad)

then I get again a stack overflow…

I also do not understand why DoDerivative is not called at all during fitting and/or calculating errors (hesse). Only DoEval is called all the time, see output of the attached script. It looks to me that there is something wrong with cloning and/or downcasting…

Yes, real inheritance would, of course, also be the real solution. I hope this amazing feature will come “soon”

Cheers,
Jiri
testpyfit.py (2.62 KB)