Gradient and FdF not used by GSLMinimizer

Hi all,

I would like to use the BFGS2 algorithm provided by the GSL minimizer library for a multidimensional minimization. I have optimized functions that return the gradient of my function, as well as an optimized function that returns the function value and its gradient together. As indicated in the documentation of the ROOT::Math::Minimizer class, I need to derive a class from ROOT::Math::IGradientFunctionMultiDim and override the Gradient and FdF functions. This is done here:

#include "Math/IFunction.h"

#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

using std::cout;
using std::endl;

struct JppGradEFunc : public ROOT::Math::IGradientFunctionMultiDimTempl<double>
{
  JppShowerE_simple* pdf; // user likelihood; provides eval_lik and eval_lik_grad

  virtual ROOT::Math::IGradientFunctionMultiDimTempl<double>* Clone() const { return new JppGradEFunc(*this); }

  virtual unsigned int NDim() const { return 7; }

  virtual double DoEval(const double* x) const
  {
    // x[0] holds log10(E), so the likelihood is evaluated at E = 10^x[0]
    double res = pdf->eval_lik( pow( 10, x[0] ), x[1], x[2], x[3], x[4], x[5], x[6] );

    cout << "res DoEval: " << res << " x[0]: " << x[0] << " x[1]: " << x[1] << " x[2]: " << x[2] << " x[3]: " << x[3] << " x[4]: " << x[4] << " x[5]: " << x[5] << " x[6]: " << x[6] << endl;

    return res;
  }

  // Note: the base-class operator() is non-virtual and forwards to DoEval,
  // so this definition shadows it; it is kept only for the separate printout.
  virtual double operator()(const double* x) const
  {
    double res = pdf->eval_lik( pow( 10, x[0] ), x[1], x[2], x[3], x[4], x[5], x[6] );

    cout << "res operator(): " << res << " x[0]: " << x[0] << " x[1]: " << x[1] << " x[2]: " << x[2] << " x[3]: " << x[3] << " x[4]: " << x[4] << " x[5]: " << x[5] << " x[6]: " << x[6] << endl;

    return res;
  }

  virtual double DoDerivative(const double* x, unsigned int icoord) const
  {
    // eval_lik_grad returns the function value and the full gradient with
    // respect to (E, x[1], ..., x[6]); only component icoord is used here.
    std::pair< double, std::vector<double> > res = pdf->eval_lik_grad( pow( 10, x[0] ), x[1], x[2], x[3], x[4], x[5], x[6] );

    // chain rule: dL/dlog10(E) = dL/dE * dE/dlog10(E) = dL/dE * E * ln(10)
    res.second[0] *= pow( 10, x[0] ) * 2.3025850929940; // 2.3025... = ln(10)
    cout << "res DoDerivative: " << res.second[icoord] << " x[0]: " << x[0] << " x[1]: " << x[1] << " x[2]: " << x[2] << " x[3]: " << x[3] << " x[4]: " << x[4] << " x[5]: " << x[5] << " x[6]: " << x[6] << endl;

    return res.second[icoord];
  }

  virtual void FdF(const double* x, double& f, double* df) const
  {
    std::pair< double, std::vector<double> > res = pdf->eval_lik_grad( pow( 10, x[0] ), x[1], x[2], x[3], x[4], x[5], x[6] );
    f = res.first;
    res.second[0] *= pow( 10, x[0] ) * 2.3025850929940; // ln(10), dE/dlog10(E)
    for( unsigned int ii = 0; ii != 7; ii++ ) df[ii] = res.second[ii];
    print ( "fdf", f, df[0], df[1], df[2], df[3], df[4], df[5], df[6] ); // project debug helper
  }

  virtual void Gradient(const double* x, double* df) const
  {
    std::pair< double, std::vector<double> > res = pdf->eval_lik_grad( pow( 10, x[0] ), x[1], x[2], x[3], x[4], x[5], x[6] );
    res.second[0] *= pow( 10, x[0] ) * 2.3025850929940; // ln(10), dE/dlog10(E)
    for( unsigned int ii = 0; ii != 7; ii++ ) df[ii] = res.second[ii];
    print ( "Gradient", df[0], df[1], df[2], df[3], df[4], df[5], df[6] );
  }
};

Unfortunately, here is a typical output of the minimizer:

res DoEval: 5158.01 x[0]: 3.56329 x[1]: -406.648 x[2]: 54.4139 x[3]: 217.13 x[4]: 1.87961 x[5]: 2.83086 x[6]: 10.0575
res DoDerivative: 42.5176 x[0]: 3.56329 x[1]: -406.648 x[2]: 54.4139 x[3]: 217.13 x[4]: 1.87961 x[5]: 2.83086 x[6]: 10.0575
res DoDerivative: -3.98599 x[0]: 3.56329 x[1]: -406.648 x[2]: 54.4139 x[3]: 217.13 x[4]: 1.87961 x[5]: 2.83086 x[6]: 10.0575
res DoDerivative: 25.6745 x[0]: 3.56329 x[1]: -406.648 x[2]: 54.4139 x[3]: 217.13 x[4]: 1.87961 x[5]: 2.83086 x[6]: 10.0575
res DoDerivative: 1.80081 x[0]: 3.56329 x[1]: -406.648 x[2]: 54.4139 x[3]: 217.13 x[4]: 1.87961 x[5]: 2.83086 x[6]: 10.0575
res DoDerivative: -42.2422 x[0]: 3.56329 x[1]: -406.648 x[2]: 54.4139 x[3]: 217.13 x[4]: 1.87961 x[5]: 2.83086 x[6]: 10.0575
res DoDerivative: 9.89843 x[0]: 3.56329 x[1]: -406.648 x[2]: 54.4139 x[3]: 217.13 x[4]: 1.87961 x[5]: 2.83086 x[6]: 10.0575
res DoDerivative: 16.6893 x[0]: 3.56329 x[1]: -406.648 x[2]: 54.4139 x[3]: 217.13 x[4]: 1.87961 x[5]: 2.83086 x[6]: 10.0575

The Gradient and FdF functions are not used at all! What gives?

I did attempt to use Migrad from Minuit2. In that case, the Gradient function was indeed used, but not the FdF function.

Thank you in advance for the help!

ROOT Version: 6.22/06
Platform: Not Provided
Compiler: g++ 4.8.5


Hi,
That is strange; the FdF function should be used by the GSLMinimizer. It is expected that it is not used in Minuit or Minuit2. Maybe there is an issue when setting the function in the minimizer; I would need to see your code that configures and calls the minimizer.

Thank you

Lorenzo

Hi Lorenzo, thank you for the reply. Here is the function where the minimizer is initialized and called. I’ve made some of the variables more explicit so you can see what is in them.

init_offset is a function that evaluates the function to be minimized once at the start values; the result is subtracted from all subsequent function values. It just avoids returning very large numbers to the minimizer.
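For illustration, a minimal sketch of what init_offset does (the _offset member and the exact signature are my shorthand here, not necessarily the real code):

  // Hypothetical sketch: evaluate the likelihood once at the start values and
  // cache the result, so later evaluations can return (value - _offset).
  void JppShowerFitE::init_offset( const double* vstart )
  {
    _offset = pdf->eval_lik( pow( 10, vstart[0] ), vstart[1], vstart[2],
                             vstart[3], vstart[4], vstart[5], vstart[6] );
  }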

Trk JppShowerFitE::fit( Trk& start_track , 
                        Det& det, 
                        vector<Hit>& hits, 
                        int do_init_pdf)
{ 
  TStopwatch W;
  if( do_init_pdf ) _init_pdf( start_track, det, hits );

  //cout << " name: " << _minimizer_name.c_str() << endl;
  //cout << " algo: " << _minimizer_algo.c_str() << endl;
  //ROOT::Math::Minimizer* M =
  //ROOT::Math::Factory::CreateMinimizer( _minimizer_name.c_str(), _minimizer_algo.c_str());

  ROOT::Math::Minimizer* M =
    ROOT::Math::Factory::CreateMinimizer( "GSLMultiMin", "BFGS2" );

  //M->SetMaxFunctionCalls(10000);  // for Minuit/Minuit2 
  M->SetMaxIterations(1000);      // for GSL 
  //M->SetTolerance( tolerance ); //Determines the point at which migrad should stop
  M->SetPrintLevel( verbose - 2 ); // -1 seems to be the lowest setting in GSLMinimizer

  JppGradEFunc gf;
  gf.pdf = pdf;

  M->SetFunction( gf );

  //Variables are: [posx, posy, posz, theta, phi, time, energy]
  //See showrec_util.hh -> to_trk
  double vstart[7];
  trk_to_smax_pars( start_track, vstart );
  vector<double> step = pack( 1.0, 1.0, 1.0, 0.01, 0.01, 1.0, 0.1 );
  
  if (verbose) print ( "fixing ", fixed_vars );

  for (int i=0; i<7; i++ )
    {
      if ( contains( fixed_vars, i ) )
        {
          M -> SetFixedVariable( i, parnames()[i], vstart[i] );
        }
      else
        {
          M -> SetVariable( i, parnames()[i], vstart[i], step[i] );
          if (verbose) cout << "Start "  << parnames()[i] << ": " << vstart[i] << endl;
          M -> SetVariableLimits( i, parlimits()[i].first, parlimits()[i].second );
          if (verbose) cout << "Limits " << parnames()[i] << ": " << parlimits()[i].first << ", " << parlimits()[i].second << endl;
        }
    }
  init_offset(vstart );

  // ----------------------------------------
  //  do the fit
  // ----------------------------------------

  //ErrorHandlerFunc_t oldhandler = SetErrorHandler( fit_errorhandler );

  M -> Minimize();

[...]

Hi,
Can you try the following? The cast should force the SetFunction overload taking the gradient interface to be selected:

M->SetFunction( static_cast<ROOT::Math::IGradientFunctionMultiDimTempl<double> &>(gf));

This seemed to work! The minimizer now calls both FdF and Gradient.

However, I noticed that FdF is called quite rarely, and Gradient is often called in tandem with DoEval. The SciPy implementation of BFGS uses FdF almost exclusively. Even when I remove Gradient, the minimizer prefers DoDerivative over FdF. FdF is only about 70% slower than DoEval and just as fast as Gradient, so it is disappointing that it is used so little! Is there a remedy for this? Here is an example output:

Calling Gradient: -1.6185 x[0]: 3.31979 x[1]: 118.078 x[2]: -456.444 x[3]: 695.438 x[4]: 2.10569 x[5]: 4.15556 x[6]: 38.1848
Calling DoEval: 3539.31 x[0]: 3.3198 x[1]: 118.078 x[2]: -456.444 x[3]: 695.438 x[4]: 2.10571 x[5]: 4.15556 x[6]: 38.1848
Calling Gradient: -1.6166 x[0]: 3.3198 x[1]: 118.078 x[2]: -456.444 x[3]: 695.438 x[4]: 2.10571 x[5]: 4.15556 x[6]: 38.1848
Calling DoEval: 3539.31 x[0]: 3.31981 x[1]: 118.078 x[2]: -456.444 x[3]: 695.438 x[4]: 2.10573 x[5]: 4.15555 x[6]: 38.1848
Calling Gradient: -1.61496 x[0]: 3.31981 x[1]: 118.078 x[2]: -456.444 x[3]: 695.438 x[4]: 2.10573 x[5]: 4.15555 x[6]: 38.1848
Calling DoEval: 3539.31 x[0]: 3.31981 x[1]: 118.078 x[2]: -456.444 x[3]: 695.438 x[4]: 2.10574 x[5]: 4.15555 x[6]: 38.1848
Calling Gradient: -1.61352 x[0]: 3.31981 x[1]: 118.078 x[2]: -456.444 x[3]: 695.438 x[4]: 2.10574 x[5]: 4.15555 x[6]: 38.1848
Calling DoEval: 3539.32 x[0]: 3.31982 x[1]: 118.078 x[2]: -456.444 x[3]: 695.438 x[4]: 2.10575 x[5]: 4.15554 x[6]: 38.1848

I switched to the SteepestDescent algorithm which yields good and fast results, using FdF only. However, the GSL docs indicate “The steepest descent method is inefficient and is included only for demonstration purposes.”
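For reference, switching algorithms only required changing the second argument of the factory call (a sketch, assuming the same setup as above; the algorithm names are those of ROOT's GSLMultiMin plugin):

  ROOT::Math::Minimizer* M =
    ROOT::Math::Factory::CreateMinimizer( "GSLMultiMin", "SteepestDescent" );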

Thanks for the help!

Hi,
It is GSL that controls which function to call, so we cannot improve much on this.
Note that Minuit2 also provides a BFGS algorithm, and you can supply your own gradient for the minimization. It may be worth comparing whether it is more efficient.
To use it, pass "BFGS" instead of "Migrad" as the minimizer algorithm.
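In terms of the factory call above, that would be (a sketch; the rest of the setup stays the same):

  ROOT::Math::Minimizer* M =
    ROOT::Math::Factory::CreateMinimizer( "Minuit2", "BFGS" );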

Lorenzo

Ah, the ROOT docs did not mention the Minuit2 BFGS implementation! Unfortunately, it suffers from the same issue of being littered with DoEval calls, which again makes it slower than SteepestDescent… You are right that the underuse of FdF stems from the GSL implementation; I am curious why SciPy can live with only FdF calls (and converge much more quickly) despite implementing the same algorithm…

What you see in SciPy is interesting. Do you know whether SciPy provides its own implementation of the algorithm, or whether it relies on existing Fortran or C/C++ code?
Also, Minuit2 does not use FdF; it calls the Eval and Gradient functions separately. It should still be much faster than SteepestDescent.
If you find it slower, please post your full code in a runnable form and I can investigate it.

Cheers

Lorenzo
