Need help using TMultiDimFit

Hello,

I tried to make TMultiDimFit to work for me and got an absolutely unsuccessful experience :cry:
I.e. I read the class description and tried to play around with the example
($ROOTSYS/tutorials/multidimfit.C). I change to example to approximate a two dimensional
function f(x,y) = x*y and the class interpolates the function absolutely incorrectly. I tried to
change some algorithm parameters with no positive result.

I really want to try to use the class to develop a new tracking for my analysis :slight_smile:

The second thing that is suspicious is that I always (even with the standard example) get
the following warnings:

Warning in <TMatrixD::Invert(Double_t *)>: Determinant under/over-flows double: det= 0.8784 2^288
Warning in TMultiDimFit::MakeCoefficientErrors: curvature matrix is singular

Here is my modified standard example:

#include "Riostream.h"
#include "TROOT.h"
#include "TApplication.h"
#include "TCanvas.h"
#include "TH1.h"
#include "TSystem.h"
#include "TBrowser.h"
#include "TFile.h"
#include "TRandom.h"
#include "TMultiDimFit.h"

//____________________________________________________________________
void makeData(Double_t* x, Double_t& d, Double_t& e) 
{
  // Make data points 
  Double_t upp[2] = { 10.0, 10.0 };
  Double_t low[2] = {  0.0,  0.0 };

  for (int i = 0; i < 2; i++) 
    x[i] = low[i] + (upp[i] - low[i]) * gRandom->Rndm(); 
  
  d = x[0]*x[1];
  //d = x[0];
  //d = x[1];
  //d = 3.0;
  //d = x[0]+x[1];

  //cout<<"x[0]="<<x[0]<<" x[1]="<<x[1]<<" d="<<d<<endl;
  
  e = gRandom->Gaus(1.0, 1.0);
}

//____________________________________________________________________
Int_t MultDimFit() 
{

  cout << "*************************************************" << endl; 
  cout << "*             Multidimensional Fit              *" << endl;
  cout << "*                                               *" << endl;
  cout << "* By Christian Holm <cholm@nbi.dk> 14/10/00     *" << endl;
  cout << "*************************************************" << endl; 
  cout << endl;

  // Initialize global TRasnom object. 
  gRandom = new TRandom();

  // Open output file 
  TFile* output = new TFile("mdf.root", "RECREATE");

  // Global data parameters 
  Int_t nVars       = 2;
  Int_t nData       = 1000;

  // make fit object and set parameters on it. 
  TMultiDimFit* fit = new TMultiDimFit(nVars, TMultiDimFit::kMonomials,"v");

  Int_t mPowers[]   = { 5 , 5 };
  fit->SetMaxPowers(mPowers);
  fit->SetMaxFunctions(1000);
  fit->SetMaxStudy(1000);
  fit->SetMaxTerms(1000);
  fit->SetPowerLimit(1);
  fit->SetMinAngle(10);
  fit->SetMaxAngle(10);
  fit->SetMinRelativeError(0.1);

  // variables to hold the temporary input data 
  Double_t d;
  Double_t e;
  Double_t *x = new Double_t[nVars];
  
  // Print out the start parameters
  fit->Print("p");

  // Create training sample 
  Int_t i;
  for (i = 0; i < nData ; i++) {

    // Make some data 
    makeData(x,d,e);

    // Add the row to the fit object
    fit->AddRow(x,d,e);
  }

  // Print out the statistics
  fit->Print("s");

  // Book histograms 
  fit->MakeHistograms();

  // Find the parameterization 
  fit->FindParameterization();

  // Print coefficents 
  fit->Print("rc");

  // Get the min and max of variables from the training sample, used
  // for cuts in test sample. 
  Double_t *xMax = new Double_t[nVars];
  Double_t *xMin = new Double_t[nVars];
  for (i = 0; i < nVars; i++) {
    xMax[i] = (*fit->GetMaxVariables())(i);
    xMin[i] = (*fit->GetMinVariables())(i);
  }

  nData = fit->GetNCoefficients() * 100;
  Int_t j;

  // Create test sample 
  for (i = 0; i < nData ; i++) {
    // Make some data 
    makeData(x,d,e);

    for (j = 0; j < nVars; j++) 
      if (x[j] < xMin[j] || x[j] > xMax[j])
	break;

    // If we get through the loop above, all variables are in range 
    if (j == nVars)  
      // Add the row to the fit object
      fit->AddTestRow(x,d,e);
    else
      i--;
  }
  delete gRandom;

  // Test the parameterizatio and coefficents using the test sample. 
  fit->Fit();

  // Print result 
  fit->Print("fc");

  // Write code to file 
  fit->MakeCode();

  // Write histograms to disk, and close file 
  output->Write();
  output->Close();
  delete output;

  // We're done 
  delete fit;
  cout << "The END" << endl;

  return 0;
}

Here is my drawing script:

#include "MDF.C"

/**************************************************************************/
double MDFfun(Double_t *x, Double_t *par)
{

 return MDF(x);
 //return x[0]*x[1];

}// end of function
/**************************************************************************/
void MultDimFitDraw(void)
{

 TF2* zzz = new TF2("zzz", MDFfun, 0.0, +10.0, 0.0, 10.0, 0);
 zzz->Draw("surf1");

} // end of function
/**************************************************************************/

Sincerely yours,
Sergei.

Sergei,

Use the version from CVS. Two unfortunate problems in TMultiDimFit
were fixed two weeks ago. I tested that your code executes correctly
with this version.

Rene

Dear Rene,

I do not have at the moment a short example script reproducing the problem
but the class crashes quite often during finding of parameterization. May
be I did not update something important???

To make the things running I took ROOT 4.00/08 and did

cvs update hist/inc/TMultiDimFit.h hist/src/TMultiDimFit.cxx
cvs update matrix

with

$CVSROOT = :pserver:cvs@root.cern.ch:/user/cvs

Do I need to update anything else?

Update: The algorithm crashes (*** Break *** segmentation violation) even if
I use the bleeding edge ROOT code.

Thanks!!! :slight_smile:

Sergei.[/b]

Dear Rene,

The last year I also wrote for my needs a class performing an
interpolation of an N-dimensional function with Chebyshev polynomials. I
am not sure if it could be used in ROOT because
[ul][li] it uses STL[/li]
[li] it is a template of a floating point type used (i.e. float, double,
long double), but this can be removed[/li]
[li] it uses an other template[/li][/ul]
But
[ul] [li] it works perfectly in compiled ROOT macros[/li]
[li] it works in usual C++ programs[/li][/ul]
The other advantages over TMultiDimFit are:
[ul] [li] it converges much faster[/li]
[li] the interface is much more clear and easy[/li]
[li] it allows to save the interpolation into an ASCII file and
initialize it from there without having to recompile the program[/li]
[li] it allows distributed interpolation building which is really
important for difficult functions or many dimensions[/li][/ul]
The disadvantage is that it is not a fit so the algorithm works well if
the required interpolation precision is significantly smaller than the
error of function values.

Rene, if it sounds interesting, please let me know :slight_smile:

Sergei.

Hi Sergei,

Could you be a bit more clear about your algorithm to adjust
these orthogonal functions to an arbitrary function . You seem
to say that it is not a fit ?

Your algorithm might be interesting to add to TFormula as
a way to interpolate functions .

Concerning TMultiDimFit, I agree that the interface is far from
optimal but I guess that the author tried to keep the feel of the
original Fortran code .

Note that Anna Kreshuk recently introduced a new class for fitting
linear function TLinearFitter . I would be worth persuing this path
further to replace TMultiDimFit . It seems that one would have to add
a kind of “F-test” to determine which additional terms are significant .

Do I understand fromyour mail that your example does NOT work
with the CVS head while it is ok for Rene ?

Eddy

[quote]I am not sure if it could be used in ROOT because

* it uses STL
* it is a template of a floating point type used (i.e. float, double,
  long double), but this can be removed
* it uses an other template [/quote]

None of these issues are preventing inclusion of your code in ROOT. Our current code conventions do allow for the use of STL containers and templated classes.

Cheers,
Philippe.

Dear Eddy,

[quote]Could you be a bit more clear about your algorithm to adjust
these orthogonal functions to an arbitrary function . You seem
to say that it is not a fit ? [/quote]

Yes, it is not a fit :slight_smile: To build the interpolation I use Chebyshev
polynomials. Those are the optimal choice to approximate a function
on an interval. The grid on which the interpolation is built is chosen to
be optimal too, i.e. it coincides with the roots of the Chebyshev
polynomial on the next order. The factors are calculated analytically
using formulas instead of finding them using some minimization, so
that is why I said that the algorithm is NOT a fit.

I use a different approach to avoid too much terms to be calculated
to reach the necessary precision, i.e. instead of removing insignificant
terms I allow the algorithm to self-adjust and to use more information
about the function behavior in “difficult” regions while in “easy” regions
the function investigation is terminated quickly. To reach the goal I
first build the interpolation on the grid. Then I choose some number of
test points and compare the approximation values with exact those.
If the difference in at least one test point exceeds the required
precision the whole domain is subdivided into parts and “daughter”
interpolations are built in them. So more subdivisions are performed
in difficult behavior function regions and the process stops fast
if the function is smooth somewhere. I tested the algorithm on very
crazy functions. It works 8) So the result is data set of different size
domain interpolations through which one can navigate very fast to find the
daughter approximation corresponding to the point in which one wants to
estimate the function.

All these can be saved into a file and read from it.

[quote]Do I understand from your mail that your example does NOT work
with the CVS head while it is OK for Rene ? [/quote]

No, Eddy, my example works with the CVS version but does not work
at all in older versions of ROOT. Some bugs were corrected in the class
code recently.

Sergei.

Dear Philippe,

[quote]None of these issues are preventing inclusion of your code in ROOT. Our current code conventions do allow for the use of STL containers and templated classes.
[/quote]

That sounds just great :slight_smile: If the ROOT team decides that my code could be useful to be included in ROOT I will be glad to provide it!

Sergei.

Sergei,

If you send code, make sure it runs with the CVS head.
I also assume that your code is documented and follow our coding guidlines. If it does, we will be happy to include it in the system.

Note that there are currently discussions aiming at a restructure of TMultiLayerPerceptron and TMultiDimFit (between Axel Naumann, Christophe Delaereand Jan Conrad). They will probably contact you if if is necessary.

Rene

Dear Rene,

Have you received this e-mail?

Date: Tue, 29 Mar 2005 03:49:41 -0500 To: Rene BRUN <rene.brun@cern.ch> Cc: Rene Brun <brun@pcbrun.cern.ch>, rootdev@pcroot.cern.ch Subject: New ROOT code: Chebyshev polynomials interpolation and N-dim matrix. Part(s): ChebyshevApproximation.zip application/zip 17.32 KB

I have not heard from anybody so I am just not sure if the e-mail was sent correctly.

Sergei.

sergei,

I received your mail and I hope to have time to process it in the coming days. This week was nearly impossible with too many meetings ::slight_smile:

Rene

Dear ROOT developers,

May be I am hallucinating but my experience seems to tell me
that in the CVS version of ROOT TMultiDimFit is going to
crash program in both cases

[ul]
[li] too little samples statistics (“Abort”)[/li]
[li] too much statistics (memory leak and ROOT gets killed by
operation system)[/li][/ul]

The difference between the two cases seems to be very small
(just a couple of percent). It takes some time to guess
how many samples one have to use.

:open_mouth:

Sergei.

It could be that you are hallucinating ::slight_smile:
If not, send a concrete example that we can test or better discuss directly
with author Christian Holm Christensen at cholm@nbi.dk

Rene