4D interpolation

Hi,

I have an array of structures with four variables (x, y, z, v). Every combination of these four variables defines the center of a box and two more values, f(x,y,z,v) -> E_1 and E_2. I have a lot of boxes (combinations). I’m generating random points inside the boxes to fill all the space, and I need a 4D interpolation routine. That is, for every point (x_i, y_i, z_i, v_i) I need to find the value of the function at that point, f(x_i, y_i, z_i, v_i) -> E_1_i, E_2_i.
For the moment, I’m assigning to each box the values at its center, so the function is constant throughout each box. I know this is wrong and I am trying to improve it using a 4D interpolation.

When I did this in two dimensions I wrote my own interpolation routine, but now that the problem is more complex I was trying to find out whether something is already available. I found the TMultiDimFit class, but I’m not sure whether it is what I need or how to use it in my specific case.

If you need more details, I can provide them.

Thanks

Hi,

Are your points distributed on a regular grid, so that each box has the same volume, or not?
If they are regular, it is easier to do.
I would not use TMultiDimFit, but you can use several methods in TMVA to perform regression, such as an artificial neural network. I guess they will perform better.

It seems to me, however, that you have two target values, and in that case we don’t have any method available in ROOT. Maybe something exists in R that you can use via the ROOT-R interface.

Best Regards

Lorenzo

The boxes don’t all have the same size. The step in x (+2), y (+2) and z (+20) is constant. The step in v is not always the same (most of the time it is 20, but from time to time it takes a different value). Also, the boxes at the edges are half the size they would be if they were not at the edges. So, they are not regular.
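
For boxes like these, one option is plain multilinear interpolation between the box centers, provided the centers still form a tensor-product grid (each axis has its own sorted list of coordinates, with unequal steps allowed, as in v). That is an assumption; nothing in the thread confirms it. The sketch below is standalone C++, not a ROOT facility, and `Grid4D`, `index` and `interpolate` are hypothetical names. Points outside the outermost centers are clamped to the edge value.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: four sorted axes of box-center coordinates and one
// value per grid node, stored with the last axis (v) running fastest.
struct Grid4D {
    std::array<std::vector<double>, 4> axis;  // x, y, z, v centers, ascending
    std::vector<double> value;                // size = nx * ny * nz * nv

    std::size_t index(const std::array<std::size_t, 4>& i) const {
        return ((i[0] * axis[1].size() + i[1]) * axis[2].size() + i[2])
                   * axis[3].size() + i[3];
    }
};

// Multilinear interpolation at point p (assumes a tensor-product grid).
double interpolate(const Grid4D& g, const std::array<double, 4>& p) {
    std::array<std::size_t, 4> lo{};
    std::array<double, 4> t{};
    for (int d = 0; d < 4; ++d) {
        const std::vector<double>& a = g.axis[d];
        assert(a.size() >= 2);
        // First center strictly above p[d]; this also handles unequal steps.
        auto it = std::upper_bound(a.begin(), a.end(), p[d]);
        std::size_t hi = static_cast<std::size_t>(it - a.begin());
        hi = std::min(std::max<std::size_t>(hi, 1), a.size() - 1);
        lo[d] = hi - 1;
        // Fractional position inside the cell, clamped to [0, 1].
        double frac = (p[d] - a[lo[d]]) / (a[hi] - a[lo[d]]);
        t[d] = std::min(1.0, std::max(0.0, frac));
    }
    double result = 0.0;
    // Weighted sum over the 16 corners of the enclosing 4D cell.
    for (unsigned corner = 0; corner < 16; ++corner) {
        std::array<std::size_t, 4> idx = lo;
        double w = 1.0;
        for (int d = 0; d < 4; ++d) {
            bool upper = ((corner >> d) & 1u) != 0;
            if (upper) ++idx[d];
            w *= upper ? t[d] : 1.0 - t[d];
        }
        result += w * g.value[g.index(idx)];
    }
    return result;
}
```

With two independent targets, one would fill one `Grid4D` per target (E_1 and E_2) and call `interpolate` on each. If the centers do not form a tensor-product grid, this sketch does not apply and a scattered-data method is needed instead.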

I believe having two target values is not a problem because one does not influence the other. So I think this could be done in two steps, as if every (x, y, z, v) provided one value first and then the other.

Hi,

If the two targets are independent, then you can use any of the regression methods in TMVA. See the tutorial example tutorials/tmva/TMVARegression.C.

Lorenzo

Ok, I will look into that. I will let you know how it goes.

Thanks!!

Hi again!

I have a few questions regarding how to implement this.

  1. My data are in a txt file, which I read and store in a structure. So I have something like info[i].x, info[i].y, info[i].z and info[i].v. These are the variables I have to add to the factory, right? How should I include them? And where do I include the value of the function for the points in the file?

  2. I need to read the data from an ASCII file. I had a look at TMVAClassification. I should include something like this, right?

    TString datFileS = "tmva_example_sig.dat";
    factory->SetInputTrees( datFileS );

  3. I don’t need to include a background, so I guess I just ignore that part, right?

  4. Do I need cuts?

  5. Now it is time to choose the method and to train, test and evaluate. The output is stored in a ROOT file, where I guess you can see the full interpolation function. So I guess I will have to evaluate that function at a point and it will return a value.

Thank you so much

Hi again,

First, you should look at TMVARegression.C and not TMVAClassification.C. Classification is for distinguishing signal from background, while regression is for estimating a function value (your problem).

Yes, x, y, z and v are the variables you should give to the Factory, and they correspond to branches in your input tree.
You can give a text file as input, or you can always create a ROOT TTree from a text file.
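
As a rough sketch of the text-file route: `TTree::ReadFile` can take its branch description from the first line of the file, so the structure array could be dumped in that layout and then read into a tree. The code below is plain C++ with no ROOT dependency. `Sample`, `writeForReadFile` and `toReadFileText` are hypothetical names, and the branch names x, y, z, v, E_1, E_2 are assumed from the first post.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record mirroring the info[i] structure described in the thread.
struct Sample {
    double x, y, z, v;  // input variables
    double E_1, E_2;    // regression targets
};

// Write a header line in TTree::ReadFile's branch-descriptor syntax
// ("/D" marks a double; later names inherit the previous type), followed by
// one whitespace-separated row per sample.
void writeForReadFile(std::ostream& out, const std::vector<Sample>& samples) {
    assert(out.good());
    out << "x/D:y:z:v:E_1:E_2\n";
    out.precision(17);  // enough digits to round-trip a double
    for (const Sample& s : samples) {
        out << s.x << ' ' << s.y << ' ' << s.z << ' ' << s.v << ' '
            << s.E_1 << ' ' << s.E_2 << '\n';
    }
}

// Convenience wrapper that returns the text instead of writing to a stream.
std::string toReadFileText(const std::vector<Sample>& samples) {
    std::ostringstream out;
    writeForReadFile(out, samples);
    return out.str();
}
```

Written to a file, the result could then be loaded in a ROOT session with `TTree::ReadFile`, leaving the descriptor argument empty so that the header line is used.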

For the regression you don’t need the background, but you need to provide the target variables (E_1 or E_2 in your case). This is done using the AddTarget() method.

The output of the training phase is basically a function that can be evaluated for every point given by the user.
You can see how the evaluation works in the TMVARegressionApplication.C macro.

Lorenzo

I did what you suggested and now the macro runs, but I get an error which says:

Processing tmva.C...
--- Factory : You are running ROOT Version: 6.04/02, Jul 14, 2015
--- Factory :
--- Factory : _/_/_/_/_/ _| _| _| _| _|_|
--- Factory : _/ _|_| _|_| _| _| _| _|
--- Factory : _/ _| _| _| _| _| _|_|_|_|
--- Factory : _/ _| _| _| _| _| _|
--- Factory : _/ _| _| _| _| _|
--- Factory :
--- Factory : ___________TMVA Version 4.2.1, Feb 5, 2015
--- Factory :
--- DataSetInfo : Added class "Signal" with internal class number 0
--- Factory : Add Tree dist_tree of type Signal with 692686 events
--- TMVARegression : Using input file: data.root
--- DataSetInfo : Class index : 0 name : Signal
--- Factory : Booking method: KNN
--- <WARNING> Factory : Method KNN is not capable of handling classification with 1 classes.
--- <FATAL> Factory : You want to do classification training, but specified less than two classes.
***> abort program execution

I guess I’m missing something in my macro. Any hint?

Thanks!

Hi,

It looks to me as if you are still configuring the Factory for Classification instead of Regression. Can you please upload your macro?

Thank you

Lorenzo

[code] // Create a new root output file
TString outfileName( "TMVAReg.root" );
TFile* outputFile = TFile::Open( outfileName, "RECREATE" );

	// Create the factory object. Later you can choose the methods
	// whose performance you'd like to investigate. The factory will
// then run the performance analysis for you.
//
 	// The first argument is the base of the name of all the
 	// weightfiles in the directory weight/ 
 	//
 	// The second argument is the output file for the training results
 	// All TMVA output can be suppressed by removing the "!" (not) in 
 	// front of the "Silent" argument in the option string

  	TMVA::Factory *factory = new TMVA::Factory( "TMVARegression", outputFile, "!V:!Silent:Color:DrawProgressBar:Transformations=I;D;P;G,D:AnalysisType=Classification" );

  	// Read training and test data (see TMVAClassification for reading ASCII files)
  	// load the signal and background event samples from ROOT trees
auto signalFile = TFile::Open("data.root","READ");
auto signalTree = static_cast<TTree*>(signalFile->Get("dist_tree"));

factory->AddSignalTree(signalTree,1.0);

// Define the input variables that shall be used for the MVA training
factory->AddVariable( "theta_p", "Polar angle proton", "degrees", 'D' );
factory->AddVariable( "theta_n", "Polar angle neutron", "degrees", 'D' );
factory->AddVariable( "delta_phi", "Difference in azimuthal angle", "degrees", 'D' );
factory->AddVariable( "S", "Kinematic variable", "degrees", 'D' );

// Add the variable carrying the regression target
factory->AddTarget( "E_p" ); 


  	std::cout << "--- TMVARegression           : Using input file: " << signalFile->GetName() << std::endl;

  	// --- Register the regression tree
  //	TTree *regTree = (TTree*)input->Get("TreeR");

// global event weights per tree (see below for setting event-wise weights)
 	Double_t regWeight  = 1.0;

// Apply additional cuts on the signal and background samples (can be different)
TCut mycut = ""; // for example: TCut mycut = "abs(var1)<0.5 && abs(var2-0.5)<1";

// tell the factory to use all remaining events in the trees after training for testing:
  	factory->PrepareTrainingAndTestTree( mycut, "nTrain_Regression=1000:nTest_Regression=0:SplitMode=Random:NormMode=NumEvents:!V" );
  	// factory->PrepareTrainingAndTestTree( mycut, 
  	//                                      "nTrain_Regression=0:nTest_Regression=0:SplitMode=Random:NormMode=NumEvents:!V" );

  	// If no numbers of events are given, half of the events in the tree are used 
  	// for training, and the other half for testing:
  	//    factory->PrepareTrainingAndTestTree( mycut, "SplitMode=random:!V" );  
    
	// ---- Book MVA methods
  	//
  	// please lookup the various method configuration options in the corresponding cxx files, eg:
  	// src/MethoCuts.cxx, etc, or here: http://tmva.sourceforge.net/optionRef.html
  	// it is possible to preset ranges in the option string in which the cut optimisation should be done:
  	// "...:CutRangeMin[2]=-1:CutRangeMax[2]=1"...", where [2] is the third input variable

  	// K-Nearest Neighbour classifier (KNN)

factory->BookMethod( TMVA::Types::kKNN, "KNN", "nkNN=20:ScaleFrac=0.8:SigmaFact=1.0:Kernel=Gaus:UseKernel=F:UseWeight=T:!Trim" );
  
// ---- Now you can tell the factory to train, test, and evaluate the MVAs

  	// Train MVAs using the set of training events
  	factory->TrainAllMethods();

  	// ---- Evaluate all MVAs using the set of test events
  	factory->TestAllMethods();

  	// ----- Evaluate and compare performance of all configured MVAs
  	factory->EvaluateAllMethods();    

  	// --------------------------------------------------------------
  
  	// Save the output
  	outputFile->Close();

  	std::cout << "==> Wrote root file: " << outputFile->GetName() << std::endl;
  	std::cout << "==> TMVARegression is done!" << std::endl;      

  	delete factory;

[/code]

Hi,

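You should use AddRegressionTree and not AddSignalTree. The Factory option string in your macro also still has AnalysisType=Classification; TMVARegression.C sets AnalysisType=Regression. Please follow the TMVARegression macro and not TMVAClassification.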

Lorenzo

Working now!! Thank you so much!! Now, I have to study the output and the different methods, but it is running!
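
For orientation while comparing methods: a k-nearest-neighbour regression, which is the idea behind the KNN method booked in the macro, conceptually averages the targets of the k training points closest to the query point. The sketch below is standalone C++ and only illustrates that idea. It does not reproduce TMVA's implementation or its options (ScaleFrac, kernel weighting and so on), and `Point4`, `knnPredict` and the `scale` argument are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Point4 = std::array<double, 4>;  // (x, y, z, v)

// Average the targets of the k training points closest to `query`.
// `scale` divides each coordinate difference so that axes with different
// units or step sizes contribute comparably (a hypothetical choice).
double knnPredict(const std::vector<Point4>& points,
                  const std::vector<double>& target,
                  const Point4& query, const Point4& scale, std::size_t k) {
    assert(points.size() == target.size());
    assert(k >= 1 && k <= points.size());

    // (squared scaled distance, index of training point)
    std::vector<std::pair<double, std::size_t>> dist;
    dist.reserve(points.size());
    for (std::size_t i = 0; i < points.size(); ++i) {
        double d2 = 0.0;
        for (int d = 0; d < 4; ++d) {
            double diff = (points[i][d] - query[d]) / scale[d];
            d2 += diff * diff;
        }
        dist.emplace_back(d2, i);
    }
    // Bring the k smallest distances to the front; the rest stay unsorted.
    std::partial_sort(dist.begin(),
                      dist.begin() + static_cast<std::ptrdiff_t>(k),
                      dist.end());

    double sum = 0.0;
    for (std::size_t j = 0; j < k; ++j) sum += target[dist[j].second];
    return sum / static_cast<double>(k);
}
```

Since the two targets are treated as independent, the same function would simply be called twice, once with the E_1 values and once with the E_2 values. This brute-force search is linear in the number of training points per query, so it is only meant to show the principle.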

Hi again!

Everything seems to be working properly, but I’m not able to find the function that depends on my four variables. I ran macros like TMVARegression and TMVARegressionApplication, adapted to my variables.

Any hint about what I’m doing wrong?

I also tried to use the macros on page 32 of the manual, but I get tons of complaints about TMVAGlob.C, so I copied the needed parts of that macro into the macros from page 32.

Regards,

Paloma