I have an array of structures with four variables (x, y, z, v). Each combination of these four variables defines the center of a box and determines two more values, f(x, y, z, v) → (E_1, E_2). I have a lot of boxes (combinations). I'm generating random points inside the boxes to fill the whole space, and I need a 4D interpolation routine: for every point (x_i, y_i, z_i, v_i) I need the value of the function at that point, f(x_i, y_i, z_i, v_i) → (E_1_i, E_2_i).
For the moment, I assign to each box the values at its center, so the function is constant over the whole box. I know this is not accurate, and I was trying to improve it with a 4D interpolation.
When I did this in two dimensions I wrote my own interpolation routine, but now that the problem is more complex I was looking for something already available. I found the TMultiDimFit class, but I'm not sure whether it is what I need or how to use it in my case.
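To make the difference concrete in one dimension, here is a minimal C++ sketch (function names are hypothetical) comparing the current piecewise-constant approach, which returns the value of the nearest box center, with linear interpolation between the two centers that bracket the point. It assumes the centers are sorted and that points outside the outermost centers simply take the end value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Current approach: the value of the box center closest to x (constant inside each box).
// Assumes centers are sorted ascending and values has the same size.
double nearestCenterValue(const std::vector<double>& centers,
                          const std::vector<double>& values, double x) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < centers.size(); ++i)
        if (std::fabs(x - centers[i]) < std::fabs(x - centers[best])) best = i;
    return values[best];
}

// Linear interpolation between the two centers that bracket x;
// x outside [centers.front(), centers.back()] gets the end value.
double linearValue(const std::vector<double>& centers,
                   const std::vector<double>& values, double x) {
    if (x <= centers.front()) return values.front();
    if (x >= centers.back()) return values.back();
    std::size_t i = 1;
    while (centers[i] < x) ++i;   // now centers[i-1] < x <= centers[i]
    const double t = (x - centers[i - 1]) / (centers[i] - centers[i - 1]);
    return (1.0 - t) * values[i - 1] + t * values[i];
}
```

With centers {0, 2, 4} and values {0, 10, 20}, the first function jumps from 0 to 10 halfway between the centers, while the second varies smoothly (0.9 gives 4.5). A 4D version applies the same linear step along each axis.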
Are your points distributed on a regular grid, i.e. does each box have the same volume?
If they are, the interpolation is easier.
I would not use TMultiDimFit; instead you can use one of the several TMVA methods that perform regression, such as an artificial neural network. I expect they will perform better.
It seems to me, however, that you have two target values, and for that case we don't have any method available in ROOT. Maybe something exists in R that you can use via the ROOT-R interface.
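A minimal sketch of why regularity helps (hypothetical function names): on a regular axis the cell containing a point follows from arithmetic, while on an irregular axis, such as v here, it has to be searched for, e.g. with std::upper_bound. Both versions clamp to the first or last cell and assume at least two nodes per axis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Index i of the cell [node_i, node_{i+1}] containing x on a regular axis
// (first node x0, constant step dx, n >= 2 nodes): plain arithmetic, O(1).
std::size_t cellRegular(double x, double x0, double dx, std::size_t n) {
    double k = std::floor((x - x0) / dx);
    const double last = static_cast<double>(n - 2);   // index of the last cell
    if (k < 0.0) k = 0.0;
    if (k > last) k = last;
    return static_cast<std::size_t>(k);
}

// Same for an irregular axis (sorted node positions, at least two): binary search, O(log n).
std::size_t cellIrregular(double x, const std::vector<double>& nodes) {
    const auto it = std::upper_bound(nodes.begin(), nodes.end(), x);  // first node > x
    std::size_t i = (it == nodes.begin()) ? 0 : static_cast<std::size_t>(it - nodes.begin()) - 1;
    return std::min(i, nodes.size() - 2);   // clamp so that node i+1 still exists
}
```

For example, with made-up v nodes {0, 20, 40, 55, 75}, v = 50 falls in cell 2, i.e. [40, 55].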
The boxes don't all have the same size. The steps in x (2), y (2) and z (20) are constant, but the step in v is not always the same (most of the time it is 20, but occasionally it takes a different value). Also, the boxes at the edges are half the size they would be if they were not at the edges. So the grid is not regular.
I believe having two target values is not a problem, because one does not influence the other. So I think this can be done in two steps, as if every (x, y, z, v) provided first one value and then the other.
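Even with a variable step in v, the box centers still form a rectilinear (tensor-product) grid, provided every (x, y, z, v) combination is present. This hypothetical C++ sketch recovers the four sorted node lists from the box centers and checks that assumption. It also assumes that coordinates read from the file compare exactly equal, which holds if each value is written identically everywhere in the text file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One box center with its two function values (hypothetical layout mirroring info[i]).
struct Box { double x, y, z, v, E1, E2; };

// Sorted, de-duplicated coordinates along one axis.
std::vector<double> uniqueSorted(std::vector<double> c) {
    std::sort(c.begin(), c.end());
    c.erase(std::unique(c.begin(), c.end()), c.end());
    return c;
}

// Per-axis node lists of a rectilinear grid.
struct Axes { std::vector<double> x, y, z, v; };

Axes axesFromBoxes(const std::vector<Box>& boxes) {
    std::vector<double> xs, ys, zs, vs;
    for (const Box& b : boxes) {
        xs.push_back(b.x); ys.push_back(b.y); zs.push_back(b.z); vs.push_back(b.v);
    }
    return {uniqueSorted(xs), uniqueSorted(ys), uniqueSorted(zs), uniqueSorted(vs)};
}

// Full tensor-product grid: every combination present, i.e. nBoxes == nx*ny*nz*nv
// (assuming no box center appears twice).
bool isFullGrid(const Axes& a, std::size_t nBoxes) {
    return a.x.size() * a.y.size() * a.z.size() * a.v.size() == nBoxes;
}
```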
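On such a grid a 2D routine generalizes directly to 4D multilinear interpolation: find the cell along each axis, then take a weighted sum of the values at the 16 corners of that cell. This is plain interpolation, not a ROOT or TMVA facility, and all names are hypothetical. Because the weights depend only on the point, E_1 and E_2 can share them, which is consistent with treating the two targets independently. The sketch assumes a complete grid with values stored with v varying fastest, and clamps points that lie beyond the outermost centers to the boundary.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rectilinear 4D grid: sorted node positions per axis (at least 2 each) and the
// two targets at every node, stored flat with v varying fastest.
struct Grid4D {
    std::array<std::vector<double>, 4> nodes;   // axes in order x, y, z, v
    std::vector<double> E1, E2;                 // size nx*ny*nz*nv
    std::size_t flat(const std::array<std::size_t, 4>& i) const {
        return ((i[0] * nodes[1].size() + i[1]) * nodes[2].size() + i[2]) * nodes[3].size() + i[3];
    }
};

// Multilinear interpolation of (E1, E2) at p = (x, y, z, v).
std::array<double, 2> interpolate(const Grid4D& g, const std::array<double, 4>& p) {
    std::array<std::size_t, 4> lo{};   // lower node index of the cell on each axis
    std::array<double, 4> t{};         // fractional position inside the cell, in [0, 1]
    for (int d = 0; d < 4; ++d) {
        const std::vector<double>& n = g.nodes[d];
        const double q = std::min(std::max(p[d], n.front()), n.back());   // clamp to grid
        std::size_t i = static_cast<std::size_t>(std::upper_bound(n.begin(), n.end(), q) - n.begin());
        i = (i == 0) ? 0 : std::min(i - 1, n.size() - 2);
        lo[d] = i;
        t[d] = (q - n[i]) / (n[i + 1] - n[i]);
    }
    std::array<double, 2> out = {{0.0, 0.0}};
    for (int corner = 0; corner < 16; ++corner) {   // bit d: lower (0) or upper (1) node on axis d
        std::array<std::size_t, 4> idx{};
        double w = 1.0;
        for (int d = 0; d < 4; ++d) {
            const bool up = ((corner >> d) & 1) != 0;
            idx[d] = lo[d] + (up ? 1u : 0u);
            w *= up ? t[d] : 1.0 - t[d];
        }
        const std::size_t k = g.flat(idx);
        out[0] += w * g.E1[k];   // same weights for both targets
        out[1] += w * g.E2[k];
    }
    return out;
}
```

Multilinear interpolation reproduces exactly any function that is linear in each variable separately, e.g. x + 2y + 3z + 4v or x*v, which makes such functions a convenient sanity check before using the real E_1 and E_2.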
I have a few questions regarding how to implement this.
My data are in a text file, which I read and store in a structure, so I have something like info[i].x, info[i].y, info[i].z and info[i].v. These are the variables I have to add to the factory, right? How should I include them, and where do I include the value of the function for the points in the file?
I need to read the data from an ASCII file, so I had a look at TMVAClassification. I should include something like this, shouldn't I?
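For reference, a minimal C++ sketch of reading such a file into a structure like info[i]. The struct layout and the column order x y z v E1 E2 are assumptions; adapt them to the real file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record mirroring info[i]: box center and the two function values.
struct Info { double x, y, z, v, E1, E2; };

// Reads whitespace-separated columns "x y z v E1 E2", one box per line.
// Blank lines and lines starting with '#' are skipped; malformed lines are ignored.
std::vector<Info> readInfo(std::istream& in) {
    std::vector<Info> info;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream ls(line);
        Info r{};
        if (ls >> r.x >> r.y >> r.z >> r.v >> r.E1 >> r.E2) info.push_back(r);
    }
    return info;
}
```

Usage, with a hypothetical file name and `<fstream>` included: `std::ifstream f("boxes.txt"); auto info = readInfo(f);`.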
I don’t need to include a background, so I guess I just ignore that part, right?
Do I need cuts?
Then it is time to choose the method, and to train, test and evaluate it. The output is stored in a ROOT file, where I guess you can see the full interpolation function. So I guess I will have to evaluate that function at a point and it will return a value.
First, you should look at TMVARegression.C, not TMVAClassification.C. Classification distinguishes signal from background, while regression estimates a function value, which is your problem.
Yes, x, y, z and v are the variables you should give to the Factory and they are the corresponding branches in your input tree.
You can give a text file as input, or you can create a ROOT TTree from the text file (e.g. with TTree::ReadFile).
For regression you don't need the background, but you need to provide the target variable (E1 or E2 in your case). This is done with the AddTarget() method.
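The output of the training phase is not an explicit formula: it is a trained method, saved in a weight file, that can be evaluated at any point given by the user.
You can see how the evaluation works in the TMVARegressionApplication.C macro: it books the trained method from its weight file with TMVA::Reader and calls EvaluateRegression() for each point.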
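To illustrate what a k-nearest-neighbour regression computes, here is a generic C++ sketch: the prediction at a point is the mean target of the k closest training points, with each input divided by a per-variable scale so that a step of 2 in x and a step of 20 in z weigh similarly. This is not TMVA's implementation (its KNN method has its own scaling, set by ScaleFrac, and optional kernel weighting); the names and the choice of scales are assumptions. It also shows why there is no closed-form formula to look for: for k-NN the trained "function" is essentially the stored training points plus a rule like this one.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One training point: the four inputs and one regression target.
struct Sample { std::array<double, 4> in; double target; };

// Mean target of the k training points nearest to p in scaled Euclidean distance.
double knnRegress(const std::vector<Sample>& train, const std::array<double, 4>& scale,
                  const std::array<double, 4>& p, std::size_t k) {
    std::vector<std::pair<double, double>> d;   // (squared scaled distance, target)
    d.reserve(train.size());
    for (const Sample& s : train) {
        double d2 = 0.0;
        for (std::size_t i = 0; i < 4; ++i) {
            const double u = (s.in[i] - p[i]) / scale[i];
            d2 += u * u;
        }
        d.emplace_back(d2, s.target);
    }
    k = std::min(k, d.size());
    if (k == 0) return 0.0;   // no training data
    std::partial_sort(d.begin(), d.begin() + static_cast<std::ptrdiff_t>(k), d.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < k; ++i) sum += d[i].second;
    return sum / static_cast<double>(k);
}
```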
[code] // Create a new ROOT output file
TString outfileName( "TMVAReg.root" );
TFile* outputFile = TFile::Open( outfileName, "RECREATE" );
// Create the factory object. Later you can choose the methods
// whose performance you'd like to investigate. The factory will
// then run the performance analysis for you.
//
// The first argument is the base of the name of all the
// weight files in the directory weights/
//
// The second argument is the output file for the training results
// All TMVA output can be suppressed by removing the "!" (not) in
// front of the "Silent" argument in the option string
TMVA::Factory *factory = new TMVA::Factory( "TMVARegression", outputFile, "!V:!Silent:Color:DrawProgressBar:Transformations=I;D;P;G,D:AnalysisType=Regression" );
// Read training and test data (see TMVAClassification for reading ASCII files)
// Load the regression sample from a ROOT tree
auto signalFile = TFile::Open("data.root","READ");
auto regTree = static_cast<TTree*>(signalFile->Get("dist_tree"));
// Register it as a regression tree (AddSignalTree is for classification), global weight 1.0
factory->AddRegressionTree( regTree, 1.0 );
// Define the input variables that shall be used for the MVA training
factory->AddVariable( "theta_p", "Polar angle proton", "degrees", 'D' );
factory->AddVariable( "theta_n", "Polar angle neutron", "degrees", 'D' );
factory->AddVariable( "delta_phi", "Difference in azimuthal angle", "degrees", 'D' );
factory->AddVariable( "S", "Kinematic variable", "degrees", 'D' );
// Add the variable carrying the regression target
factory->AddTarget( "E_p" );
std::cout << "--- TMVARegression : Using input file: " << signalFile->GetName() << std::endl;
// --- Register the regression tree
// TTree *regTree = (TTree*)input->Get("TreeR");
// global event weights per tree (see below for setting event-wise weights)
Double_t regWeight = 1.0;
// Apply additional cuts on the regression sample (none here)
TCut mycut = ""; // for example: TCut mycut = "abs(var1)<0.5 && abs(var2-0.5)<1";
// tell the factory to use all remaining events in the trees after training for testing:
factory->PrepareTrainingAndTestTree( mycut, "nTrain_Regression=1000:nTest_Regression=0:SplitMode=Random:NormMode=NumEvents:!V" );
// factory->PrepareTrainingAndTestTree( mycut,
// "nTrain_Regression=0:nTest_Regression=0:SplitMode=Random:NormMode=NumEvents:!V" );
// If no numbers of events are given, half of the events in the tree are used
// for training, and the other half for testing:
// factory->PrepareTrainingAndTestTree( mycut, "SplitMode=random:!V" );
// ---- Book MVA methods
//
// please lookup the various method configuration options in the corresponding cxx files, eg:
// src/MethodCuts.cxx, etc, or here: http://tmva.sourceforge.net/optionRef.html
// it is possible to preset ranges in the option string in which the cut optimisation should be done:
// "...:CutRangeMin[2]=-1:CutRangeMax[2]=1"...", where [2] is the third input variable
// K-Nearest Neighbour method (KNN), here used for regression
factory->BookMethod( TMVA::Types::kKNN, "KNN", "nkNN=20:ScaleFrac=0.8:SigmaFact=1.0:Kernel=Gaus:UseKernel=F:UseWeight=T:!Trim" );
// ---- Now you can tell the factory to train, test, and evaluate the MVAs
// Train MVAs using the set of training events
factory->TrainAllMethods();
// ---- Evaluate all MVAs using the set of test events
factory->TestAllMethods();
// ----- Evaluate and compare performance of all configured MVAs
factory->EvaluateAllMethods();
// --------------------------------------------------------------
// Save the output
outputFile->Close();
std::cout << "==> Wrote root file: " << outputFile->GetName() << std::endl;
std::cout << "==> TMVARegression is done!" << std::endl;
delete factory;
[/code]
Everything seems to be working properly, but I can't find the function of my four variables. I ran versions of TMVARegression and TMVARegressionApplication adapted to my variables.
Any hint about what I’m doing wrong?
I also tried the macros on page 32 of the manual, but I got tons of complaints about TMVAGlob.C, so I copied the parts of that macro I needed into the macros from page 32.