Full path to state file PyGTBModel_GTB.PyData is not propagated to the xml file with training weights

okarache · April 21, 2020, 1:01pm

Hello, could you please help us figure out how to force the full path to the state file PyGTBModel_GTB.PyData to be written?

We are using ROOT.TMVA.Types.kPyGTB for the training and testing.
The data loader is called dataset_pymva: TMVA.DataLoader('dataset_pymva'). When running the training I see the message
Saving state file: dataset_pymva/weights/PyGTBModel_GTB.PyData

The weights folder was created automatically.

I also see these files created
: Creating xml weight file: dataset_pymva/weights/TMVAOutput_WPhi_2L1T_newPyPlotter2_d_GTB.weights.xml
: Creating standalone class: dataset_pymva/weights/TMVAOutput_WPhi_2L1T_newPyPlotter2_d_GTB.class.C

In standalone code, which is unrelated to this training, we need to apply the training to a data set. We use only dataset_pymva/weights/TMVAOutput_WPhi_2L1T_newPyPlotter2_d_GTB.weights.xml for this.
But here is the problem: the .xml file contains a path to the .PyData file, and that path is not a full path, so applying the training parameters crashes:
dataset_pymva/weights/PyGTBModel_GTB.PyData

If we manually change the path to the .PyData file to the full path, then the training is applied OK. How can I force the training to write the full path to the .PyData file, so that we can optimize the application?
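The manual fix described above could be scripted. The following is only a minimal sketch using the Python standard library: the function name absolutize_pydata_paths and its arguments are hypothetical, and it assumes nothing about the layout of the weights file except that the relative path appears as the text of some XML element and ends in .PyData. It writes a new file rather than editing the original, and XML comments are not preserved.

```python
import os
import xml.etree.ElementTree as ET


def absolutize_pydata_paths(xml_in, xml_out, base_dir):
    """Copy xml_in to xml_out, turning relative *.PyData paths into absolute ones.

    base_dir is the directory the relative paths should be resolved against
    (typically the directory the training was run from).
    Returns the list of absolute paths that were written.
    """
    tree = ET.parse(xml_in)
    changed = []
    for elem in tree.iter():
        text = (elem.text or "").strip()
        # Only touch element text that looks like a relative .PyData path.
        if text.endswith(".PyData") and not os.path.isabs(text):
            elem.text = os.path.abspath(os.path.join(base_dir, text))
            changed.append(elem.text)
    # Write the (possibly modified) tree to a separate output file.
    tree.write(xml_out, xml_declaration=True, encoding="utf-8")
    return changed
```

The application code would then read the patched copy instead of the original weights file.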

I tried to give the full path in the data loader as TMVA.DataLoader('full_path/dataset_pymva'), but this crashed even before the step that saves the .PyData file, with this error:

Error in TFile::GetObjectChecked: The provided key name is invalid.

*** Break *** segmentation violation

Many thanks in advance for your help,
Olena

moneta · April 22, 2020, 10:07am

Hi,

You have two options for setting the name of the output training model data.
The first is to use the option FilenameClassifier when booking the PyGTB model, for example:

factory->BookMethod(dataloader, TMVA::Types::kPyGTB, "PyGTB", "!V:VarTransform=N:NEstimators=850:Loss=deviance:LearningRate=0.1:Subsample=1:MaxDepth=3:MaxFeatures='auto':FilenameClassifier=/tmp/tmva/model_pygtb.data");
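Since the original training is driven from Python, the option string with FilenameClassifier could be assembled there with an absolute path. This is only a sketch: the helper name pygtb_options and the shortened default option list are illustrative, and it assumes the path contains no ':' character, since ':' separates the options.

```python
import os


def pygtb_options(model_file, extra="!V:VarTransform=N:NEstimators=850"):
    """Build a PyGTB option string whose FilenameClassifier is an absolute path."""
    # os.path.abspath resolves a relative path against the current directory.
    return "{}:FilenameClassifier={}".format(extra, os.path.abspath(model_file))


if __name__ == "__main__":
    # Illustrative relative location of the model file.
    options = pygtb_options("dataset_pymva/weights/PyGTBModel_GTB.PyData")
    print(options)
    # The string would then be passed as the last argument when booking, e.g.
    # factory.BookMethod(dataloader, ROOT.TMVA.Types.kPyGTB, "PyGTB", options)
```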

The second possibility is to set a different directory for all the output model weights (i.e. the xml file).
You can do this (only when using 6.22, because there was a bug before) by setting this configuration option:

TMVA::gConfig().fIONames.fWeightFileDirPrefix="/tmp/";

This will result in having the weights and by default also the model output in the directory
/tmp/dataset_pymva/weights

Best regards

Lorenzo

okarache · April 22, 2020, 2:15pm

Thank you so much, Lorenzo! Perfect, we will choose one of the paths proposed.

With best wishes,
Olena