I am starting to learn how to use TMVA. The input for my analysis is an image, so I need to work with a CNN. I have run the TMVA tutorial example TMVA_CNN_Classification.C. In this CNN example, the input data of each event is an image stored as a std::vector<float>.
I want to run an application with this CNN example model on the same input data, to learn how to perform the application step with a CNN model. However, I don't know how to read the image data (std::vector<float> *px1) in the application step, since TMVA::Reader has no AddVariablesArray().
I also tried using TMVA::Experimental::RReader, following the tutorial tutorials/tmva/tmva003_RReader.C, as follows:
```cpp
#include <iostream>
#include <string>
#include "ROOT/RDataFrame.hxx"
#include "TMVA/RReader.hxx"

using namespace TMVA::Experimental;

void TMVA_CNN_ClassificationApplication()
{
   const std::string filename = "images_data_16x16.root";
   // Next, we load the model from the TMVA XML file.
   RReader model("dataset/weights/TMVA_CNN_Classification_TMVA_CNN_GPU.weights.xml");
   // ...
   auto variables = model.GetVariableNames();
   // This is not giving the right name, it just prints "[0]"
   std::cout << "Variables names: " << variables[0] << std::endl;
   auto make_histo = [&](const std::string &treename) {
      ROOT::RDataFrame df(treename, filename);
      auto df2 = df.Define("y", Compute<1, float>(model), {"vars"});
      return df2.Histo1D({treename.c_str(), ";CNN score;N_{Events}", 100, -0.1, 1.1}, "y");
   };
   auto sig = make_histo("sig_tree");
   auto bkg = make_histo("bkg_tree");
}
```
But I get the following error:
```
root [0]
Processing TMVA_CNN_ClassificationApplication.C...
Variables names: [0]
Error in <TTreeReaderValueBase::CreateProxy()>: The branch vars contains data of type vector<float>. It cannot be accessed by a TTreeReaderValue<float>
terminate called after throwing an instance of 'std::runtime_error'
  what():  An error was encountered while processing the data. TTreeReader status code is: 6
```
If I replace Compute<1, float>(model) with Compute<1, vector<float>>(model), it compiles but just returns 0.
I would appreciate any guidance on how to write the application step for a CNN model that was trained on trees with a single branch holding a std::vector<float> of size nh x nw with the image data.
Hi,
For the RReader class, one needs to modify the Compute functor passed to RDataFrame so that it handles vector data. Attached you will find the example code, which defines a new functor to use with RDataFrame.
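A minimal sketch of such a functor (not the attached code; the name ImageScore is hypothetical). It assumes the model type offers, like RReader, a `std::vector<float> Compute(const std::vector<float> &)` member returning one value per output node, and that the event loop is single-threaded, since one model object is shared. Instead of one scalar column per input variable, the functor takes the whole image branch as a single column.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical functor: name and layout are illustrative only.
// Model is assumed to provide, like TMVA::Experimental::RReader,
//   std::vector<float> Compute(const std::vector<float> &x);
template <typename Model>
struct ImageScore {
   Model *model;         // not owned; must outlive the event loop
   std::size_t npixels;  // expected image size, nh * nw

   // Receives the whole image ("vars" branch) of one event at once.
   float operator()(const std::vector<float> &image) const
   {
      if (image.size() != npixels)
         throw std::invalid_argument("image size does not match nh * nw");
      // For a single-output classifier the first entry is the score.
      return model->Compute(image).at(0);
   }
};

// Possible use in a ROOT macro with "using namespace TMVA::Experimental;"
// (assumption: the branch can be read as std::vector<float>; if it is read
// as ROOT::RVecF instead, copy it into a std::vector<float> first):
//   RReader model("dataset/weights/TMVA_CNN_Classification_TMVA_CNN_GPU.weights.xml");
//   ROOT::RDataFrame df("sig_tree", "images_data_16x16.root");
//   auto df2 = df.Define("y", ImageScore<RReader>{&model, 16 * 16}, {"vars"});
```

Checking the image size up front turns a silent mismatch between the tree and the trained input layout into an explicit error.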
I also attach example code for using the Reader class in this case. Since AddVariablesArray is missing from TMVA::Reader, one needs to declare each vector element individually, as shown in the attached code. I will add this missing function in the next release.
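A minimal sketch of per-element registration (not the attached code; ImageInputs is a hypothetical name). It assumes the reader exposes `AddVariable(name, float *address)` like `TMVA::Reader::AddVariable(const TString&, Float_t*)`, and that the variables are named "vars[0]", "vars[1]", ...; that naming is an assumption, and the names and their order must match those stored in the weights XML file. The key point is that the Reader keeps the registered addresses, so the buffer must never be resized, and each event's image is copied into it before evaluation.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper. ReaderT is assumed to provide
//   AddVariable(name, float *address)
// as TMVA::Reader does (TString name; std::string converts implicitly).
template <typename ReaderT>
class ImageInputs {
public:
   ImageInputs(ReaderT &reader, const std::string &prefix, std::size_t npixels)
      : buffer_(npixels)
   {
      // Register every pixel once, in training order. buffer_ is never
      // resized afterwards, so these addresses stay valid.
      for (std::size_t i = 0; i < npixels; ++i)
         reader.AddVariable(prefix + "[" + std::to_string(i) + "]", &buffer_[i]);
   }
   // The reader holds pointers into buffer_, so forbid copies (and moves).
   ImageInputs(const ImageInputs &) = delete;
   ImageInputs &operator=(const ImageInputs &) = delete;

   // Copy one event's image into the registered buffer before evaluating.
   void Load(const std::vector<float> &image)
   {
      if (image.size() != buffer_.size())
         throw std::invalid_argument("unexpected image size");
      std::copy(image.begin(), image.end(), buffer_.begin());
   }

private:
   std::vector<float> buffer_;
};

// Possible use:
//   TMVA::Reader reader("!Color:!Silent");
//   ImageInputs<TMVA::Reader> inputs(reader, "vars", 16 * 16);
//   reader.BookMVA("CNN", "dataset/weights/TMVA_CNN_Classification_TMVA_CNN_GPU.weights.xml");
//   // per event, after tree->GetEntry(i) has filled px1:
//   inputs.Load(*px1);
//   double score = reader.EvaluateMVA("CNN");
```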
Many thanks for your reply and for the examples showing how to read the image data in the CNN application step. I ran the tutorial TMVA_CNN_Classification.C on the GPU and then used the same signal tree to perform the application, following the examples you provided. However, the CNN response is very different from the one obtained at the classification step. The BDT and DNN_GPU responses agree with their distributions at classification, but I cannot get similar behavior for CNN_GPU, even when I use the same signal tree that was used for training. This happens with both Reader and RReader.
I am using ROOT 6.27/01 and I have added the plots and the code for reference. Can you please help me understand what is happening? I appreciate your help very much!
[Plot: response from the classification (training) step]
[Plot: response from the application step, using the same signal tree used for training]
Thank you so much for looking into this. Here are the files. The weights in the folder dataset give peaks at 0 and 1 for CNN_GPU, as in the image above. Rerunning the analysis sometimes gives just one peak at 0 (dataset_0) and other times just one peak at 1 (dataset_1) for CNN_GPU.
Thank you for the files, I will look at them later today.
If I understand correctly, when running the training you get the nice plot above in the TMVA GUI, correct?
Thanks a lot! Yes, the first plot, showing both signal and background, is from the TMVA GUI, obtained during training with TMVA_CNN_Classification.C. The second one is obtained at the application stage with either Reader or RReader, using the same signal image data tree that was used for training.
Training on images_data_16x16.root gave the weights in the folder dataset
Training on images_data_16x16_0.root gave the weights in the folder dataset_0
Training on images_data_16x16_1.root gave the weights in the folder dataset_1
Thanks for sharing the files. I can reproduce your problem in the Reader when reading your XML weight files, but not when I run the training myself on the data you provided.
I suspect that in some cases the CNN did not train well. Can you please share, for one of these cases (dataset_0 or dataset_1), the full log printed during training and also the TMVA output ROOT file used by the TMVAGUI (it should be called TMVA_CNN_ClassificationOutput.root)?
The training history is available in the TMVAGUI (item 8). It should show whether the validation error decreased to a reasonable value.
Thank you very much for all the files and information. The training results look good; the training itself is fine.
I can reproduce your problem when running the training on the GPU, but only with your input data; running on a CPU works fine. I will need some time to investigate this problem further and find the cause. For the time being, I suggest using the CPU architecture as a workaround: in TMVA_CNN_Classification.C, around line 320, change the method options string so that it contains "Architecture=CPU" instead of "Architecture=GPU".
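TMVA booking options are a colon-separated list of Key=Value entries, so the workaround amounts to making the Architecture entry read CPU. Editing the string literal in the macro is the simplest route; purely as an illustration of that format, here is a hypothetical helper (ForceCpuArchitecture is not a TMVA function) that rewrites an existing Architecture entry or appends one if none is present. It assumes an exact-case "Architecture=" key and that ':' only appears as a separator between entries.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: rewrite the Architecture entry of a colon-separated
// TMVA-style option string to CPU, appending it if absent.
std::string ForceCpuArchitecture(const std::string &options)
{
   std::stringstream in(options);
   std::string token, out;
   bool first = true, found = false;
   while (std::getline(in, token, ':')) {
      if (token.rfind("Architecture=", 0) == 0) { // token starts with the key
         token = "Architecture=CPU";
         found = true;
      }
      if (!first)
         out += ':';
      out += token;
      first = false;
   }
   if (!found) {
      if (!first)
         out += ':';
      out += "Architecture=CPU";
   }
   return out;
}
```

For example, "!H:V:Architecture=GPU" becomes "!H:V:Architecture=CPU", and "!H:V" becomes "!H:V:Architecture=CPU".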