TMVA read image data for application of a CNN model

lilina · March 22, 2023, 7:15pm

Hi!

I am starting to learn how to use TMVA. The input I need to use in my analysis is an image so I need to work with CNN. I have run the example on the TMVA tutorial TMVA_CNN_Classification.C. In this CNN example the input data is of type:

   std::vector<float> *px1 = &x1;
   std::vector<float> *px2 = &x2;
   bkg.Branch("vars", "std::vector<float>", &px1);
   sgn.Branch("vars", "std::vector<float>", &px2);

I want to run the application step with this CNN example model and the same input data, to learn how to apply a CNN model. But I don’t know how to read the image data (std::vector<float> *px1) at the application step, since there is no AddVariablesArray() in TMVA::Reader.
I also tried using TMVA::Experimental::RReader, following the example tutorials/tmva/tmva003_RReader.C, as follows:

using namespace TMVA::Experimental;
void TMVA_CNN_ClassificationApplication()
{
   const std::string filename = "images_data_16x16.root";

   // Next, we load the model from the TMVA XML file.
   RReader model("dataset/weights/TMVA_CNN_Classification_TMVA_CNN_GPU.weights.xml");
   auto variables = model.GetVariableNames();
   cout<<"Variables names: "<<variables[0]<<endl; // This is not giving the right name, just prints "[0]"

   auto make_histo = [&](const std::string &treename) {
      ROOT::RDataFrame df(treename, filename);
      auto df2 = df.Define("y", Compute<1, float>(model), {"vars"});
      return df2.Histo1D({treename.c_str(), ";CNN score;N_{Events}", 100, -0.1, 1.1}, "y");
   };
   auto sig = make_histo("sig_tree");
   auto bkg = make_histo("bkg_tree");
}

But I get the following error:
root [0]
Processing TMVA_CNN_ClassificationApplication.C…
Variables names: [0]
Error in TTreeReaderValueBase::CreateProxy(): The branch vars contains data of type vector<float>. It cannot be accessed by a TTreeReaderValue<float>
terminate called after throwing an instance of ‘std::runtime_error’
what(): An error was encountered while processing the data. TTreeReader status code is: 6

If I change Compute<1, float>(model) to Compute<1, vector<float>>(model), it compiles but just returns ‘0’.

I appreciate any guidance on how to run the application step for a CNN model that was trained on trees with a single branch holding a std::vector<float> of size nh x nw containing the image data.

Thanks a lot!

couet · March 23, 2023, 10:04am

I guess @moneta can help.

lilina · March 24, 2023, 10:16pm

Thanks, @couet and @moneta. I have modified my post to give more details about the problem I am facing. Many thanks for your help.

moneta · March 27, 2023, 9:15am

Hi,
For the RReader class one needs to modify the Compute functor passed to RDataFrame so that it handles vector data. Attached you will find the example code, which defines a new functor to use with RDataFrame.
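The attached macro is not reproduced in the thread, so the following is only a minimal sketch of the idea, written in plain C++ so that it runs without ROOT. It assumes the model type offers a Compute method that takes the whole input as one std::vector<float> and returns a std::vector<float>, as RReader does. ImageScore and FakeModel are hypothetical names introduced for illustration. FakeModel simply returns the mean pixel value.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Adapter functor: receives the whole image column as a single argument
// and returns a single score.
// Model is any type providing
//    std::vector<float> Compute(const std::vector<float> &);
template <typename Model>
class ImageScore {
public:
   explicit ImageScore(Model &model) : fModel(&model) {}

   float operator()(const std::vector<float> &image)
   {
      const std::vector<float> out = fModel->Compute(image);
      assert(!out.empty());
      return out[0]; // binary classification: one output value
   }

private:
   Model *fModel; // not owned; the model must outlive the functor
};

// Hypothetical stand-in for the real model, so the sketch runs standalone:
// its "score" is just the mean pixel value.
struct FakeModel {
   std::vector<float> Compute(const std::vector<float> &x)
   {
      const float sum = std::accumulate(x.begin(), x.end(), 0.f);
      return {x.empty() ? 0.f : sum / static_cast<float>(x.size())};
   }
};
```

In the macro from the first post, something like df.Define("y", ImageScore<RReader>(model), {"vars"}) should then take the place of Compute<1, float>(model). That call is untested here, and because the functor shares one model it is only meant for single-threaded running.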

I also attach the example code for using the Reader class in this case. Since AddVariablesArray is missing, one needs to declare each vector element individually, as shown in the attached code. I will add this missing function in the next release.
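The per-element workaround can be sketched as follows, again in plain C++ so that it runs without ROOT. MockReader is a hypothetical stand-in for TMVA::Reader, which stores the address passed to AddVariable(name, Float_t *). The "vars[i]" naming is an assumption and has to match whatever names the weight file lists. The key points are that the buffer gets its final size before registration, so the stored addresses stay valid, and that each event's vector is copied into that buffer before evaluation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TMVA::Reader: it only remembers
// (name, address) pairs, one per declared variable.
struct MockReader {
   std::vector<std::pair<std::string, const float *>> vars;

   void AddVariable(const std::string &name, const float *address)
   {
      vars.emplace_back(name, address);
   }

   // Current value of every registered variable, in registration order.
   std::vector<float> CurrentInputs() const
   {
      std::vector<float> out;
      for (const auto &v : vars)
         out.push_back(*v.second);
      return out;
   }
};

// Declare one variable per pixel. `pixels` must already have its final
// size: resizing it later would invalidate the registered addresses.
inline void RegisterPixels(MockReader &reader, std::vector<float> &pixels)
{
   for (std::size_t i = 0; i < pixels.size(); ++i)
      reader.AddVariable("vars[" + std::to_string(i) + "]", &pixels[i]); // naming is an assumption
}

// Copy one event's image (the content of the "vars" branch) into the
// registered buffer without reallocating it.
inline void LoadEvent(const std::vector<float> &image, std::vector<float> &pixels)
{
   assert(image.size() == pixels.size());
   std::copy(image.begin(), image.end(), pixels.begin());
}
```

With the real Reader, the registration loop would be followed once by BookMVA, and each LoadEvent call by EvaluateMVA.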

Best Regards

Lorenzo

TMVA_CNN_RReader.C (908 Bytes)
TMVA_CNN_ClassificationApplication.C (8.1 KB)

lilina · March 27, 2023, 8:48pm

Dear Lorenzo,

Many thanks for your reply and the examples on how to read the image data at the CNN application step. I ran the tutorial TMVA_CNN_Classification.C on the GPU and then used the same signal tree to perform the application, following the examples you provided. But the CNN response is very different from the one obtained at the CNN classification step. The BDT and DNN_GPU responses do agree with their distributions at classification, but I cannot get similar behavior for the CNN_GPU even when I use the same signal tree used for training. This happens with both Reader and RReader.
I am using ROOT 6.27/01 and I have added the plots and the code for reference. Can you please help me understand what is happening? I appreciate your help very much!

From Classification

From Application using the same signal tree used at training.

Note: Repeating this analysis (regenerating the data and redoing the training) sometimes gives a single peak at 1, other times a single peak at 0, and other times peaks at both values, as in the image shown.
TMVA_CNN_RReader.C (878 Bytes)
TMVA_CNN_ClassificationApplication.C (8.2 KB)
TMVA_CNN_Classification.C (17.7 KB)

moneta · March 28, 2023, 8:57am

This is strange. Can you please also attach the saved XML files?

Thanks

Lorenzo

lilina · March 28, 2023, 1:16pm

Dear Lorenzo,

Thank you so much for looking into this. Here are the files. The weights in dataset gave me peaks at 0 and 1 for the CNN_GPU, as in the above image. Rerunning the analysis sometimes gives just one peak at 0 (dataset_0) and other times one peak at 1 (dataset_1) for the CNN_GPU.

dataset.zip (1.4 MB)
dataset_1.zip (1.5 MB)
dataset_0.zip (1.4 MB)

Lilina

moneta · March 28, 2023, 1:25pm

Thank you for the files, I will look at them later today.
If I have understood correctly, when running the training you get the nice plot above using the TMVA GUI, correct?

Lorenzo

lilina · March 28, 2023, 11:46pm

Thanks a lot! Yes the first plot that shows both signal and background is from the TMVA GUI obtained at training with TMVA_CNN_Classification.C. The second one is obtained at the application stage with either Reader or RReader using the same signal image data tree used for training.

Lilina

moneta · March 29, 2023, 6:52am

Hello Lilina,
Which input data are you using? The same as used by the TMVA tutorial in the ROOT repository?

Lorenzo

lilina · March 29, 2023, 12:04pm

I am using the data created by the MakeImagesTree(5000, 16, 16) function of TMVA_CNN_Classification.C.
I put the data created with this function on this drive:
https://drive.google.com/drive/folders/1NConjXFlacXWIrX9h16cy9Gi-7Nlw_fj?usp=sharing

Training on images_data_16x16.root gave the weights in the folder dataset
Training on images_data_16x16_0.root gave the weights in the folder dataset_0
Training on images_data_16x16_1.root gave the weights in the folder dataset_1

Thank you so much!

Lilina

moneta · March 29, 2023, 3:27pm

Hello Lilina,

Thanks for sharing the files. I can reproduce your problem in the Reader when reading your XML weight files, but not when running the training myself on the data you provided.
I suspect that in some cases the CNN did not train well. Can you please share, for one of these cases (dataset_0 or dataset_1), the full log printout from the training and also the TMVA output ROOT file that is used by the TMVAGUI (it should be called TMVA_CNN_ClassificationOutput.root)?
The training history is available in the TMVAGUI (item (8)). It should show whether the validation error has decreased correctly to a reasonable value.

Lorenzo

lilina · March 29, 2023, 5:53pm

Dear Lorenzo,

Many thanks for your help. Here is the information for dataset_1:

TMVA output root file
https://drive.google.com/drive/folders/1NConjXFlacXWIrX9h16cy9Gi-7Nlw_fj?usp=sharing
Training history TMVAGUI (item(8))
Training full log printout
training_printout.txt (46.9 KB)

moneta · March 30, 2023, 10:19am

Hello Lilina,

Thank you very much for all the files and information. Looking at the training results everything looks good. The training is fine.
I can reproduce your problem, but only when running the training on the GPU with your input data. Running on a CPU works fine. I will need some time to investigate this problem further and find the cause. For the time being, I suggest using the CPU architecture as a workaround. You need to change dnnOptions in TMVA_CNN_Classification.C, around line 320, to have "Architecture=CPU".

Lorenzo

lilina · March 31, 2023, 2:51am

Hi Lorenzo,

Thanks a lot for investigating this problem, I appreciate it.

Lilina

moneta · March 31, 2023, 8:24am

Thanks to you for reporting this problem. I have found the cause: it is a bug in the Batch Normalisation layer, which is not evaluated correctly on CPU when it has been trained on GPU. I will provide a fix in the next few days. In the meantime an issue is open, see [tmva] Wrong result when evaluating a TMVA CNN a GPU trained model with BNORM layer on CPU · Issue #12589 · root-project/root · GitHub
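For context, here is the textbook inference-time form of batch normalisation. This is not TMVA's implementation and says nothing about where exactly the bug sits. All names are generic. It only illustrates why the stored per-channel statistics matter: if the evaluating backend interprets a stored number differently from the backend that trained the layer, the normalised values shift, and a sigmoid output saturating near 0 or 1 would be consistent with that.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-channel quantities saved with a trained batch-normalisation layer
// (generic names, not TMVA's).
struct BatchNormParams {
   float gamma;    // learned scale
   float beta;     // learned shift
   float mean;     // running mean collected during training
   float variance; // running variance collected during training
};

// Inference: y = gamma * (x - mean) / sqrt(variance + epsilon) + beta
inline std::vector<float> BatchNormInference(const std::vector<float> &x, const BatchNormParams &p,
                                             float epsilon = 1e-4f)
{
   assert(p.variance >= 0.f);
   const float scale = p.gamma / std::sqrt(p.variance + epsilon);
   std::vector<float> y(x.size());
   for (std::size_t i = 0; i < x.size(); ++i)
      y[i] = scale * (x[i] - p.mean) + p.beta;
   return y;
}

// Logistic function, as used for a binary-classification output.
inline float Sigmoid(float z)
{
   return 1.f / (1.f + std::exp(-z));
}
```

Feeding the same input through two BatchNormParams that differ only in variance shows how strongly the output depends on that one stored number.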

Best regards,

Lorenzo