Using root files with vector<float> branches as input into TMVA-keras

ktali · November 26, 2020, 9:14am

Hello experts, I hope you can help me, please.

I am using the basic TMVA script (https://root.cern/doc/master/ClassificationKeras_8py.html) from the tutorial to test the NN training on my own root files.

The script runs well with the example, however when I use my own root file as input I get the following errors:

: Expression: MyBranch1 does not provide data for this event. This event is not taken into account. --> please check if you use as a variable an entry of an array which is not filled for some events (e.g. arr[4] when arr has only 3 elements).
: If you want to take the event into account you can do something like: "Alt$(arr[4],0)" where in cases where arr doesn't have a 4th element, 0 is taken as an alternative.
Error in TTreeFormula::Compile: Bad numerical expression : "MyBranch1"
: Expression MyBranch1 could not be resolved to a valid formula.
***> abort program execution
Traceback (most recent call last):
File "./ClassificationKeras.py", line 63, in <module>
factory.TrainAllMethods()
Exception: void TMVA::TrainAllMethods() =>
FATAL error (C++ exception of type runtime_error)

This warning/error is repeated for all my branches.

I am new to this, but I guess the cause is that my branches are of type vector<float>, while the ROOT example file used in the tutorial has branches of type F (plain floats)?

I see that the error message suggests assigning the value 0 where input is missing, but I am not sure I want that, and I am also not sure where in the script I should put it.

If anyone has an idea it will be super helpful… preferably for dummies pls

Many thanks in advance

oshadura · November 26, 2020, 1:32pm

Could you please post a small reproducer here or maybe share a file?

@moneta sorry for pinging you but maybe you will be able to take a look? Thanks!

moneta · November 26, 2020, 2:24pm

Hi,

If you have a vector, you can provide each single vector element as input to TMVA (supposing MyBranch is the name of the std::vector branch) using the function dataloader.AddVariable("MyBranch[0]"), and similarly for "MyBranch[1]", "MyBranch[2]".
Otherwise, you also have the possibility to provide the entire vector as input, by using
dataloader.AddVariablesArray("MyBranch", n)
where n is the size of the vector.

This applies if the vector has the same size for each event. If this is not the case, you would need to add zero values for the missing elements.
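
As a rough illustration of the per-element approach (a sketch only: the helper name `element_expressions` and the branch name `MyBranch` are placeholders, not part of TMVA), the expression strings can be generated in a loop. `Alt$(expr,default)` is the TTree formula function mentioned in the error message; it substitutes the default when the element is missing.

```python
def element_expressions(branch, n, default=None):
    """Build one TTree formula string per vector element.

    branch  -- name of the std::vector branch (placeholder name here)
    n       -- number of elements to use per event
    default -- if given, wrap each element in Alt$(..., default) so that
               events with fewer than n elements get this value instead
    """
    exprs = []
    for i in range(n):
        expr = "{}[{}]".format(branch, i)
        if default is not None:
            expr = "Alt$({},{})".format(expr, default)
        exprs.append(expr)
    return exprs


# Fixed-size vectors: plain element access.
print(element_expressions("MyBranch", 3))
# Variable-size vectors: pad missing elements with 0.
print(element_expressions("MyBranch", 3, default=0))

# With an existing TMVA.DataLoader called `dataloader` (not created here),
# each string would then be registered as one input variable:
#     for expr in element_expressions("MyBranch", 3, default=0):
#         dataloader.AddVariable(expr)
```

The number of elements `n` is a choice made by the user; it also fixes the number of inputs the network has to accept.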

Lorenzo

ktali · December 2, 2020, 9:51am

Many thanks both oshadura and Lorenzo! And apologies for the delay. I was hoping to finish implementing Lorenzo’s solution and testing this before replying but I got a bit confused on something very silly probably.

The Classification script in https://root.cern/doc/master/ClassificationKeras_8py.html seems to be fixed on 4 flat variables. So I now noticed it complains first that I use vectors and second that I have more than 4 in my tree. So if for example I created a flat root file with 5 variables it complains:
Exception: Error when checking model input: expected dense_input_2 to have shape (None, 4) but got array with shape (200, 5)

I see that this script pulls in header files (via import ROOT) where the variables are defined in the data loaders.
Lorenzo, if I understand you correctly, I should add these lines in the .py file, but then how do I override the definitions in the headers that refer to the 4 variables of the tutorial ROOT file?

It seems to me the solution is to edit all the headers, but if there is a shortcut, it would be good to know.
Many thanks again and apologies for the cluelessness

moneta · December 8, 2020, 4:10pm

Hi,

If you change the number of inputs (following the tutorial https://root.cern/doc/master/ClassificationKeras_8py.html), you need to change the input shape of the Keras model. For example, in this line change 4 to 5 (if you have 5 input variables):

model.add(Dense(64, activation='relu', W_regularizer=l2(1e-5), input_dim=4))
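
To avoid editing the hard-coded number by hand, one option is to derive `input_dim` from the same list of variables that is passed to the dataloader. This is a minimal sketch, assuming Keras 2 (where the `W_regularizer` argument is called `kernel_regularizer`) and a hypothetical list `input_variables`; the two-node softmax output mirrors the tutorial's signal/background setup.

```python
# Hypothetical list of TMVA input expressions; replace with your own.
input_variables = ["MyBranch1[0]", "MyBranch1[1]", "MyBranch2[0]",
                   "MyBranch2[1]", "MyBranch3[0]"]
n_inputs = len(input_variables)


def build_model(n_inputs):
    """Return a small Keras model whose input size matches n_inputs."""
    # Imported here so the rest of the script works without Keras installed.
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.regularizers import l2

    model = Sequential()
    model.add(Dense(64, activation="relu",
                    kernel_regularizer=l2(1e-5), input_dim=n_inputs))
    model.add(Dense(2, activation="softmax"))
    return model


# In the training script (Keras and a `dataloader` assumed to exist):
#     model = build_model(n_inputs)
#     for expr in input_variables:
#         dataloader.AddVariable(expr)
```

Keeping one list as the single source for both `AddVariable` calls and `input_dim` means the shapes cannot drift apart when a variable is added or removed.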

Lorenzo

ktali · December 15, 2020, 1:22pm

Many thanks Lorenzo

I tried, yet it still failed … Eventually the problem was that I mistakenly used an old setup from the old IML tutorial example, i.e.:
/cvmfs/sft.cern.ch/lcg/views/LCG_88/x86_64-slc6-gcc49-opt/setup.sh
instead of:
/cvmfs/sft.cern.ch/lcg/views/LCG_97/x86_64-centos7-gcc9-opt/setup.sh

Now it seems to run
Many thanks for all your help

ktali · April 9, 2024, 8:23am

Hello,

Apologies for bringing back to life this old thread, but it became relevant again

I am again trying to use ROOT samples that have a tree of vector branches, to run the classification (training) and then the application step on them, using the Keras TMVA interface.

While the classification seems to run without any problems, the code crashes in the application step, specifically in this part of the code:

kerasEval = array('f', [0])
kerasBranch = new_event_tree.Branch("KerasEvalOutput", kerasEval, "KerasEvalOutput/F")

for entry in simple_tree:
     eval_val = reader.EvaluateMVA("PyKeras")
     kerasEval[0] = eval_val
     kerasBranch.Fill()

I get the error:

Traceback (most recent call last):
  File "ApplicationTest.py", line 57, in <module>
    for entry in simple_tree:
  File "/cvmfs/sft.cern.ch/lcg/views/LCG_97/x86_64-centos7-gcc9-opt/lib/ROOT.py", line 232, in _TTree__iter__
    bytes_read = self.GetEntry(i)
SystemError: int TTree::GetEntry(Long64_t entry = 0, int getall = 0) =>
    problem in C++; program state has been reset

and my suspicion is that I did not handle the input features correctly (i.e. the declaration of the vector branches) in both the classification and the application scripts.

I went back to this post, and tried the proposed solution:
dataloader.AddVariableArray("MyBranch", n)

But I get an error:
AttributeError: 'DataLoader' object has no attribute 'AddVariableArray'

Concerning the first solution proposed:
dataloader.AddVariable("MyBranch[0]")
I am not sure how to iterate over the branch vector, and the following attempt fails:

for variable in input_variables:
    variable_size = len(variable)
    for entry in variable_size:
        dataloader.AddVariable(variable[entry])

For convenience, I am attaching simplified versions of the Python scripts (with a simple naming convention), hoping you can please advise.
Many thanks in advance!

ClassificationTest.py (1.8 KB)
ApplicationTest.py (1.6 KB)
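
Two remarks on the attempts above, offered as a sketch rather than a confirmed fix. First, the `TMVA::DataLoader` method is spelled `AddVariablesArray` (plural "Variables"), which would explain the `AttributeError`. Second, in the loop, `len(variable)` is the length of the branch name string and `for entry in variable_size` tries to iterate over an integer; the element index would have to come from `range(n)`, with `n` chosen by the user. On the application side, each variable registered with the reader needs its own buffer, filled for every entry before `EvaluateMVA` is called. The helpers below (hypothetical names `make_buffers` and `fill_buffers`) show the padding logic in plain Python, with lists standing in for the vector branches.

```python
from array import array
from collections import OrderedDict


def make_buffers(branches, n):
    """One single-float buffer per (branch, element), keyed by expression.

    An OrderedDict keeps the variables in a fixed order, which should match
    the order used in the training script.
    """
    buffers = OrderedDict()
    for branch in branches:
        for i in range(n):
            buffers["{}[{}]".format(branch, i)] = array("f", [0.0])
    return buffers


def fill_buffers(buffers, event, n, fill=0.0):
    """Copy up to n elements of each vector into its buffer, padding with fill."""
    for branch, values in event.items():
        for i in range(n):
            value = values[i] if i < len(values) else fill
            buffers["{}[{}]".format(branch, i)][0] = value


branches = ["MyBranch1", "MyBranch2"]   # placeholder branch names
n = 3                                   # elements used per branch (user choice)
buffers = make_buffers(branches, n)

# Stand-in for one tree entry; in the real script the values would come from
# getattr(entry, branch) inside `for entry in simple_tree:`.
event = {"MyBranch1": [1.5, 2.5], "MyBranch2": [0.5, 1.0, 2.0, 4.0]}
fill_buffers(buffers, event, n)

# With ROOT available, each buffer would be registered once before the loop,
# using the same expression strings as in the training script:
#     for expr, buf in buffers.items():
#         reader.AddVariable(expr, buf)
# and reader.EvaluateMVA("PyKeras") called after fill_buffers for each entry.
```

Whether this resolves the `GetEntry` crash is not established here; it only addresses how the vector elements are declared and filled.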