TMVA Doesn't Work With TensorFlow-Keras

Jackhammer0102 · April 1, 2023, 5:08pm

Hi, ROOT experts,

I was trying to run the tutorial macro ClassificationKeras.py in tutorials/tmva/keras/, but it fails with errors. Does anyone have an idea how to make it work?

This part shows what happened before it failed:

2023-04-01 12:00:05.018837: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
DataSetInfo              : [dataset] : Added class "Signal"
                         : Add Tree TreeS of type Signal with 6000 events
DataSetInfo              : [dataset] : Added class "Background"
                         : Add Tree TreeB of type Background with 6000 events
                         : Dataset[dataset] : Class index : 0  name : Signal
                         : Dataset[dataset] : Class index : 1  name : Background
2023-04-01 12:00:10.683788: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Metal device set to: Apple M1

systemMemory: 8.00 GB
maxCacheSize: 2.67 GB

2023-04-01 12:00:10.685012: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-04-01 12:00:10.685312: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 64)                320       
                                                                 
 dense_1 (Dense)             (None, 2)                 130       
                                                                 
=================================================================
Total params: 450
Trainable params: 450
Non-trainable params: 0
_________________________________________________________________
Factory                  : Booking method: Fisher
                         : 
Fisher                   : [dataset] : Create Transformation "D" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
Fisher                   : [dataset] : Create Transformation "G" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
Factory                  : Booking method: PyKeras
                         : 
PyKeras                  : [dataset] : Create Transformation "D" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
PyKeras                  : [dataset] : Create Transformation "G" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
                         : Setting up tf.keras
                         : Using TensorFlow version 2
                         : Use Keras version from TensorFlow : tf.keras
2023-04-01 12:00:10.962627: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-04-01 12:00:10.962648: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
                         :  Loading Keras Model 
                         : Loaded model from file: model.h5
Factory                  : Train all methods
                         : Rebuilding Dataset dataset
                         : Building event vectors for type 2 Signal
                         : Dataset[dataset] :  create input formulas for tree TreeS
                         : Building event vectors for type 2 Background
                         : Dataset[dataset] :  create input formulas for tree TreeB
DataSetFactory           : [dataset] : Number of events in input trees
                         : 
                         : 
                         : Number of training and testing events
                         : ---------------------------------------------------------------------------
                         : Signal     -- training events            : 4000
                         : Signal     -- testing events             : 2000
                         : Signal     -- training and testing events: 6000
                         : Background -- training events            : 4000
                         : Background -- testing events             : 2000
                         : Background -- training and testing events: 6000
                         : 
DataSetInfo              : Correlation matrix (Signal):
                         : ----------------------------------------
                         :             var1    var2    var3    var4
                         :    var1:  +1.000  +0.391  +0.590  +0.813
                         :    var2:  +0.391  +1.000  +0.692  +0.734
                         :    var3:  +0.590  +0.692  +1.000  +0.851
                         :    var4:  +0.813  +0.734  +0.851  +1.000
                         : ----------------------------------------
DataSetInfo              : Correlation matrix (Background):
                         : ----------------------------------------
                         :             var1    var2    var3    var4
                         :    var1:  +1.000  +0.855  +0.914  +0.965
                         :    var2:  +0.855  +1.000  +0.927  +0.936
                         :    var3:  +0.914  +0.927  +1.000  +0.970
                         :    var4:  +0.965  +0.936  +0.970  +1.000
                         : ----------------------------------------
DataSetFactory           : [dataset] :  
                         : 
Factory                  : [dataset] : Create Transformation "D" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
Factory                  : [dataset] : Create Transformation "G" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
                         : Preparing the Decorrelation transformation...
                         : Preparing the Gaussian transformation...
TFHandler_Factory        : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:  0.0084120     1.0019   [    -3.1195     5.7307 ]
                         :     var2:  0.0078511    0.99981   [    -3.1195     5.7307 ]
                         :     var3:  0.0083128     1.0011   [    -3.1195     5.7307 ]
                         :     var4:  0.0076997    0.99886   [    -3.1195     5.7307 ]
                         : -----------------------------------------------------------
Factory                  : Train method: Fisher for Classification
                         : 
                         : Preparing the Decorrelation transformation...
                         : Preparing the Gaussian transformation...
TFHandler_Fisher         : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:  0.0084120     1.0019   [    -3.1195     5.7307 ]
                         :     var2:  0.0078511    0.99981   [    -3.1195     5.7307 ]
                         :     var3:  0.0083128     1.0011   [    -3.1195     5.7307 ]
                         :     var4:  0.0076997    0.99886   [    -3.1195     5.7307 ]
                         : -----------------------------------------------------------
Fisher                   : Results for Fisher coefficients:
                         : NOTE: The coefficients must be applied to TRANFORMED variables
                         :   List of the transformation: 
                         :   -- Deco
                         :   -- Gauss
                         : -----------------------
                         : Variable:  Coefficient:
                         : -----------------------
                         :     var1:       -0.221
                         :     var2:       -0.055
                         :     var3:       +0.032
                         :     var4:       +0.474
                         : (offset):       -0.002
                         : -----------------------
                         : Elapsed time for training with 8000 events: 0.0319 sec         
Fisher                   : [dataset] : Evaluation of Fisher on training sample (8000 events)
                         : Elapsed time for evaluation of 8000 events: 0.0153 sec       
                         : Creating xml weight file: dataset/weights/TMVAClassification_Fisher.weights.xml
                         : Creating standalone class: dataset/weights/TMVAClassification_Fisher.class.C
Factory                  : Training finished
                         : 
Factory                  : Train method: PyKeras for Classification
                         : 
                         : 
                         : ================================================================
                         : H e l p   f o r   M V A   m e t h o d   [ PyKeras ] :
                         : 
                         : Keras is a high-level API for the Theano and Tensorflow packages.
                         : This method wraps the training and predictions steps of the Keras
                         : Python package for TMVA, so that dataloading, preprocessing and
                         : evaluation can be done within the TMVA system. To use this Keras
                         : interface, you have to generate a model with Keras first. Then,
                         : this model can be loaded and trained in TMVA.
                         : 
                         : 
                         : <Suppress this message by specifying "!H" in the booking option>
                         : ================================================================
                         : 
                         : Preparing the Decorrelation transformation...
                         : Preparing the Gaussian transformation...
TFHandler_PyKeras        : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:  0.0084120     1.0019   [    -3.1195     5.7307 ]
                         :     var2:  0.0078511    0.99981   [    -3.1195     5.7307 ]
                         :     var3:  0.0083128     1.0011   [    -3.1195     5.7307 ]
                         :     var4:  0.0076997    0.99886   [    -3.1195     5.7307 ]
                         : -----------------------------------------------------------
TFHandler_PyKeras        : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:  0.0084120     1.0019   [    -3.1195     5.7307 ]
                         :     var2:  0.0078511    0.99981   [    -3.1195     5.7307 ]
                         :     var3:  0.0083128     1.0011   [    -3.1195     5.7307 ]
                         :     var4:  0.0076997    0.99886   [    -3.1195     5.7307 ]
                         : -----------------------------------------------------------
                         : Split TMVA training data in 6400 training events and 1600 validation events
                         : Training Model Summary
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 64)                320       
                                                                 
 dense_1 (Dense)             (None, 2)                 130       
                                                                 
=================================================================
Total params: 450
Trainable params: 450
Non-trainable params: 0
_________________________________________________________________
                         : Option SaveBestOnly: Only model weights with smallest validation loss will be stored

And these are the errors:

Epoch 1/20
2023-04-01 12:01:56.931210: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2023-04-01 12:01:57.254187: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x7fc138f94630
2023-04-01 12:01:57.254384: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x7fc138f94630
2023-04-01 12:01:57.270410: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x7fc138f94630
2023-04-01 12:01:57.270438: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x7fc138f94630
<WARNING>                : Failed to run python code: history = model.fit(trainX, trainY, sample_weight=trainWeights, batch_size=batchSize, epochs=numEpochs, verbose=verbose, validation_data=(valX, valY, valWeights), callbacks=callbacks)
<WARNING>                : Python error message:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_2' defined at (most recent call last):
    File "/Users/martin/Desktop/tmvaTutorials/keras/ClassificationKeras.py", line 74, in <module>
      factory.TrainAllMethods()
    File "<string>", line 1, in <module>
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_2'
could not find registered platform with id: 0x7fc138f94630
	 [[{{node StatefulPartitionedCall_2}}]] [Op:__inference_train_function_26145]
<FATAL>                         : Failed to train model
***> abort program execution
Traceback (most recent call last):
  File "/Users/martin/Desktop/tmvaTutorials/keras/ClassificationKeras.py", line 74, in <module>
    factory.TrainAllMethods()
cppyy.gbl.std.runtime_error: void TMVA::Factory::TrainAllMethods() =>
    runtime_error: FATAL error

P.S.
I changed one line from
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01), metrics=['accuracy', ])
to
model.compile(optimizer=tf.keras.optimizers.experimental.SGD(learning_rate=0.01), loss=tf.keras.losses.CategoricalCrossentropy(), metrics=['accuracy', ])
to make it work with my TensorFlow-Keras installation.

And here is my slightly adjusted macro:
ClassificationKeras.py (2.3 KB)

Thank you!

moneta · April 2, 2023, 3:22pm

Hi,
Which ROOT and Keras/TensorFlow versions are you using? It works for me on master, and judging from the nightly build results it also works on 6.28. It is possible that you are using an older version, or there may be issues with some newer TensorFlow versions.

Lorenzo

Jackhammer0102 · April 2, 2023, 4:35pm

Hi, Lorenzo,

Here is some information about my system:

ROOT version: 6.28/00

Python Platform: macOS-12.4-x86_64-i386-64bit

Tensor Flow Version: 2.11.0

Keras Version: 2.11.0

Python 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:24:27) [Clang 14.0.6 ]

Pandas 1.5.3

Scikit-Learn 1.2.2

GPU is available
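The version information above can be gathered with a short script along the following lines before reporting a problem. It is only a sketch: the distribution names queried (`tensorflow`, `tensorflow-macos`, `keras`, `xgboost`) are assumptions about how the packages were installed, and `parse_root_version` is a hypothetical helper that turns ROOT's `6.28/00`-style version string into a tuple so it can be compared with 6.28.

```python
import platform
import sys
from importlib import metadata


def package_version(dist_name):
    """Installed version of a Python distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def root_version():
    """ROOT version string such as '6.28/00', or None if PyROOT cannot be imported."""
    try:
        import ROOT  # PyROOT; the import can take a few seconds
    except ImportError:
        return None
    return str(ROOT.gROOT.GetVersion())


def parse_root_version(text):
    """Hypothetical helper: '6.28/00' -> (6, 28, 0), so versions compare as tuples."""
    major, rest = text.split(".", 1)
    minor, patch = rest.split("/", 1)
    return int(major), int(minor), int(patch)


def collect_versions():
    """Collect Python, platform, ROOT and ML package versions into a dict."""
    info = {
        "Python": sys.version.split()[0],
        "Platform": platform.platform(),
        "ROOT": root_version(),
    }
    # Assumed distribution names; TensorFlow on Apple silicon may ship as "tensorflow-macos".
    for dist in ("tensorflow", "tensorflow-macos", "keras", "xgboost"):
        info[dist] = package_version(dist)
    return info


if __name__ == "__main__":
    info = collect_versions()
    for name, version in info.items():
        print(f"{name:18s} {version or 'not installed'}")
    if info["ROOT"]:
        print("ROOT >= 6.28:", parse_root_version(info["ROOT"]) >= (6, 28, 0))
```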

Is there a good way to fix this issue?

Thank you!

moneta · April 3, 2023, 8:49am

Hi,

It is strange that it does not work for you; I have the same TensorFlow/Keras versions.
Also, your change to the model.compile call should not matter: Keras accepts both the string and the instance API, see Model training APIs.

Best regards,

Lorenzo

Jackhammer0102 · April 3, 2023, 3:27pm

Thank you, Lorenzo! It turns out the problem was with ROOT: the previous installation was probably incomplete. I deleted the whole Conda environment, created a new one and reinstalled ROOT, and now the macro runs.
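When a Conda environment ends up with an incomplete or mixed installation, one quick sanity check is whether PyROOT and TensorFlow are actually picked up from the active environment. The sketch below is a hypothetical helper, not part of ROOT or Conda; it assumes that "belongs to the active environment" can be approximated by "the module file lives under `sys.prefix`", and it locates modules with `importlib.util.find_spec` without importing them.

```python
import importlib.util
import os
import sys


def is_inside(path, prefix):
    """True if `path` lies under directory `prefix` (symlinks resolved, POSIX paths assumed)."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def module_origin(name):
    """File a top-level module would be loaded from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def check_modules(names, prefix=sys.prefix):
    """Map each module name to (origin, inside_prefix) without importing the module."""
    report = {}
    for name in names:
        origin = module_origin(name)
        inside = origin is not None and is_inside(origin, prefix)
        report[name] = (origin, inside)
    return report


if __name__ == "__main__":
    for name, (origin, inside) in check_modules(["ROOT", "tensorflow", "xgboost"]).items():
        if origin is None:
            status = "not found"
        elif inside:
            status = "OK"
        else:
            status = "outside active env"
        print(f"{name:12s} {status:20s} {origin}")
```

A module reported as "outside active env" (for example a ROOT picked up via `PYTHONPATH` from a different installation) would be a hint that the environment is mixed.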

Jackhammer0102 · April 4, 2023, 2:12am

I can now run all the tutorials in the keras directory (tutorials/tmva/keras/), thanks again! However, I am running into another issue: I get these errors when I run the TMVA tutorial tmva101_Training.py:

/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/xgboost/core.py:617: FutureWarning: Pass `sample_weight` as keyword args.
  warnings.warn(msg, FutureWarning)
Traceback (most recent call last):
  File "/Users/martin/Desktop/tmvaTutorials/tmva101_Training.py", line 53, in <module>
    ROOT.TMVA.Experimental.SaveXGBoost(bdt, "myBDT", "tmva101.root")
  File "/Users/martin/opt/anaconda3/envs/tensorflow/lib/python3.10/site-packages/ROOT/_pythonization/_tmva/_tree_inference.py", line 105, in SaveXGBoost
    raise Exception(
Exception: Failed to get number of input variables from XGBoost model. Please provide the additional keyword argument 'num_inputs' to this function.

I don’t know why this macro doesn’t work for me either. My xgboost version is 1.7.5; could you tell me what might cause the problem?

moneta · April 4, 2023, 7:34am

Hi,
It seems you are not using the latest version of the tutorial in 6.28. The tutorial was updated a few months ago to fix this problem with recent xgboost versions:
See commit: [TMVA] Pythonizations for TMVA (#11069), root-project/root@363373b on GitHub

The latest version works for me with 6.28 and the latest xgboost (1.7.5).

Lorenzo
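For an older copy of the script that cannot simply be replaced, the exception message itself points at a workaround: pass the `num_inputs` keyword explicitly to `ROOT.TMVA.Experimental.SaveXGBoost`. A minimal sketch, assuming `bdt` is a fitted `xgboost.XGBClassifier` as in tmva101_Training.py; `infer_num_inputs` and `save_bdt` are hypothetical helpers that read the feature count from the scikit-learn attribute `n_features_in_`, falling back to `Booster.num_features()`. The updated tutorial may do this differently, so compare with the commit above for what it actually does.

```python
def infer_num_inputs(model):
    """Hypothetical helper: number of input features of a fitted XGBoost model."""
    n = getattr(model, "n_features_in_", None)  # set by the scikit-learn API after fit()
    if n is None and hasattr(model, "get_booster"):
        n = model.get_booster().num_features()  # fall back to the underlying Booster
    if n is None:
        raise ValueError("cannot determine the number of input features; pass num_inputs by hand")
    return int(n)


def save_bdt(bdt, key="myBDT", filename="tmva101.root"):
    """Hypothetical helper: save the model for TMVA, passing num_inputs explicitly."""
    import ROOT  # imported lazily so infer_num_inputs is usable without ROOT

    ROOT.TMVA.Experimental.SaveXGBoost(bdt, key, filename, num_inputs=infer_num_inputs(bdt))
```

In the tutorial this would replace the original `SaveXGBoost(bdt, "myBDT", "tmva101.root")` call with `save_bdt(bdt)` after training.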

Jackhammer0102 · April 4, 2023, 8:50pm

It works now. Thanks a lot, Lorenzo!