SOFIE and RDataFrame

Hello,
I am learning how to use SOFIE with RDataFrame to evaluate a trained neural network that I have saved in an ONNX file. I have successfully generated a ROOT file and a .hxx file from this ONNX file using SOFIE, but now I am running into issues.

Some information: I am using ROOT version 6.32.02. The neural network has been trained to accept two floats as input and predict the sum of the squares of those inputs. I have been using the TMVA_SOFIE_RDataFrame.C file as a template for creating an RDataFrame. I have created the files: trained_model.dat, trained_model.hxx, and trained_model.root from a file called trained_model.onnx using the SOFIE ONNX parser. My program looks like this:

//ROOTSYS = /afs/cern.ch/user/v/vstaryns/eos/root_test/root_install
R__ADD_INCLUDE_PATH($ROOTSYS/include)
R__ADD_INCLUDE_PATH($ROOTSYS/runtutorials)

#include "trained_model.hxx"
//full path for below = $ROOTSYS/include/TMVA/SOFIEHelpers.hxx
#include "TMVA/SOFIEHelpers.hxx"

using namespace TMVA::Experimental;

void RDataFrame_From_ONNX(int nthreads = 2){

  std::string inputFile = "trained_model.root";
  ROOT::RDataFrame df1("ntuple", inputFile);
  int nslots = df1.GetNSlots();
  std::cout << "Running using " << nslots << " threads" << std::endl;

  auto h1 = df1.DefineSlot("DNN_Value", SofieFunctor<2, TMVA_SOFIE_trained_model::Session>(nslots, inputFile),
                           {"x", "y"})
               .Histo1D({"sumSquared", "squareOfSums", 100, 0, 1}, "DNN_Value");

  //histogram color and set up the canvas
  h1->SetLineColor(kRed);

  auto c1 = new TCanvas();
  //gStyle->SetOptStat(0);

  //having trouble finding documentation on the DrawClone() function
  h1->Draw("SAME");
  c1->BuildLegend();
}

When I run this program, I receive the following output:

Processing RDataFrame_From_ONNX.cpp...
In module 'ROOTTMVASofie':
/eos/home-v/vstaryns/root_test/root_install/include/TMVA/SOFIEHelpers.hxx:48:58: error: too few arguments to function call, expected 2, have 1
      auto y =  fSessions[slot].infer(fInput[slot].data());
                ~~~~~~~~~~~~~~~~~~~~~                    ^
/eos/home-v/vstaryns/root_test/root_install/include/ROOT/RDF/RDefine.hxx:85:10: note: in instantiation of member function 'TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>::operator()' requested here
         fExpression(slot, fValues[slot][S]->template Get<ColTypes>(entry)...);
         ^
/eos/home-v/vstaryns/root_test/root_install/include/ROOT/RDF/RDefine.hxx:129:10: note: in instantiation of function template specialization 'ROOT::Detail::RDF::RDefine<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot>::UpdateHelper<float, float, 0UL, 1UL>' requested here
         UpdateHelper(slot, entry, ColumnTypes_t{}, TypeInd_t{}, ExtraArgsTag{});
         ^
/eos/home-v/vstaryns/root_test/root_install/include/ROOT/RDF/RDefine.hxx:98:4: note: in instantiation of member function 'ROOT::Detail::RDF::RDefine<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot>::Update' requested here
   RDefine(std::string_view name, std::string_view type, F expression, const ROOT::RDF::ColumnNames_t &columns,
   ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/ext/new_allocator.h:162:23: note: in instantiation of member function 'ROOT::Detail::RDF::RDefine<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot>::RDefine' requested here
        { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
                             ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/alloc_traits.h:516:8: note: in instantiation of function template specialization '__gnu_cxx::new_allocator<ROOT::Detail::RDF::RDefine<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot> >::construct<ROOT::Detail::RDF::RDefine<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot>, std::basic_string_view<char, std::char_traits<char> > &, std::basic_string<char> &, TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, const std::vector<std::basic_string<char>, std::allocator<std::basic_string<char> > > &, ROOT::Internal::RDF::RColumnRegister &, ROOT::Detail::RDF::RLoopManager &>' requested here
          __a.construct(__p, std::forward<_Args>(__args)...);
              ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/shared_ptr_base.h:519:30: note: (skipping 4 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
          allocator_traits<_Alloc>::construct(__a, _M_ptr(),
                                    ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/shared_ptr.h:862:14: note: in instantiation of function template specialization 'std::shared_ptr<ROOT::Detail::RDF::RDefine<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot> >::shared_ptr<std::allocator<ROOT::Detail::RDF::RDefine<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot> >, std::basic_string_view<char, std::char_traits<char> > &, std::basic_string<char> &, TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, const std::vector<std::basic_string<char>, std::allocator<std::basic_string<char> > > &, ROOT::Internal::RDF::RColumnRegister &, ROOT::Detail::RDF::RLoopManager &>' requested here
      return shared_ptr<_Tp>(_Sp_alloc_shared_tag<_Alloc>{__a},
             ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/shared_ptr.h:878:19: note: in instantiation of function template specialization 'std::allocate_shared<ROOT::Detail::RDF::RDefine<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot>, std::allocator<ROOT::Detail::RDF::RDefine<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot> >, std::basic_string_view<char, std::char_traits<char> > &, std::basic_string<char> &, TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, const std::vector<std::basic_string<char>, std::allocator<std::basic_string<char> > > &, ROOT::Internal::RDF::RColumnRegister &, ROOT::Detail::RDF::RLoopManager &>' requested here
      return std::allocate_shared<_Tp>(std::allocator<_Tp_nc>(),
                  ^
/eos/home-v/vstaryns/root_test/root_install/include/ROOT/RDF/RInterface.hxx:2964:29: note: in instantiation of function template specialization 'std::make_shared<ROOT::Detail::RDF::RDefine<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot>, std::basic_string_view<char, std::char_traits<char> > &, std::basic_string<char> &, TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, const std::vector<std::basic_string<char>, std::allocator<std::basic_string<char> > > &, ROOT::Internal::RDF::RColumnRegister &, ROOT::Detail::RDF::RLoopManager &>' requested here
      auto newColumn = std::make_shared<NewCol_t>(name, retTypeName, std::forward<F>(expression), validColumnNames,
                            ^
/eos/home-v/vstaryns/root_test/root_install/include/ROOT/RDF/RInterface.hxx:369:14: note: in instantiation of function template specialization 'ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::DefineImpl<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float>, ROOT::Detail::RDF::ExtraArgsForDefine::Slot, double>' requested here
      return DefineImpl<F, RDFDetail::ExtraArgsForDefine::Slot>(name, std::move(expression), columns, "DefineSlot");
             ^
/eos/home-v/vstaryns/root_test/RDataFrame_From_ONNX.cpp:55:19: note: in instantiation of function template specialization 'ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::DefineSlot<TMVA::Experimental::SofieFunctorHelper<std::integer_sequence<unsigned long, 0, 1>, TMVA_SOFIE_trained_model::Session, float> >' requested here
    auto h1 = df1.DefineSlot("DNN_Value", SofieFunctor<2, TMVA_SOFIE_trained_model::Session>(nslots, inputFile),
                  ^
/eos/home-v/vstaryns/root_test/trained_model.hxx:183:20: note: 'infer' declared here
std::vector<float> infer(size_t unk__6,float* tensor_sequential){

I noticed that inside trained_model.hxx, the infer() function was generated to take two parameters instead of just one. In the example files for this functionality, Higgs_model.hxx has an infer function with the signature std::vector<float> infer(float* tensor_input), but my trained_model.hxx has an infer function with the signature std::vector<float> infer(size_t unk__6,float* tensor_sequential).

Based on the error message, it seems this signature is the cause of the problem, but this is just how the file was generated, and the unk__6 parameter is used many times within the infer function in trained_model.hxx.

Any advice about why trained_model.hxx was generated this way, and on how I can generate this RDataFrame column using my model, would be greatly appreciated.

Thank you!
Teddy

As a follow-up, here are the files mentioned in my post. The .zip file contains trained_model.dat and trained_model.onnx, since neither is a supported file type for upload here.
trained_model.zip (40.8 KB)

trained_model.root (7.9 KB)
trained_model.hxx (11.9 KB)

Hello Teddy,

Thank you for reporting this problem. The issue is caused, as you found, by the extra parameter, which represents the batch size of the model. I will provide a fix in the SofieFunctor class used by RDF. For the time being, as a workaround, can you try to generate the model with a fixed batch size (e.g. 1)?
You can do this by calling the following when generating the C++ code:

TMVA::Experimental::SOFIE::RModelParser_ONNX p;
auto m = p.Parse("trained_model.onnx");
m.Generate(TMVA::Experimental::SOFIE::Options::kDefault, 1);  // use here 1 for batch size
m.OutputGenerated();
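
If regenerating the model is not convenient, another possible workaround is to wrap the generated session in a small adapter that always passes a batch size of 1, so that the single-argument infer() call in SOFIEHelpers.hxx finds a matching function. The sketch below is self-contained and uses a hypothetical MockSession in place of TMVA_SOFIE_trained_model::Session; BatchOneSession and EvaluateOne are illustrative names and not part of ROOT. Whether SofieFunctor accepts such an adapter as its session type has not been tested here, so please treat that as an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the generated TMVA_SOFIE_trained_model::Session.
// Like the generated code, infer() takes the batch size as its first argument.
struct MockSession {
   explicit MockSession(const std::string & /*weightFile*/ = "") {}

   std::vector<float> infer(std::size_t batchSize, float *input) {
      std::vector<float> out(batchSize);
      for (std::size_t i = 0; i < batchSize; ++i) {
         const float x = input[2 * i];
         const float y = input[2 * i + 1];
         out[i] = x * x + y * y; // the mock "model": sum of squares
      }
      return out;
   }
};

// Illustrative adapter (not a ROOT class): exposes the single-argument
// infer(float*) by forwarding to the wrapped session with batch size 1.
template <class Session>
struct BatchOneSession {
   explicit BatchOneSession(const std::string &weightFile = "") : fSession(weightFile) {}

   std::vector<float> infer(float *input) { return fSession.infer(1, input); }

   Session fSession;
};

// Mimics the one-event call pattern that failed to compile:
// fill an input buffer and call infer() with only the data pointer.
template <class Session>
float EvaluateOne(Session &session, float x, float y) {
   float input[2] = {x, y};
   return session.infer(input)[0];
}
```

With the real model, the adapter would presumably be used as SofieFunctor<2, BatchOneSession<TMVA_SOFIE_trained_model::Session>>(nslots, "trained_model.dat"), again as an untested assumption about how SofieFunctor constructs its sessions.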

Cheers

Lorenzo

This workaround worked! I was able to run my example without error after setting the constant batch size.

Thank you for your help!