Problems with JIT compiling a function

Attila_Krasznahorkay · June 22, 2018, 12:25pm

Hi,

I guess I’m mainly looking for the input of @Axel and/or @eguiraud on this one.

I’m trying to see if I could generalise an algorithm in our analysis software a bit. Instead of that piece of the code relying on the concrete type definitions of most of our EDM, I’d like to call on ROOT to do some operations for us.

First of all, I’d need ROOT to create (for instance) an object of type xAOD::ElectronContainer. But not with its default constructor. (Then I could just use TClass::New…) So I was trying to make the following piece of code work:

// ROOT include(s):
#include <TInterpreter.h>
#include <TClass.h>
#include <TString.h>
#include <TError.h>

int main() {

   // The name of this test:
   static const char* APP_NAME = "jitTest";

   // The type of the function we're after:
   typedef void* ( *function_t )( void );

   // Access the type we want to create, just to make sure that it is loaded:
   TClass* cl = TClass::GetClass( "xAOD::ElectronContainer" );
   if( ! cl ) {
      Error( APP_NAME, "Couldn't load class" );
      return 1;
   }
   Info( APP_NAME, "Loaded dictionary for class: %s", cl->GetName() );

   // Declare a new function to the interpreter:
   TInterpreter::EErrorCode ecode;
   gInterpreter->Declare( Form( "void* createElectrons() { return new %s( SG::VIEW_ELEMENTS ); }",
                                cl->GetName() ) );

   // Get a pointer to that function:
   function_t func = ( function_t ) gInterpreter->Calc( "&createElectrons", &ecode );
   if( ecode != TInterpreter::EErrorCode::kNoError ) {
      Error( APP_NAME, "Failed to jit the function" );
      return 1;
   }

   // Use the function:
   void* electronsRaw = (*func)();

   // Return gracefully:
   return 0;
}

But it doesn’t. I get the following from it at runtime:

[bash][tauriel]:build > ./jitTest 
Info in <jitTest>: Loaded dictionary for class: DataVector<xAOD::Electron_v1>
input_line_26:1:38: error: too few template arguments for class template 'DataVector'
void* createElectrons() { return new DataVector<xAOD::Electron_v1>( SG::VIEW_ELEMENTS ); }
                                     ^
input_line_24:1:44: note: template is declared here
template <typename T, typename BASE> class DataVector;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~       ^
input_line_28:2:5: error: use of undeclared identifier 'createElectrons'
 (& createElectrons)
    ^
libc++abi.dylib: terminating with uncaught exception of type cling::CompilationException: Error evaluating expression (& createElectrons)
Abort trap: 6
[bash][tauriel]:build >

Note that if I don’t use the xAOD::ElectronContainer type in my JIT compiled function, then this setup does work. (I’m able to get a pointer to the compiled function, and use it.)

At this point I’m leaning towards blaming ROOT… Since you see, the xAOD::ElectronContainer type is absolutely known to ROOT. If I open an interactive session from the same environment where I’m trying to run this test executable, I can absolutely do this:

[bash][tauriel]:build > root -b
   ------------------------------------------------------------
  | Welcome to ROOT 6.12/06                http://root.cern.ch |
  |                               (c) 1995-2017, The ROOT Team |
  | Built for macosx64                                         |
  | From tag v6-12-06, 9 February 2018                         |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------

root [0] xAOD::ElectronContainer c( SG::VIEW_ELEMENTS );
root [1] .q
[bash][tauriel]:build >

So… What am I missing? Why doesn’t TInterpreter want to understand my code correctly?

Note that I can force TInterpreter into working, if I jump through some hoops. Like:

gInterpreter->Declare( Form( "#define XAOD_STANDALONE\n#define ROOTCORE\n#define XAOD_ANALYSIS\n#include <xAODEgamma/ElectronContainer.h>\nvoid* createElectrons() { return new %s( SG::VIEW_ELEMENTS ); }",
                                cl->GetName() ) );

But there are multiple problems here:

- I don’t want to have to know in the code what header to include in order to be able to use some random type;
- Using our headers is a bit tricky, as they rely on some definitions to work correctly. The dictionaries do know about these definitions, but if I try to include one of these headers like this, I have to set them all by hand…

So… Any suggestions?

Cheers,
Attila

ROOT Version: 6.12/06
Platform: x86_64-mac1013-clang92-opt
Compiler: Xcode/clang 9.1

Attila_Krasznahorkay · June 22, 2018, 12:27pm

And also (forgot to mention it in the original post), it’s really not okay that the TInterpreter::Calc function makes my application abort. I would’ve really hoped that in case of an error I would just get notified about the failure using my ecode variable…

Axel · June 22, 2018, 1:06pm

Hi Attila,

Declare() takes your code as is, no magic interpreter things (like automatic #includes). You can use ProcessLine() instead.

Ack for Calc() that shouldn’t abort(): https://sft.its.cern.ch/jira/browse/ROOT-9497

Cheers, Axel.

eguiraud · June 22, 2018, 1:22pm

Hi Attila,
coming from a different angle, naively: from the error message it seems that TClass* cl = TClass::GetClass("xAOD::ElectronContainer"); cl->GetName(); returns "DataVector<xAOD::Electron_v1>" (which does not seem to be valid C++, as DataVector requires two template parameters, not one).

Is this the case? What does cl->GetName() return? Is it what you expect?

My (random) guess: xAOD::ElectronContainer is, in fact, a DataVector<xAOD::Electron_v1>, and DataVector does take two template parameters, but the second one is defaulted in one of your headers and not in the headers that cling sees. That would be why the issue disappears with the extra defines and includes: they make the default value of the second template parameter visible to the interpreter.
To verify this you could just move the default value for the second template parameter of DataVector to a new header that just forward-declares the class and that you manually include from cling and also include from whatever file is currently defining that default parameter.
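
As a standalone illustration of that guess (not the actual ATLAS headers): a default template argument only counts if a declaration carrying it is visible at the point of use. The names demo::DataVector, demo::NoBase and demo::Electron are stand-ins, and the "header" boundaries are only marked by comments.

```cpp
#include <cassert>
#include <type_traits>

namespace demo {

struct NoBase {};
struct Electron {};

// --- hypothetical "DataVectorFwd.h": forward declaration carrying the default ---
template <typename T, typename BASE = NoBase> class DataVector;

// --- hypothetical "DataVector.h": the definition must not repeat the default ---
template <typename T, typename BASE>
class DataVector {
public:
   typedef BASE base_type;
};

// With the defaulted declaration visible, the one-argument spelling is valid
// and names exactly the same type as the two-argument spelling.
static_assert( std::is_same< DataVector< Electron >,
                             DataVector< Electron, NoBase > >::value,
               "one-argument and two-argument spellings must agree" );

// If only 'template <typename T, typename BASE> class DataVector;' (without
// the default) were visible, 'DataVector< Electron >' would be rejected with
// a "too few template arguments" error.

/// Returns true if the one-argument spelling picked up the default.
inline bool oneArgumentSpellingWorks() {
   return std::is_same< DataVector< Electron >::base_type, NoBase >::value;
}

} // namespace demo
```

Whether the rootmap forward declarations are really the declaration that cling ends up using here is exactly the open question; the sketch only shows the language rule involved.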

Cheers,
Enrico

Attila_Krasznahorkay · June 22, 2018, 1:38pm

Hi Enrico,

You can find the DataVector<T> definition here:

http://acode-browser1.usatlas.bnl.gov/lxr/source/athena/Control/AthContainers/AthContainers/DataVector.h?v=21.2

It’s one of the more complicated pieces of our EDM. (In fact, it’s probably the most complicated part of it.) Indeed, it has two template arguments. The second of which has a default value. Which in the end is very similar to how std::vector is implemented…

We “force” the ROOT dictionary to see xAOD::ElectronContainer as DataVector<xAOD::Electron_v1> using this trick:

http://acode-browser1.usatlas.bnl.gov/lxr/source/athena/Control/AthContainers/AthContainers/DataVector.h?v=21.2#3276

Because we don’t want the dictionary to know about this implementation detail. (To make our lives easier with schema evolution…)

But as I wrote, the interpreter used by interactive ROOT can deal with our EDM. We/I made sure of that. Now I wonder how gInterpreter is different in a simple application than what ROOT does in interactive mode…

Attila

eguiraud · June 22, 2018, 2:49pm

Hi Attila,
see @Axel’s reply: the interpreter uses ProcessLine or something equivalent to it, not Calc.
What happens if you switch Calc with ProcessLine in your snippet above?

Attila_Krasznahorkay · June 22, 2018, 3:03pm

Hi Axel,

Unfortunately TInterpreter::ProcessLine(...) doesn’t seem to be much better. I now used the following:

// ROOT include(s):
#include <TInterpreter.h>
#include <TClass.h>
#include <TString.h>
#include <TError.h>

/// Helper macro
#define CHECK_ERROR( code )                                 \
   do {                                                     \
      if( code != TInterpreter::EErrorCode::kNoError ) {    \
         Error( APP_NAME, "Failed on line %i", __LINE__ );  \
         return 1;                                          \
      }                                                     \
   } while( false )

int main() {

   // The name of this test:
   static const char* APP_NAME = "jitTest";

   // The type of the function we're after:
   typedef void* ( *function_t )( void );

   // Access the type we want to create, just to make sure that it is loaded:
   TClass* cl = TClass::GetClass( "xAOD::ElectronContainer" );
   if( ! cl ) {
      Error( APP_NAME, "Couldn't load class" );
      return 1;
   }
   Info( APP_NAME, "Loaded dictionary for class: %s", cl->GetName() );

   // Declare a new function to the interpreter:
   TInterpreter::EErrorCode ecode;
   gInterpreter->ProcessLine( Form( "void* createElectrons() { return new %s( SG::VIEW_ELEMENTS ); }",
                                    cl->GetName() ), &ecode );
   CHECK_ERROR( ecode );

   // Get a pointer to that function:
   function_t func = ( function_t ) gInterpreter->Calc( "&createElectrons", &ecode );
   CHECK_ERROR( ecode );

   // Use the function:
   void* electronsRaw = (*func)();

   // Return gracefully:
   return 0;
}

And I still get:

[bash][tauriel]:build > ./jitTest 
Info in <jitTest>: Loaded dictionary for class: DataVector<xAOD::Electron_v1>
input_line_26:1:38: error: too few template arguments for class template 'DataVector'
void* createElectrons() { return new DataVector<xAOD::Electron_v1>( SG::VIEW_ELEMENTS ); }
                                     ^
input_line_24:1:44: note: template is declared here
template <typename T, typename BASE> class DataVector;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~       ^
Error in <jitTest>: Failed on line 37
[bash][tauriel]:build >

But I believe I’m getting closer to the problem now…

[bash][tauriel]:build > root -b
   ------------------------------------------------------------
  | Welcome to ROOT 6.12/06                http://root.cern.ch |
  |                               (c) 1995-2017, The ROOT Team |
  | Built for macosx64                                         |
  | From tag v6-12-06, 9 February 2018                         |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------

root [0] new DataVector<xAOD::Electron_v1>( SG::VIEW_ELEMENTS );
ROOT_prompt_0:1:5: error: too few template arguments for class template 'DataVector'
new DataVector<xAOD::Electron_v1>( SG::VIEW_ELEMENTS );
    ^
Forward declarations from /Users/krasznaa/ATLAS/Software/AnalysisBase/21.2.34/InstallArea/x86_64-mac1013-clang91-opt/lib/AnalysisBase.rootmap:1:379: note: template is declared here
root [1]

That line (379) of AnalysisBase.rootmap is “around” here:

[ libxAODAssociationsDict.so ]
# List of selected classes
class DataLink<DataVector<xAOD::TrackParticleClusterAssociation_v1,DataModel_detail::NoBase> >
class DataLink<DataVector<xAOD::TrackParticleClusterAssociation_v1> >
class DataLink<xAOD::TrackParticleClusterAssociationContainer_v1>
class DataVector<xAOD::TrackParticleClusterAssociation_v1,DataModel_detail::NoBase>
class DataVector<xAOD::TrackParticleClusterAssociation_v1>
class ElementLink<DataVector<xAOD::TrackParticleClusterAssociation_v1,DataModel_detail::NoBase> >
class ElementLink<DataVector<xAOD::TrackParticleClusterAssociation_v1> >
class ElementLink<xAOD::TrackParticleClusterAssociationContainer_v1>
class vector<DataLink<DataVector<xAOD::TrackParticleClusterAssociation_v1,DataModel_detail::NoBase> > >
class vector<DataLink<DataVector<xAOD::TrackParticleClusterAssociation_v1> > >
class vector<DataLink<xAOD::TrackParticleClusterAssociationContainer_v1> >
class vector<ElementLink<DataVector<xAOD::TrackParticleClusterAssociation_v1,DataModel_detail::NoBase> > >
class vector<ElementLink<DataVector<xAOD::TrackParticleClusterAssociation_v1> > >
class vector<ElementLink<xAOD::TrackParticleClusterAssociationContainer_v1> >
class vector<std::vector<ElementLink<xAOD::TrackParticleClusterAssociationContainer_v1> > >
class vector<vector<ElementLink<DataVector<xAOD::TrackParticleClusterAssociation_v1,DataModel_detail::NoBase> > > >
class vector<vector<ElementLink<DataVector<xAOD::TrackParticleClusterAssociation_v1> > > >
class xAOD::TrackParticleClusterAssociationAuxContainer_v1
class xAOD::TrackParticleClusterAssociationContainer_v1
class xAOD::TrackParticleClusterAssociation_v1

So… ROOT is made aware of DataVector<T> being usable both with two template parameters and, it seems, with just one. And then the interpreter gets confused about what it should be doing…

This sort of duality is present in the rootmap file for all of our container types. Is there something that we should do to make ROOT forget about the 2 argument version of the type?

Cheers,
Attila

Attila_Krasznahorkay · June 22, 2018, 3:16pm

And just to point out, if I use the following code, that also works:

// ROOT include(s):
#include <TInterpreter.h>
#include <TClass.h>
#include <TString.h>
#include <TError.h>

/// Helper macro
#define CHECK_ERROR( code )                                 \
   do {                                                     \
      if( code != TInterpreter::EErrorCode::kNoError ) {    \
         Error( APP_NAME, "Failed on line %i", __LINE__ );  \
         return 1;                                          \
      }                                                     \
   } while( false )

int main() {

   // The name of this test:
   static const char* APP_NAME = "jitTest";

   // The type of the function we're after:
   typedef void* ( *function_t )( void );

   // Declare a new function to the interpreter:
   TInterpreter::EErrorCode ecode;
   gInterpreter->ProcessLine( "void* createElectrons() { return new xAOD::ElectronContainer( SG::VIEW_ELEMENTS ); }",
                              &ecode );
   CHECK_ERROR( ecode );

   // Get a pointer to that function:
   function_t func = ( function_t ) gInterpreter->Calc( "&createElectrons", &ecode );
   CHECK_ERROR( ecode );

   // Use the function:
   void* electronsRaw = (*func)();

   // Return gracefully:
   return 0;
}

Still, since my “real” use case will be getting a type name from typeid(...), the templated name with a single argument should be made to work as well… (Our code is capable of “sanitising” typeid names to a reasonable level, but it will not know about typedefs like that.)
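
To illustrate why a name obtained from typeid(...) cannot contain the typedef: RTTI only knows the underlying type. This is a minimal sketch, assuming a GCC or Clang toolchain (abi::__cxa_demangle from <cxxabi.h> belongs to the Itanium C++ ABI); the demo:: types are stand-ins for the real EDM classes.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <typeinfo>

namespace demo {

struct NoBase {};
struct Electron {};
template <typename T, typename BASE = NoBase> class DataVector {};
typedef DataVector< Electron > ElectronContainer;

/// Demangle a typeid name; falls back to the mangled name on failure.
inline std::string demangle( const std::type_info& ti ) {
   int status = 0;
   char* raw = abi::__cxa_demangle( ti.name(), nullptr, nullptr, &status );
   if( status != 0 || raw == nullptr ) {
      return ti.name();
   }
   std::string result( raw );
   std::free( raw ); // __cxa_demangle allocates the buffer with malloc
   return result;
}

/// Name of the typedef'd container as RTTI reports it.
inline std::string containerTypeName() {
   return demangle( typeid( ElectronContainer ) );
}

} // namespace demo
```

Here containerTypeName() reports the template spelling with the defaulted argument written out, and never mentions ElectronContainer, so a typedef name has to be recovered by some other means.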

Cheers,
Attila

eguiraud · June 23, 2018, 10:24am

Hi Attila,
ok so the real question is either

- given DataVector<T1, T2> with T2 being defaulted somewhere, how can ProcessLine/Calc be made aware that the default value for T2 exists without #include-ing the relevant header, or
- given an alias(?) Electrons which is really a DataVector<Electron, DataVectorBase<Electron>::Base>, how can one retrieve the full typename from e.g. TClass, and not just DataVector<Electron>

I can’t answer either unfortunately (@Axel might), but you do have workarounds, i.e. make a declaration of DataVector with the default parameter visible to the interpreter or adding the second template parameter by string manipulation.
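
The string manipulation workaround could look roughly like the sketch below. The helper name addDefaultBase is made up, and the spelling DataVectorBase<T>::Base for the default is an assumption taken from this discussion, not checked against the real header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

/// If 'name' is 'DataVector<T>' with a single top-level template argument,
/// return 'DataVector<T, DataVectorBase<T>::Base>'; otherwise return 'name'
/// unchanged. The spelling of the default argument is an assumption.
inline std::string addDefaultBase( const std::string& name ) {
   const std::string prefix = "DataVector<";
   if( name.compare( 0, prefix.size(), prefix ) != 0 ||
       name.size() <= prefix.size() || name.back() != '>' ) {
      return name;
   }
   // The template argument list, without the enclosing angle brackets.
   std::string args = name.substr( prefix.size(),
                                   name.size() - prefix.size() - 1 );
   // Drop trailing blanks, as in the "> >" spelling of nested templates.
   while( !args.empty() && args.back() == ' ' ) {
      args.pop_back();
   }
   if( args.empty() ) {
      return name;
   }
   // A comma at nesting depth 0 means the second argument is already there.
   int depth = 0;
   for( char c : args ) {
      if( c == '<' ) {
         ++depth;
      } else if( c == '>' ) {
         --depth;
      } else if( c == ',' && depth == 0 ) {
         return name;
      }
   }
   return prefix + args + ", DataVectorBase<" + args + ">::Base>";
}
```

The result would then be passed to Form(...) in place of cl->GetName(). This only helps if the interpreter can also see a usable DataVectorBase, which is not established here.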

Cheers,
Enrico

Attila_Krasznahorkay · June 23, 2018, 2:23pm

Hi Enrico,

Wait, wait, wait…

I absolutely admit that this class is a very complicated one. But you can’t say that ROOT is doing everything in this setup correctly (or that we declared everything correctly to ROOT…) if TClass sees the xAOD::ElectronContainer typedef as DataVector<xAOD::Electron_v1> (this is what TClass::GetName() returns for the dictionary), but then TInterpreter can’t figure out what I mean by DataVector<xAOD::Electron_v1>. Something definitely needs to be fixed here, either on the ATLAS or the ROOT side, as currently the setup is just not good enough.

For now I’ll experiment further, by requiring the “algorithm” in our code (that I’m trying to simplify/streamline) to receive the typedef’d name (the one that TInterpreter knows what to do with) as a configuration property. Even though I was hoping that the code could figure out a “good enough” type name at runtime as well…

Cheers,
Attila

eguiraud · June 23, 2018, 3:47pm

Yes, I agree there is an issue either on ATLAS’ or on ROOT’s side, and yours is a good summary!
What I listed above are just two things that would make your original setup work if figured out/fixed.

Given the complexity of your setup, though, not only can I not offer a proper fix, I can’t even tell whether ROOT is doing something wrong or is working as intended (and you are hitting a limitation of the system), so I’ll let the actual experts comment rather than keep adding noise.

Attila_Krasznahorkay · June 27, 2018, 9:46am

Dear All,

I thought it may help if I attached a simple, standalone demonstration of this issue.

After building the libEDM.so library, you can try executing the included test.C macro. Which for me results in:

[bash][eduroam-1133]:templateHandling > root -l -b -q test.C 

Processing test.C...
/Users/krasznaa/ATLAS/sw/projects/volatile/anaNtuple/templateHandling/./test.C:12:4: error: 'DataVector' does not name a template but is followed by template arguments
   DataVector<xAOD::Electron> c2;
   ^         ~~~~~~~~~~~~~~~~
/Users/krasznaa/ATLAS/sw/projects/volatile/anaNtuple/templateHandling/./test.C:12:4: note: non-template declaration found by name lookup
[bash][eduroam-1133]:templateHandling >

But if you uncomment the line instantiating xAOD::ElectronContainer, then it works…

[bash][eduroam-1133]:templateHandling > root -l -b -q test.C 

Processing test.C...
Class name: DataVector<xAOD::Electron>
[bash][eduroam-1133]:templateHandling >

I’m very open to suggestions…

Cheers,
Attila

Attila_Krasznahorkay · June 27, 2018, 10:10am

templateHandling.tar.bz2 (1.7 KB)

And now I even included the code…

sbinet · June 27, 2018, 12:04pm

while weeding out bugs in various parts of our HEP s/w stack is always a commendable endeavour, wouldn’t it be less arcane to just have a little registry of xAOD::foo c-tor functions?

a template<typename T> T* Create() would fit the bill, with all the specializations being in your xaod_registry.cxx file, along with all the #include <T.h> lines.

for any given C++ library out there that is being released and tested, the C++ compiler is more used and more tested.
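
A minimal sketch of such a registry, as a name-keyed variant of the template idea rather than the exact Create<T>() interface. Everything in namespace demo is a stand-in: ElectronContainer is not the real xAOD class and VIEW_ELEMENTS only mimics SG::VIEW_ELEMENTS.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

namespace demo {

/// Stand-in for an EDM container; not a real xAOD class.
struct ElectronContainer {
   explicit ElectronContainer( int ownPolicy ) : policy( ownPolicy ) {}
   int policy;
};

/// Hypothetical stand-in for SG::VIEW_ELEMENTS.
const int VIEW_ELEMENTS = 1;

/// Maps a type name to a function creating a "view" instance of that type.
class Registry {
public:
   typedef std::function< void*() > creator_t;

   void add( const std::string& name, creator_t creator ) {
      m_creators[ name ] = creator;
   }

   /// Returns a new object owned by the caller, or nullptr for unknown names.
   void* create( const std::string& name ) const {
      std::map< std::string, creator_t >::const_iterator it =
         m_creators.find( name );
      return ( it == m_creators.end() ) ? nullptr : it->second();
   }

private:
   std::map< std::string, creator_t > m_creators;
};

/// Generic creator, instantiated once per registered type.
template <typename T> void* createView() { return new T( VIEW_ELEMENTS ); }

/// Builds the registry; every supported type has to be listed by hand.
inline Registry makeRegistry() {
   Registry r;
   r.add( "xAOD::ElectronContainer", &createView< ElectronContainer > );
   return r;
}

} // namespace demo
```

The trade-off is the one debated in this thread: the compiler checks every constructor call, but the file has to include the header of every type and list each one explicitly.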

Attila_Krasznahorkay · June 27, 2018, 1:54pm

Hi Sebastien,

Sure I could write my own factory if I wanted to. But this being the ROOT Forum, I guess it’s understandable why we’re discussing how ROOT could provide this factory functionality instead.

And by now it has anyway turned into a much more general question of how well our EDM is supported by the dictionaries that we make for it. That’s really not specific to this issue.

Best,
Attila

sbinet · June 27, 2018, 2:23pm

of course, sure.
my main point (and I’ll shut up) is that marrying one complicated thing (ATLAS EDM) with one sophisticated thing (JIT+Cling) doesn’t sound like a robust engineering option, with many possible, interesting interaction modes between the 2.

that’s why I’d always advocate for the dumb & clear solution.
(well… if “clear” can ever be completely true with templates and explicit instantiation.)

pcanal · June 27, 2018, 6:55pm

Hi Attila,

‘just’ commenting out an instantiation should not improve things :). This sounds like a deficiency that we need to resolve. If you have not done so already, can you open a JIRA ticket about this?

Thanks,
Philippe.

Attila_Krasznahorkay · June 28, 2018, 8:23am

Hi Philippe,

Did it now.

https://sft.its.cern.ch/jira/browse/ROOT-9507

Cheers,
Attila
