Hello,
In an old piece of code we use in ATLAS, there is an expression parser that uses TMethodCall::Execute (e.g. "double ret; void *obj = ...; m_method->Execute(obj, ret);") to invoke methods of an object, e.g. Muon::pt() const. This code was recently ported to a multi-threaded environment and it performs horribly, likely because TMethodCall::Execute is protected internally by a TVirtualRWMutex::Lock; at least that is what shows up in profiles. There must be a better way, since RDataFrame does not suffer from this problem.
Indeed, in the presence of critical sections, MT code does not really scale.
And you are right: with a good design, all the power of the hardware can be squeezed out!
To solve this issue well, more context is probably needed.
Could you perhaps share more details about your setup and what you want to achieve? There could be several options, depending on the situation.
Hi Danilo,
For filtering we have a tool which filters events or objects based on expressions like muon.pt > 20 && muon.eta > -2.4 && muon.eta < 2.4. Here muon is a container name and pt is a method of a container object, for which there is a TClass which knows about the methods, e.g. pt and eta. The expression is parsed during initialization and executed at runtime; the data is provided by the event store we use in ATLAS through handles.

In principle TClass knows that the method exists and that it has the correct signature, i.e. something like double (const T *obj). TClass does not know whether it would be safe to call the method in an MT context, I suppose; it only knows that the method is declared const. Anyway, Execute is mutex protected.

I did not profile this myself, so I do not know exactly what was done. It is also possible that the VTune instrumentation makes these calls appear more expensive than they are without instrumentation. In any case the mutex protection seems unnecessary here, and since RDataFrame makes it possible to define "arbitrary" methods which can be executed without such a penalty, there must be a better way.
Thanks !
Götz
P.S.:
To call e.g. pt we use, once, I think, something like TMethodCall::InitWithPrototype("....::Muon", "pt", "", false, kConversionMatch)
(btw. the false seems wrong ...). And then TMethodCall::Execute(obj, ret_dbl) for, e.g., every event.
When calling the interpreter via gROOT->ProcessLine, we release the ROOT lock during the execution of the user code, so that other work can proceed in parallel. From your description it looks like TMethodCall::Execute is not properly releasing the lock (i.e. as it should) upon calling the user code.
Thanks.
Do you have evidence, e.g. a scaling issue, that these calls are hindering parallelism? We have seen in the past a few instances of VTune not being fully reliable when used in combination with ROOT’s global mutex.
I did not do the profiling myself, but I can do some further investigations. Without any particular profiling it was observed that the code scales very poorly with the number of threads, i.e. already with 2 threads the CPU efficiency drops. So there is a problem somewhere. A VTune profile (collect threading) shows significant lock contention in ROOT::TVirtualRWMutex::Lock, called by TClingCallFunc::exec, which is called by TMethodCall::Execute.

My personal experience is that lock contention often appears amplified because, I think, profiling the contention needs additional instrumentation, which may add significant overhead if the lock-protected scope is very small but visited very often. In that case the actual problem (a lock-protected scope which is entered significantly less often but held longer) may be overshadowed.

Here TMethodCall is called per event and per object. Typically the objects in filter expressions are jets, muons, electrons and taus, so they are not super frequent; though there might be tracks, in which case the frequency could be high. And the call count is large. In that sense I would be a bit careful about saying that this is the problem, but it is what the profile currently points at.
I had a look at the code, and the scaling issue is not connected to the method call itself: as @pcanal mentioned above, the lock is indeed released just before the call.
The problem, however, is that TClingCallFunc::exec does a lot of setup before the call, and it has to redo it for every invocation.
So we clearly need a way to memorise the information retrieved from the interpreter, so that the call can proceed without taking the lock again. RDataFrame does this by JITting functions and saving their pointers.
I will later look at InitWithPrototype, because this sounds like it could go in the right direction.
@pcanal found a much faster way that needs the interpreter only for initialisation:
If you can guarantee that the types of the arguments don't change (or do the conversion yourself), one can get the function pointer of the method directly. The way to do it is shown here:
Once you have the function pointer (creator in this case), it can be called as
creator(object, <number of args>, <array of ptr to args>, <address of the result>);
The typedef you see at the very top is also publicly available if you include TClingCallFunc.h.
So the minimum example would look something like:
#include <TClass.h>
#include <TClingCallFunc.h>
tcling_callfunc_Wrapper_t getTrampolineFunction([...]) {
   R__LOCKGUARD(gROOTMutex);
   auto interfaceMethod = TClass::GetClass("TH1D")->GetMethod("MethodName", "Args")->InterfaceMethod();
   return (tcling_callfunc_Wrapper_t)interfaceMethod;
}
ResultType call([...]) {
   void *args[] = { &mode, &size };
   ResultType result;
   trampolineFunction(object, 2, args, &result); // <-- (thisPtr, number of args, ptr to array of ptrs to args, address of the result)
   return result;
}
That works very well, thank you very much! I still have a problem with the argument list. I would like to create the prototype from types, e.g. template <typename... T> std::string makeProto(), but for this I need something to convert each T into a normalized type name. I only found ROOT::Internal::GetDemangledTypeName, which does not sound like a robust solution. Is there a more official solution?