Running fast-jet and fast-jet contrib in RDataFrame

Dear all,

does anybody have an example of how to run the fastjet package in RDataFrame?

I would like to read the input from a tree, run a jet or event-shape algorithm, and plot the result.

Thank you for your help.

Regards,
Tancredi




Hello,
I figured it out by loading the C++ fast-jet library and using a compiled C++ function that returns the jet vector. If anybody knows a pure Python solution, I would still be interested.
Regards,
Tancredi

Hi Tancredi,
currently pure Python code cannot be injected in the RDF event loop. You can export the data you need out of RDF (after applying filters, selections or other transformations) with AsNumpy and then operate on numpy arrays, or indeed you can use C++ functions.

If fast-jet functions can be called inside of numba-compiled Python functions, then you might be able to use ROOT.Numba.Declare to generate a function that can be passed to RDF; see the ROOT tutorial tutorials/pyroot/pyroot004_NumbaDeclare.py (you can use ROOT.Numba.Declare with arrays too, RVecs are converted to numpy arrays).

Cheers,
Enrico

Hello,
I have a follow-up question. I managed to run fastjet by loading the C++ shared library, and I now have a vector of type ROOT::VecOps::RVec<ROOT::Math::PtEtaPhiEVector>.
I can give these jet 4-vectors simply to RDataFrame via
d=d.Define('jets',GetJet(inputs))

This works great.

However, I do not understand how I can pass more complex structures. For example, I would like to calculate some jet properties, i.e. a vector of n additional floats.
This could be most easily encoded in a std::map ("name", value). How can I give such a structure to RDataFrame?

Can one define a C++ class or struct containing the jet 4-vectors and their properties and make it available to RDataFrame? How can this be done?

A simple solution would be to call the GetJet function with some arguments, but this would require that I run the jet algorithm several times per event which is not efficient.

Thank you for the help.

    Tancredi

Hi Tancredi,
I am not sure I understand this latest question but I’ll mention a couple of things that might help.

  1. you can return any C++ type from a Define, so you can define your own struct or class that contains all the information you need and inject it in RDF using a Define or a DefinePerSample (currently only available in nightly builds, coming in the next release). So given that struct, you can write a function that takes zero or more input RDF columns and returns the class and inject it as df.Define("mystruct", funcThatReturnsMyStruct, {"x", "y", "z", ...})

  2. you can define a C++ functor class, i.e. a function that also stores some state, and pass it to Define. E.g.

struct MyFunctor {
  SomeData d;  

  MyFunctor(SomeData data) : d(data) {}  // store the state used by operator()
  // this is the callable operator, similar to Python's __call__
  float operator()(float x, float y, float z) {
    return SomeCalculation(d, x, y, z);
  }
};

MyFunctor myfunctor(somedata);
df.Define("newCol", myfunctor, {"x", "y", "z"});

My code above is C++ but you can mix it with Python as usual.

I hope this helps. If not, please provide a concrete example of the problems you mention above.
Enrico
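As an illustration of point 1, here is a minimal, ROOT-free sketch of a struct that bundles the jets with their named properties, so that the expensive step only has to run once per event. All names (SimpleJet, JetResult, MakeJetResult, JetPts, Moment) are hypothetical and not part of ROOT or fastjet, and the "clustering" is only a placeholder that turns each input particle into one jet.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a jet four-vector (a real analysis would more
// likely use ROOT::Math::PtEtaPhiEVector or fastjet::PseudoJet).
struct SimpleJet {
  float pt, eta, phi, e;
};

// Hypothetical per-event result: the jets plus named properties.
struct JetResult {
  std::vector<SimpleJet> jets;
  std::map<std::string, float> moments;
};

// Placeholder for the expensive step. No clustering happens here: every
// input particle becomes one massless "jet" so the sketch stays self-contained.
JetResult MakeJetResult(const std::vector<float> &pt, const std::vector<float> &eta) {
  JetResult r;
  float sumPt = 0.f;
  for (std::size_t i = 0; i < pt.size() && i < eta.size(); ++i) {
    r.jets.push_back({pt[i], eta[i], 0.f, pt[i] * std::cosh(eta[i])});
    sumPt += pt[i];
  }
  r.moments["sumPt"] = sumPt;
  r.moments["nJets"] = static_cast<float>(r.jets.size());
  return r;
}

// Cheap accessor: extracts the jet pT values from an existing result.
std::vector<float> JetPts(const JetResult &r) {
  std::vector<float> pts;
  for (const SimpleJet &j : r.jets) pts.push_back(j.pt);
  return pts;
}

// Cheap accessor: returns a named moment, or NaN if the name is unknown.
float Moment(const JetResult &r, const std::string &name) {
  auto it = r.moments.find(name);
  return it != r.moments.end() ? it->second : std::numeric_limits<float>::quiet_NaN();
}
```

In RDF one would presumably pass MakeJetResult to a first Define and then add further Defines that read that column and call the cheap accessors, so the result is built only once per event. The argument types would have to match the actual column types (for example ROOT::RVec<float> instead of std::vector<float>).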

Great ! Thank you. This helps a lot.
Tancredi

Hello Enrico,
I cannot figure out how your example can be used in Python.

How do you do this for the two lines:

MyFunctor myfunctor(somedata);
df.Define("newCol", myfunctor, {"x", "y", "z"});

I do not see how I can do the declaration part (first line)
This does not work:
d=d.Define('myfunctor','MyFunctor myfunctor(1.)')

If I do the declaration in C++ I do not see how to access the functor in python.

Thank you for your help.
Tancredi

Here’s an example:

import ROOT
ROOT.gInterpreter.Declare("""
struct MyFunctor {
    float operator()(float x) { return 21*x; }
};
""")

f = ROOT.MyFunctor()
m = ROOT.RDataFrame(1).Define("x", "2.f").Define("newcol", f, ["x"]).Max("newcol").GetValue()
print("the answer: ", m)

Cheers,
Enrico

Hello,
I still have a problem. I need a functor class that first takes some particles as input, then forms jets and returns them (as a std::vector). I was able to figure this out based on your example. However, I would also like to calculate some jet moments (an array of floats) that I store in a map<string, float>. So I need either a function like float GetMoment(name) to return these, or two overloaded operators.

I extended your example code to make the problem clear:
import ROOT
ROOT.gInterpreter.Declare("""
struct MyFunctor {
    float operator()(float x) { return SomeOtherCalculation(x); }
    float operator()(int x, int y) { return SomeCalculation(x, y); }
    int SomeCalculation(int x, int y) { return x+y; }
    int SomeOtherCalculation(float x) { return 21.*x; }
};
""")

f = ROOT.MyFunctor()
m = ROOT.RDataFrame(1).Define("x", "1").Define("y", "2").Define("newcol", f, ["x","y"]).Max("newcol").GetValue()
m = ROOT.RDataFrame(1).Define("x", "1.f").Define("newcol", f, ["x"]).Max("newcol").GetValue()
print("the answer: ", m)

Here I define two operators with different input arguments. They both work individually but not together.
The message is:

TypeError: Template method resolution failed:
ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Define(experimental::basic_string_view<char,char_traits<char> > name, experimental::basic_string_view<char,char_traits<char> > expression) =>
TypeError: takes at most 2 arguments (3 given)
Failed to instantiate "Define(std::string,MyFunctor&,std::initializer_list<std::string>)"
Failed to instantiate "Define(std::string,MyFunctor*,std::initializer_list<std::string>)"
Failed to instantiate "Define(std::string,MyFunctor,std::initializer_list<std::string>)"

Maybe you see a better solution?

Thank you for your help.
Regards,
Tancredi

Hi,
overloaded operators are not supported at the moment (we plan to lift this limitation soon, see the 115th ROOT Parallelism, Performance and Programming Model Meeting, 28 October 2021, on Indico).

In the meantime you can solve this, as always, with an extra layer of indirection: put all the data that you can pre-calculate and that stays constant between entries in a dedicated struct, and have two functors that store a pointer or reference to that struct, each with a single operator():

data = ROOT.PrecalculateStuff(...)
functor1 = ROOT.Functor1(data)
functor2 = ROOT.Functor2(data)

df.Define("newcol", functor1, ["x", "y"])\
   .Define("othercol", functor2, ["x", "y"])

Cheers,
Enrico
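For reference, the C++ side of this indirection could look roughly like the sketch below. It is ROOT-free and only illustrates the pattern: the contents of SharedData and the two calculations are invented for the example, and PrecalculateStuff, Functor1 and Functor2 merely reuse the names from the Python lines above. A shared_ptr is used here as one possible way to hold the "pointer to that struct".

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical data that is computed once and stays constant between entries.
struct SharedData {
  float scale;
  float offset;
};

// Hypothetical stand-in for the precalculation step.
std::shared_ptr<const SharedData> PrecalculateStuff(float scale, float offset) {
  return std::make_shared<SharedData>(SharedData{scale, offset});
}

// First functor: exactly one operator(), reads the shared data.
struct Functor1 {
  std::shared_ptr<const SharedData> data;
  explicit Functor1(std::shared_ptr<const SharedData> d) : data(std::move(d)) {}
  float operator()(float x, float y) const { return data->scale * (x + y); }
};

// Second functor: a different calculation on the same shared data,
// again with a single, non-overloaded operator().
struct Functor2 {
  std::shared_ptr<const SharedData> data;
  explicit Functor2(std::shared_ptr<const SharedData> d) : data(std::move(d)) {}
  float operator()(float x, float y) const { return x * y + data->offset; }
};
```

Because each functor has only one operator(), there is no overload to resolve, and copies of either functor keep pointing to the same SharedData instance.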

However, I do not understand how I can give more complex structures, e.g. I would like to calculate some jet properties, i.e. a vector of n-additional floats. This could be most easily encoded in a std::map ("name", value). How to give such a structure to RDataFrame?

Hi,
it’s enough to pass a function or functor to Define that returns that structure:

df.Define("mapOfJetProperties", makeMapOfJetProperties, ["col1", "col2", "col3"])

Cheers,
Enrico
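A minimal, ROOT-free sketch of what such a function could look like, together with a small functor that reads one named entry back out of the map. The name makeMapOfJetProperties is taken from the line above, but its single-argument signature, the property names ("ht", "leadPt", "nJets") and the GetProperty helper are assumptions made up for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using JetProperties = std::map<std::string, float>;

// Hypothetical: derives a few named properties from the per-jet pT values.
JetProperties makeMapOfJetProperties(const std::vector<float> &jetPt) {
  JetProperties props;
  float sum = 0.f;
  float lead = 0.f;
  for (float pt : jetPt) {
    sum += pt;
    if (pt > lead) lead = pt;
  }
  props["ht"] = sum;
  props["leadPt"] = lead;
  props["nJets"] = static_cast<float>(jetPt.size());
  return props;
}

// Functor with a single operator(): the property name is fixed at
// construction, so one instance per property avoids overloaded operators.
struct GetProperty {
  std::string name;
  float fallback;  // returned when the name is not in the map
  explicit GetProperty(std::string n, float fb = -1.f)
      : name(std::move(n)), fallback(fb) {}
  float operator()(const JetProperties &props) const {
    auto it = props.find(name);
    return it != props.end() ? it->second : fallback;
  }
};
```

The map would be filled once per event, and each property of interest could then become its own column by passing a GetProperty instance that takes the map column as its only input.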