Cannot use custom class with RDataFrame Fill method

Dear all,

I think I am hitting two issues at the same time, but I cannot reliably tell them apart.
I am trying to create a custom class to be used with the Fill method of RDataFrame.

I started with this very simple class, which just wraps a TH1F (attached: Boo.cpp and Boo.h).

I am using ROOT 6.14.04 with gcc62 on an SLC6 machine (from /cvmfs/sft.cern.ch/lcg/releases/LCG_94/ROOT/6.14.04/x86_64-slc6-gcc62-opt/bin/root) and building the shared library with

g++ -c -fpic Boo.cpp `root-config --cflags --libs`
g++ -shared -o libBoo.so Boo.o `root-config --cflags --libs`

If I now do something like this in Python 2

import ROOT
ROOT.gInterpreter.Declare('#include "Boo.h"')
ROOT.gSystem.Load("libBoo.so")
boo=ROOT.Boo()
boo.Fill(0,15)
myh=boo.GetHisto()

everything works fine. It also seems to work in an interactive ROOT session:

$ root -l
root [0] gInterpreter->Declare("#include \"Boo.h\"")
(bool) true
root [1] gSystem->Load("libBoo.so")
(int) 0
root [2] auto b=new Boo()
(Boo *) @0x7ffdb1bd2618
root [3] b->Fill(0,15)
root [4] b->GetHisto()->GetEntries()
(double) 1.0000000

But if I now try something like

# create RDataFrame df here
b=df.Filter("mjj>1500").Fill(ROOT.Boo(),("mjj","weight"))

I get AttributeError: 'RDataFrame' object has no attribute 'Fill'
and, even worse, in an interactive ROOT session:

In file included from /mnt/build/jenkins/workspace/lcg_release_tar/BUILDTYPE/Release/COMPILER/gcc62binutils/LABEL/slc6/build/projects/ROOT-6.14.04/src/ROOT-6.14.04-build/input_line_12:21:
In file included from /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.14.04-0d8dc/x86_64-slc6-gcc62-opt/etc/../etc/dictpch/allHeaders.h:688:
In file included from /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.14.04-0d8dc/x86_64-slc6-gcc62-opt/etc/../include/TTreeAsFlatMatrix.h:17:
In file included from /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.14.04-0d8dc/x86_64-slc6-gcc62-opt/etc/../include/ROOT/RDataFrame.hxx:26:
In file included from /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.14.04-0d8dc/x86_64-slc6-gcc62-opt/etc/../include/ROOT/RDFInterface.hxx:32:
/cvmfs/sft.cern.ch/lcg/releases/ROOT/6.14.04-0d8dc/x86_64-slc6-gcc62-opt/etc/../include/ROOT/RDFInterfaceUtils.hxx:94:4: error: static_assert failed "not implemented for this type"
   static_assert(std::is_base_of<TH1, T>::value, "not implemented for this type");
   ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/cvmfs/sft.cern.ch/lcg/releases/ROOT/6.14.04-0d8dc/x86_64-slc6-gcc62-opt/etc/../include/ROOT/RDFInterfaceUtils.hxx:101:40: note: in instantiation of template class 'ROOT::Internal::RDF::IsV7Hist<Boo>' requested here
template <typename T, bool ISV7HISTO = IsV7Hist<T>::value>
                                       ^
/cvmfs/sft.cern.ch/lcg/releases/ROOT/6.14.04-0d8dc/x86_64-slc6-gcc62-opt/etc/../include/ROOT/RDFInterface.hxx:1167:25: note: in instantiation of default argument for 'HistoUtils<Boo>' required here
      if (!RDFInternal::HistoUtils<T>::HasAxisLimits(*h)) {
                        ^~~~~~~~~~~~~
ROOT_prompt_2:1:10: note: in instantiation of function template specialization 'ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::Fill<Boo>' requested here
auto a=d.Fill(Boo("b","b",100,1000,10000),{"mjj","weight"})
         ^

I also tried ROOT 6.18.04 with gcc8 on a CentOS 7 machine (from /cvmfs/sft.cern.ch/lcg/releases/LCG_96b/ROOT/6.18.04/x86_64-centos7-gcc8-opt/bin/root): the error is slightly different, but the message is the same.

Just to clarify what I tried: the RDataFrame Fill method with a plain TH1F works fine in an interactive ROOT session, but not in Python (same error as above).
What am I missing?

Thanks for your support,

Simone

I think @eguiraud can most probably help you

Hi Simone,
welcome to the ROOT forum!

Indeed there are several things going on. The main problem is that you just re-discovered a bug that was supposed to be fixed, ROOT-9737 ("[DF] Cannot use fill if the object is not an histogram"). I re-opened it, and it will be fixed as soon as possible (but note that there are a bunch of critical RDF bugs that need attention first, so it might take a little while).
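The compiler error in the log comes from the line static_assert(std::is_base_of<TH1, T>::value, "not implemented for this type") inside IsV7Hist<T>: apparently the check fires as soon as the trait is instantiated with any type that does not derive from TH1, even though Fill only needs to ask whether T is a v7 histogram. Below is a simplified, ROOT-free C++ sketch of that pattern next to a trait that just answers the question. MockTH1, BooPlain, BooFromTH1, HardCheck and IsHistLike are hypothetical names; this is loosely modeled on the log, not ROOT's actual code or its eventual fix.

```cpp
#include <type_traits>

// Hypothetical stand-ins: MockTH1 plays the role of ROOT's TH1.
struct MockTH1 {};
struct BooPlain {};                    // like the original Boo: unrelated to TH1
struct BooFromTH1 : public MockTH1 {}; // like the workaround: inherits from TH1

// Shape of the check visible in the error log: the trait's value is "false",
// but its static_assert aborts compilation for any T not derived from MockTH1.
// Writing HardCheck<BooPlain>::value anywhere would therefore not compile.
template <typename T>
struct HardCheck : std::false_type {
   static_assert(std::is_base_of<MockTH1, T>::value, "not implemented for this type");
};

// A trait that only answers the question: true for MockTH1-derived types,
// false otherwise, so callers can branch on it instead of failing to compile.
template <typename T>
struct IsHistLike : std::is_base_of<MockTH1, T> {};
```

Adding a line such as HardCheck<BooPlain>::value reproduces the same kind of "not implemented for this type" failure reported above, while IsHistLike<BooPlain>::value simply evaluates to false.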

Then for python usage specifically, another bug that might bite us is ROOT-10396 (“Failure when instantiating RDataFrame::Fill”).

Depending on what you actually need to accomplish, one or the other workaround might apply.
Do you really need a custom Fill even if it is afflicted by the bugs above?

The simplest way to use the custom Fill is to make your Boo class inherit from TH1, which is (wrongly) a requirement for Fill to work. You will also have to override TH1's Fill, Add and Merge, and add a copy-constructor, for things to work in multi-thread runs (i.e. when ROOT::EnableImplicitMT has been called; each thread will have its own copy of Boo).

In a single-thread, in C++, this works:

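#include <ROOT/RDataFrame.hxx>
#include <TH1.h>
#include <TH1F.h>
#include <iostream>

// Boo inherits from TH1 only to satisfy RDF's (wrong) TH1 requirement;
// the actual work is delegated to the wrapped TH1F.
class Boo : public TH1 {
   TH1F *hh;

public:
   Boo() { hh = new TH1F("Boo", "Boo", 100, -1000, 1000); }
   // Shallow copy: the copy shares hh with the original. Fine in a
   // single-thread run, but per-thread copies would all fill the same TH1F.
   Boo(const Boo &b2) { hh = b2.hh; }
   Boo(const char *name, const char *title, int nBins, float xLow, float xHigh)
   {
      hh = new TH1F(name, title, nBins, xLow, xHigh);
   }
   // Called by RDF once per entry with the values of the two columns
   void Fill(float x, float w) { hh->Fill(x, w); }
   TH1F *GetHisto() { return hh; }
};

int main()
{
   // 10 entries with constant columns mjj = 10 and weight = 1
   ROOT::RDataFrame df_(10);
   auto df = df_.Define("mjj", "10").Define("weight", "1");
   // Column types given explicitly: both defined columns are int
   auto filled = df.Fill<int, int>(Boo(), {"mjj", "weight"});
   std::cout << filled->GetHisto()->GetEntries() << std::endl;
   std::cout << filled->GetHisto()->GetMean() << std::endl;

   return 0;
}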
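The example above is single-thread. In a multi-thread run, RDF works on one copy of the object per processing slot and combines the copies at the end, which is why a copy-constructor and Merge are needed, and why each copy must own its accumulated data. Below is a ROOT-free C++ sketch of that copy-fill-merge pattern under stated assumptions: WeightedSum and FillPerSlot are hypothetical names, the slots are processed sequentially instead of on real threads, and the vector-based Merge is a simplification (ROOT's TH1::Merge takes a TCollection* instead). It does not reproduce RDF's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ROOT-free accumulator standing in for a Boo-like class.
// It models only the interface discussed in this thread: copy-constructible,
// Fill(x, w), and a Merge step that folds other copies into this one.
class WeightedSum {
public:
   void Fill(double x, double w)
   {
      fSumW += w;
      fSumWX += w * x;
      ++fEntries;
   }
   // Simplified signature; ROOT's TH1::Merge takes a TCollection* instead.
   void Merge(const std::vector<const WeightedSum *> &others)
   {
      for (const WeightedSum *o : others) {
         fSumW += o->fSumW;
         fSumWX += o->fSumWX;
         fEntries += o->fEntries;
      }
   }
   std::size_t GetEntries() const { return fEntries; }
   double GetMean() const { return fSumW != 0. ? fSumWX / fSumW : 0.; }

private:
   // Plain members: the implicit copy-constructor gives each copy its own data.
   double fSumW = 0.;
   double fSumWX = 0.;
   std::size_t fEntries = 0;
};

// Sketch of the per-slot scheme: one copy of the model per slot, each slot
// fills only its own copy, then copies 1..N-1 are merged into copy 0.
// Slots run sequentially here; in an RDF multi-thread run each slot would be
// processed by a different thread.
WeightedSum FillPerSlot(const WeightedSum &model, const std::vector<double> &xs,
                        const std::vector<double> &ws, unsigned nSlots)
{
   assert(xs.size() == ws.size());
   if (nSlots == 0)
      nSlots = 1;
   std::vector<WeightedSum> copies(nSlots, model); // copy-constructor used here
   for (unsigned slot = 0; slot < nSlots; ++slot)
      for (std::size_t i = slot; i < xs.size(); i += nSlots)
         copies[slot].Fill(xs[i], ws[i]);

   std::vector<const WeightedSum *> others;
   for (unsigned slot = 1; slot < nSlots; ++slot)
      others.push_back(&copies[slot]);
   copies[0].Merge(others);
   return copies[0];
}
```

For example, FillPerSlot(WeightedSum(), {1, 2, 3, 4}, {1, 1, 1, 1}, 3) ends up with 4 entries and a mean of 2.5, the same as filling a single object with all four values. If the copies shared their storage (as Boo's shallow copy-constructor does with hh), real threads would race on the same object.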

Now for Python: note that in general you need -fPIC when compiling a shared library that you want to link with ROOT. To get around ROOT-10396, linked above, we can use a C++ helper function.
This works in v6.18, but fails in v6.14 and v6.16 because PyROOT does not play nice with templates in those older versions:

import ROOT
ROOT.gInterpreter.Declare('#include "Boo.h"')
ROOT.gSystem.Load("libBoo.so")

ROOT.gInterpreter.Declare("""
template <typename DF>
ROOT::RDF::RResultPtr<Boo> FillBoo(DF df) {
    return df.Fill(Boo(), {"mjj", "weight"});
}
""")


df = ROOT.RDataFrame(10).Define("mjj", "10").Define("weight", "1")
b = ROOT.FillBoo(df)
print(b.GetHisto().GetEntries())

To summarize, we can work around RDF's bug by inheriting from TH1, and we can work around PyROOT's bug by using a C++ helper function (at least since v6.18). Or, depending on your use case, you can change the rules of the game and use something other than a custom Fill. I'll try to address at least the RDF bug as soon as possible, but the fix will most probably not be backported to ROOT v6.14. It may be backported to v6.18, and it will definitely be in v6.20.

Sorry you hit such a nasty corner of ugly things.
Cheers,
Enrico

P.S.
@etejedor or @swunsch might have an idea about how to use something like that C++ helper function also in v6.14.

Template handling in PyROOT before v6.18 is quite buggy and incomplete, so if Enrico's example does not work, I would just advise moving to a newer ROOT version.

Yes, indeed, RDataFrame has also improved quite a lot since v6.14. For example, with recent versions there is no need for the template in

template <typename DF>
ROOT::RDF::RResultPtr<Boo> FillBoo(DF df) {
    return df.Fill(Boo(), {"mjj", "weight"});
}

but you can use the generic ROOT::RDF::RNode type:

ROOT::RDF::RResultPtr<Boo> FillBoo(ROOT::RDF::RNode df) {
    return df.Fill(Boo(), {"mjj", "weight"});
}

And with C++14’s auto return types it’s even simpler:

auto FillBoo(ROOT::RDF::RNode df) {
    return df.Fill(Boo(), {"mjj", "weight"});
}

Hi all!

First of all, thanks for your valuable answers.
It seems pretty clear to me that using version 6.18 or later is the best option (and I have absolutely no problem with this).

Depending on what you actually need to accomplish, one or the other workaround might apply.
Do you really need a custom Fill even if it is afflicted by the bugs above?

I can actually avoid the custom Fill (I already have something working without it), but I was trying to use it to improve my code, including its readability.
Having a class that inherits from TH1 is no problem for now: I understand that this requirement is wrong, but for now what you suggested is a good workaround (for both C++ and Python).

Just to better understand the other requirements for multi-thread runs: are the additional Add and Merge methods really needed? I included only the Fill method and the copy constructor because the others were not mentioned in the documentation.

Thank you all again for your great support and clarifications.
Cheers,

Simone

Yes, the docs are broken. You also need a copy-constructor; that should be all. You'll definitely notice if you are missing something.

Cheers,
Enrico
