Loading a histogram in RDataFrame

Hi,

I want to define a new RDataFrame column using the values (bin contents) of a TH1 that I have loaded from a .root file.
The problem is similar to the one described here

but I don’t really understand the syntax used there, and the solution does not work for me. I was thinking of something like

ROOT::RDataFrame d("myHisto", "myFile.root");
d.Define("h_bin", "myHisto->GetBinContent(15)");

However, this obviously does not work because RDataFrame expects a tree as input.

Would it even be possible to store the contents of the bins in a vector? Something like

.Define("a_vector", [](float a) {
                    std::vector<float> vec;
                    for (int i = 0; i < myHisto->GetNbinsX(); i++)
                        a = myHisto->GetBinContent(i);
                      vec.push_back(a);
                    return vec;
                  },

                  );

Thanks,
Jordi


ROOT Version: 6.19/01
Platform: Not Provided
Compiler: Not Provided


Hi Jordi,
the post you link contains a solution to your question, with a code snippet. What exactly is not clear?

You can also use a lambda as in your second snippet – you just have to capture myHisto:

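TH1D *myHisto = ....;
.Define("a_vector",
        [myHisto]() { // no input columns are needed, so the lambda takes no arguments
           const auto nBins = myHisto->GetNbinsX();
           std::vector<float> vec;
           vec.reserve(nBins); // always pre-allocate when you can
           // regular bins are numbered 1..nBins (0 is underflow, nBins+1 is overflow)
           for (int i = 1; i <= nBins; ++i)
              vec.push_back(myHisto->GetBinContent(i));
           return vec;
        });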

Also note that it might be more efficient to just use the array contained in myHisto directly (you can retrieve it with GetArray) rather than copy its contents to a std::vector at every entry.
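To illustrate the "don't copy at every entry" point without needing ROOT, here is a minimal sketch that builds the vector once and lets the per-entry callable just return it. FakeHisto is a hypothetical stand-in, not a ROOT class: it only mimics the TH1 bin numbering (0 is underflow, 1..nBins are the regular bins, nBins+1 is overflow). BinContents and MakeBinContentGetter are likewise names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1D histogram, NOT a ROOT class. It only mimics
// the TH1 bin numbering: 0 = underflow, 1..nBins = regular bins, nBins+1 = overflow.
struct FakeHisto {
   std::vector<double> cells; // size nBins + 2
   int GetNbinsX() const { return static_cast<int>(cells.size()) - 2; }
   double GetBinContent(int bin) const { return cells[static_cast<std::size_t>(bin)]; }
};

// Copy the regular bins (1..nBins) into a vector, skipping under/overflow.
std::vector<float> BinContents(const FakeHisto &h)
{
   const int nBins = h.GetNbinsX();
   std::vector<float> vec;
   // reserve() allocates storage but leaves the vector empty, so push_back
   // starts at index 0. Constructing with vec(nBins) instead would create
   // nBins zeros and push_back would append after them.
   vec.reserve(static_cast<std::size_t>(nBins));
   for (int i = 1; i <= nBins; ++i)
      vec.push_back(static_cast<float>(h.GetBinContent(i)));
   return vec;
}

// Build the vector once, then return a callable with the same shape as a
// zero-argument Define lambda: it captures the vector by value and returns it.
auto MakeBinContentGetter(const FakeHisto &h)
{
   const std::vector<float> contents = BinContents(h);
   return [contents]() { return contents; };
}
```

With a real histogram the same structure should carry over: fill the vector before the event loop and capture it in the lambda passed to Define, rather than looping over the bins for every entry.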

Cheers,
Enrico

Hi @Enrico,

Regarding the post I linked: what I don’t understand is how the histogram is loaded when using the dynamic scope.

//opening a file with a histogram called 'histo' inside
TFile myFile("file.root")
auto myHist = "histo"; //or TH1F myHist = myFile.Get("histo");
df = ROOT.ROOT.RDataFrame(3)
df2 = df.Define("hbin", "histo->GetBinContent(1)")

I get

error: use of undeclared identifier 'histo'

My question is basically how to pass the TH1F to the .Define, which I have not been able to do yet.

Hi,
the code in the post you linked has two extra important lines:

f = ROOT.TFile("hsimple.root")
ROOT.gInterpreter.ProcessLine("auto histo = hpx;")

The ProcessLine call creates a histo variable in the scope of the ROOT interpreter, which you can then use in RDF string expressions such as df2 = df.Define("hbin", "histo->GetBinContent(1)") (there, histo has the value set with ProcessLine).

You don’t have to go through the interpreter though: you can define myHist as a TH1F variable and capture it in a lambda that you pass to Define, as I show in the snippet above (which I haven’t tested, but it should give you an idea).

So, to summarize, you can:

  • declare your histo variable to the interpreter via ProcessLine and then use it in RDF string expressions
  • declare a normal C++ histo variable in your program and capture it in the lambdas that you pass to Defines and Filters