
RDataFrame Histo1D crashes with weights for certain columns

rdataframe

#1

ROOT Version: 6.14/06
Platform: Fedora 28
Compiler: gcc


This is further to my previous topic:

I posted that topic when the problem first appeared, but did not have time to debug it properly, and I gave little information that would help anyone solve it. Apologies for that; I am now committed to fixing it.

The details given there about the nature of the crash and the error message are still relevant; however, I have managed to reproduce the issue in a much smaller example.

The basics are:
For a data frame variable d:

d.Histo1D("Column_Name")->DrawCopy();

works for every column in the file. However,

d.Histo1D("Column_Name","Weight")->DrawCopy();

only works for columns which have a single value per event.
For example, I have a column "Muon_n", which contains the number of muons detected in the event. This works with the weights in Histo1D. I also have a column "Muon_Pt", which contains an array of the transverse momentum for each of the muons in the event, and hence can have multiple values.

This pattern of only crashing for arrays is consistent for the many columns I have tested, but as far as I can tell the action should still be valid for these columns?


#2

Hi,
what’s the error message or stacktrace that you get with the crash?

Note that “Column_Name” and “Weight” must be either both scalar or both arrays of the same length.
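To make that rule concrete, here is a minimal standalone sketch in plain C++. It is not ROOT code: `ToyHist` and `FillEvent` are hypothetical names invented for illustration, and the length check is an assumption of this toy rather than a statement about what RDataFrame does internally. It only models the two supported shapes: scalar value with scalar weight, and array value with an array weight of the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy stand-in for a 1D histogram: equal-width bins over [lo, hi).
struct ToyHist {
    double lo, hi;
    std::vector<double> bins;

    ToyHist(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0.0) {}

    void Fill(double x, double w) {
        if (x < lo || x >= hi) return;  // this toy ignores under/overflow
        auto i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the upper edge
        bins[i] += w;
    }
};

// Scalar value with scalar weight: one fill per event.
void FillEvent(ToyHist& h, double value, double weight) {
    h.Fill(value, weight);
}

// Array value with array weight: element-wise fill; lengths must agree.
void FillEvent(ToyHist& h, const std::vector<double>& values,
               const std::vector<double>& weights) {
    if (values.size() != weights.size())
        throw std::invalid_argument("values and weights differ in length");
    for (std::size_t i = 0; i < values.size(); ++i)
        h.Fill(values[i], weights[i]);
}
```

In this toy, a call that mixes an array of values with a scalar weight matches neither overload, so it is rejected at compile time.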


#3

Hi,
The compiler message refers to the Exec method; it is given in the link above.

That will be my issue, though: my weights are scalar, and the crash occurs when the columns are arrays.

The weights I’m using are on a per event basis, and I would have imagined this was one of the most common uses for weights. Would it not make sense to have an implementation where a scalar weight is applied for each element of a vector column? Otherwise I would have to define a large number of weight columns with arrays the same size as various other columns.


#4

Hi Harry,

thanks for taking the time to post this. This is a missing feature in RDF. I acknowledge that we should be able to histogram arrays weighted by scalars, e.g. in the case of 1 weight per event. I’ll open a JIRA item for this and link it on this thread.

Now, something to unblock you. We can make your weight an array and make RDF work happily.

// Mimic something which looks like the DF discussed above
// The content of the branches is garbage, but the types are right :)
float w=2.f;
std::vector<float> pts;

ROOT::RDataFrame df0(4);
auto df1 = df0.Define("Weight", [&w]() { return w++; })
              .Define("Muon_pts", [&pts, &w]() { pts.emplace_back(w++); return pts; });

// Expand the single per-event weight into one weight per muon
auto weight2weights = [](float w, const std::vector<float>& pts) {
    return ROOT::RVec<float>(pts.size(), w);
};
auto df = df1.Define("Weights", weight2weights, {"Weight", "Muon_pts"});

// Finally create the histo
auto h = df.Histo1D("Muon_pts", "Weights");

h->DrawCopy();
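
Independent of ROOT, the effect of expanding the weight can be checked in plain C++. The sketch below is only an illustration of the idea: `BroadcastWeight` and `EventWeightSum` are hypothetical helper names, not part of RDataFrame. It repeats a per-event weight once per array element, as the weight2weights lambda above does, and sums what one event would contribute to the histogram.

```cpp
#include <cstddef>
#include <vector>

// Repeat a per-event weight once per element of a per-event array,
// mirroring what the weight2weights lambda returns.
std::vector<float> BroadcastWeight(float w, std::size_t n) {
    return std::vector<float>(n, w);
}

// Total weight one event adds to the histogram after broadcasting:
// each element of pts is filled once with weight w.
float EventWeightSum(float w, const std::vector<float>& pts) {
    float sum = 0.f;
    for (float wi : BroadcastWeight(w, pts.size())) sum += wi;
    return sum;
}
```

An event with three muons and weight 2 therefore contributes a total weight of 6, and an event with no muons contributes nothing.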