RDataFrame Histo1D crashes with weights for certain columns


ROOT Version: 6.14/06
Platform: Fedora 28
Compiler: gcc


This is further to my previous topic:

which I posted when this problem first appeared, but I didn’t have time to debug it properly. Apologies, I did not give much useful information there to help solve the problem, but now I’m committed to fixing it.

The details given there for the nature of the crash and the error message are still relevant; however, I have managed to reproduce the issue in a much smaller example.

The basics are:
For a data frame variable d:

d.Histo1D("Column_Name")->DrawCopy();

works for every column in the file. However,

d.Histo1D("Column_Name","Weight")->DrawCopy();

only works for columns which have a single value per event.
For example, I have a column "Muon_n", which contains the number of muons detected in the event. This works with the weights in Histo1D. I also have a column "Muon_Pt", which contains an array of the transverse momentum for each of the muons in the event, and hence can have multiple values.

This pattern of only crashing for arrays is consistent for the many columns I have tested, but as far as I can tell the action should still be valid for these columns?

Hi,
what’s the error message or stacktrace that you get with the crash?

Note that “Column_Name” and “Weight” must be either both scalar or both arrays of the same length.
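
To make that rule concrete, here is a minimal plain C++ sketch (no ROOT involved) of how a value array and a weight array are paired within one event. ToyHist and fillEvent are hypothetical names made up for illustration; this is not how RDataFrame is implemented internally, only the pairing logic it expects.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a 1D histogram: equal-width bins on [lo, hi).
struct ToyHist {
    double lo, hi;
    std::vector<double> bins;

    ToyHist(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0.0) {
        assert(n > 0 && hi_ > lo_);
    }

    void Fill(double x, double w) {
        if (x < lo || x >= hi) return; // under/overflow is ignored in this sketch
        auto i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        bins[i] += w;
    }
};

// Array values with array weights: values[i] is filled with weights[i],
// so the two arrays must have the same length.
void fillEvent(ToyHist& h, const std::vector<float>& values,
               const std::vector<float>& weights) {
    if (values.size() != weights.size())
        throw std::runtime_error("values and weights must have the same length");
    for (std::size_t i = 0; i < values.size(); ++i)
        h.Fill(values[i], weights[i]);
}
```

With this pairing, a single per-event weight has no element to match against each entry of an array column, which is the situation described in the original post.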

Hi,
The compiler message refers to the Exec method; it is given in the topic linked above.

That will be my issue, though: my weights are scalar, and it crashes when the columns are arrays.

The weights I’m using are on a per event basis, and I would have imagined this was one of the most common uses for weights. Would it not make sense to have an implementation where a scalar weight is applied for each element of a vector column? Otherwise I would have to define a large number of weight columns with arrays the same size as various other columns.

Hi Harry,

thanks for taking the time to post this. This is a missing feature in RDF. I acknowledge that we should be able to histogram arrays weighted by scalars, e.g. in the case of 1 weight per event. I’ll open a JIRA item for this and link it on this thread.

Now, something to unblock you. We can make your weight an array and make RDF work happily.

// Mimic something which looks like the DF discussed above
// The content of the branches is garbage, but the types are right :)
float w = 2.f;
std::vector<float> pts;

ROOT::RDataFrame df0(4);
auto df1 = df0.Define("Weight", [&w]() { return w++; })
              .Define("Muon_pts", [&pts, &w]() { pts.emplace_back(w++); return pts; });

// Transform the single weight into multiple weights:
// one copy of the scalar weight per element of the array column
auto weight2weights = [](float w, const std::vector<float>& pts) {
    return ROOT::RVec<float>(pts.size(), w);
};
auto df = df1.Define("Weights", weight2weights, {"Weight", "Muon_pts"});

// Finally create the histo
auto h = df.Histo1D("Muon_pts", "Weights");

h->DrawCopy();
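
For reference, the transformation done by weight2weights above can be sketched in plain C++ without ROOT types. The names broadcastWeight and eventWeightSum are hypothetical and only serve the illustration; the point is that each event contributes its scalar weight once per array element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One copy of the per-event weight for every element of the array column.
std::vector<float> broadcastWeight(float w, const std::vector<float>& values) {
    return std::vector<float>(values.size(), w);
}

// Sum of the weights one event contributes when values[i] is paired with weights[i].
double eventWeightSum(const std::vector<float>& values,
                      const std::vector<float>& weights) {
    assert(values.size() == weights.size()); // pairing requires equal lengths
    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i)
        sum += weights[i];
    return sum;
}
```

An event with no muons gets an empty weight array, so it adds nothing to the histogram.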

Thank you very much, this is excellent :)

Hi,
the feature request is now tracked at https://sft.its.cern.ch/jira/browse/ROOT-9985 in case you want to follow its progress (or ping us in case there is no progress at all for a few weeks).

Cheers,
Enrico

Hi Harry,

the item has been closed. The feature is now in the master and 6.16 branches, and it will be part of the 6.18 release as well as the 6.16/02 patch release.
https://sft.its.cern.ch/jira/browse/ROOT-9985
Thanks again for reporting this missing feature.

Cheers,
D
