Increase accuracy of leaves when using RDataFrame.Mean

Dear ROOT-ers,

I’m reading a ROOT tree with Float_t leaves. I use RDataFrame for that:

dr_th_no_cut = df.Mean["double"]("displacement.dr_proj_th_cm")

My idea is that I don’t need high precision for each leaf, which saves space. But when calculating the mean, I would like the result to be computed in higher precision to avoid numerical errors. Unfortunately I get an error with that code (see the first line of the output):

Error in TTreeReaderValueBase::CreateProxy(): Leaf of type Float_t cannot be read by TTreeReaderValue.
Traceback (most recent call last):
File "…/read_phys_file.py", line 46, in
print("no cut, cuts:", dr_th_no_cut.GetValue(), dr_th_wcuts.GetValue())
~~~~~~~~~~~~~~~~~~~~~^^
cppyy.gbl.std.runtime_error: const double& ROOT::RDF::RResultPtr::GetValue() =>
runtime_error: An error was encountered while processing the data. TTreeReader status code is: 6

Can RDataFrame support increasing accuracy for such aggregation operations? Is there a good workaround to solve that?

I’m using ROOT 6.30 at the moment. Thank you.

Maybe you can do it with Redefine (or Define, to add new columns) and cast the column to double. In my test the result is the same either way, so perhaps it doesn’t matter that the column is float originally and ROOT already uses double precision for the mean. Or perhaps the cast is not really converting to double; an RDataFrame expert may clarify.

With this:

import ROOT

d = ROOT.RDataFrame("ntuple","hsimple.root")
print(d.Describe())
print('Mean px:',d.Mean("px").GetValue())
print('Mean py:',d.Mean("py").GetValue())

d2 = d.Redefine("px","(double)px").Redefine("py","(double)py")
print(d2.Describe())
print('Mean px:',d2.Mean("px").GetValue())
print('Mean py:',d2.Mean("py").GetValue())

I get

...
Column  Type    Origin
------  ----    ------
i       Float_t Dataset
px      Float_t Dataset
py      Float_t Dataset
pz      Float_t Dataset
random  Float_t Dataset

Mean px: -0.0038264499006807457
Mean py: -0.0032243128226821954

...
Column  Type    Origin
------  ----    ------
i       Float_t Dataset
px      double  Define
py      double  Define
pz      Float_t Dataset
random  Float_t Dataset

Mean px: -0.0038264499006807457
Mean py: -0.0032243128226821954
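
As background for the concern in the question, here is a minimal pure-Python sketch (no ROOT needed) of why the type of the accumulator can matter more than the storage type of the values. It is only an illustration under stated assumptions and says nothing about what RDataFrame does internally: the helper names (to_f32, mean_f32_accumulator, mean_f64_accumulator) and the test data are hypothetical, and float32 rounding is emulated with the standard struct module.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mean_f32_accumulator(values):
    """Mean where the running sum is rounded to float32 after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return to_f32(total / len(values))


def mean_f64_accumulator(values):
    """Mean where the running sum is kept in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


# Hypothetical data: one million identical values, each stored as float32.
n = 1_000_000
stored = to_f32(0.1)
values = [stored] * n
exact = stored  # the true mean of identical values is the value itself

m32 = mean_f32_accumulator(values)
m64 = mean_f64_accumulator(values)

print("float32 accumulator error:", abs(m32 - exact))
print("float64 accumulator error:", abs(m64 - exact))
```

In this sketch the values are single precision in both cases; only the running sum differs. Once the float32 sum grows large, each added 0.1 is rounded to the spacing of representable values at that magnitude, so the error builds up systematically, while the double-precision sum stays close to the exact mean.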

Perhaps @vpadulan can help here