I’m seeing different results when filling a histogram from the same TTree using two different methods.
# get the file (69MB) from https://cernbox.cern.ch/s/aotAznyWsQVSqxa
import ROOT

f = ROOT.TFile.Open("jetE.root")
tr = f.jtree
hr = ROOT.TH1D("hr","hr", 100, 0,2.)
tr.Project("hr","jet_E/jet_true_E")
print(hr.GetBinContent(44))
which gives 1187233.0
When using RDataFrame:
df = ROOT.RDataFrame(tr)
df = df.Define('r', "jet_E/jet_true_E")
h = df.Histo1D(("hr", "hr", 100, 0, 2.), 'r').GetPtr()
print(h.GetBinContent(44))
I get 1187240.0
Am I doing something wrong? Or is there something wrong with Project or Histo1D?
Cheers,
P-A
PS: editing to add that I also tried an entirely different method: loading the data with uproot and then calling hr.FillN(); I get the same result as in the second (RDataFrame) case.
ROOT Version: 6.26/02
Platform: linux
Compiler: Not Provided
My suspicion is …
Both “jet_E” and “jet_true_E” are “Float_t”.
The exact value of “jet_E/jet_true_E” depends on when the operands get promoted to “Double_t”, so the computed ratio can occasionally land in the neighboring bin. You can force the promotion explicitly:
df = df.Define('r', "Double_t(jet_E)/Double_t(jet_true_E)")
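To see the mechanism outside ROOT, here is a minimal numpy sketch (invented uniform values and an arbitrary seed, not the jetE.root data) that computes a Float_t-style ratio once entirely in float32 and once after promoting the operands to float64, then bins both results on the 100-bin [0, 2] axis used above:

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-ins for the Float_t branches jet_E and jet_true_E (invented values)
a = rng.uniform(50.0, 150.0, 10_000_000).astype(np.float32)
b = rng.uniform(50.0, 150.0, 10_000_000).astype(np.float32)

r32 = a / b                                        # division carried out in float32
r64 = a.astype(np.float64) / b.astype(np.float64)  # operands promoted to double first

edges = np.linspace(0.0, 2.0, 101)  # 100 bins on [0, 2], as in the histograms above
i32 = np.digitize(r32, edges)
i64 = np.digitize(r64, edges)

moved = np.count_nonzero(i32 != i64)
print(moved)  # a handful of entries out of 10M land in a neighboring bin
```

Only a few entries per million migrate to an adjacent bin, which is the same kind of small discrepancy seen between Project and Histo1D above.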
Thanks! You mean that Project converts to double before taking the ratio, while RDataFrame performs it in float? The 7 differing entries would then come from numerical fluctuation around the bin boundaries…
That makes sense, but is 7 out of ~1M compatible with the difference in numerical precision between float and double?
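A rough back-of-envelope estimate (assuming ratios of order 1 and the 100-bin [0, 2] axis above) suggests it is:

```python
import numpy as np

# A float32 result differs from the float64 one by at most half an ulp,
# so the average displacement is roughly a quarter ulp.
ulp = float(np.spacing(np.float32(1.0)))  # ulp of float32 near 1.0, ~1.2e-7
bin_width = 2.0 / 100                     # 100 bins on [0, 2]

p_cross = (ulp / 4) / bin_width           # chance one entry crosses a bin edge
expected = 1_000_000 * p_cross            # expected migrations per ~1M entries
print(expected)                           # order of one per million entries
```

This lands at on the order of one to a few migrations per million entries, the same order of magnitude as the 7 observed.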