Bin content discrepancy when using hadd

I’m seeing a discrepancy between the bin content in individual root files and those produced using hadd from those same files. I’ll first describe in words what I see and then include some output below.

I have 20 root files indexed 20-39. If I hadd all files together and look at bin 1 of a histogram, I get 28230600 entries.

If I hadd all files but the last one together, i.e. 20-38, and look at that same bin, I get 27046676. And if I just look at the last file (39), that bin has a content of 1183923. Adding these two together, you get 28230599—one less than using hadd on all the files!

How is it possible that these are different?

I’m using ROOT version 6.14/09.

$ hadd -f test20to39.root $(ls | grep -E "QCD2018-(2[0-9]|3[0-9]).root" | tr '\n' ' ')
hadd Target file: test20to39.root
hadd compression setting for all ouput: 1
hadd Source file 1: QCD2018-20.root
hadd Source file 2: QCD2018-21.root
hadd Source file 3: QCD2018-22.root
hadd Source file 4: QCD2018-23.root
hadd Source file 5: QCD2018-24.root
hadd Source file 6: QCD2018-25.root
hadd Source file 7: QCD2018-26.root
hadd Source file 8: QCD2018-27.root
hadd Source file 9: QCD2018-28.root
hadd Source file 10: QCD2018-29.root
hadd Source file 11: QCD2018-30.root
hadd Source file 12: QCD2018-31.root
hadd Source file 13: QCD2018-32.root
hadd Source file 14: QCD2018-33.root
hadd Source file 15: QCD2018-34.root
hadd Source file 16: QCD2018-35.root
hadd Source file 17: QCD2018-36.root
hadd Source file 18: QCD2018-37.root
hadd Source file 19: QCD2018-38.root
hadd Source file 20: QCD2018-39.root
hadd Target path: test20to39.root:/
hadd Target path: test20to39.root:/plots
$ hadd -f test20to38.root $(ls | grep -E "QCD2018-(2[0-9]|3[0-8]).root" | tr '\n' ' ')
hadd Target file: test20to38.root
hadd compression setting for all ouput: 1
hadd Source file 1: QCD2018-20.root
hadd Source file 2: QCD2018-21.root
hadd Source file 3: QCD2018-22.root
hadd Source file 4: QCD2018-23.root
hadd Source file 5: QCD2018-24.root
hadd Source file 6: QCD2018-25.root
hadd Source file 7: QCD2018-26.root
hadd Source file 8: QCD2018-27.root
hadd Source file 9: QCD2018-28.root
hadd Source file 10: QCD2018-29.root
hadd Source file 11: QCD2018-30.root
hadd Source file 12: QCD2018-31.root
hadd Source file 13: QCD2018-32.root
hadd Source file 14: QCD2018-33.root
hadd Source file 15: QCD2018-34.root
hadd Source file 16: QCD2018-35.root
hadd Source file 17: QCD2018-36.root
hadd Source file 18: QCD2018-37.root
hadd Source file 19: QCD2018-38.root
hadd Target path: test20to38.root:/
hadd Target path: test20to38.root:/plots
$ root test20to39.root
Loading FW Lite setup.
root [0]
Attaching file test20to39.root as _file0...
(TFile *) 0x39797d0
root [1] ((TH1F*)plots->Get("nEventsPostPre"))->GetBinContent(1)
(double) 28230600.
$ root test20to38.root
Loading FW Lite setup.
root [0]
Attaching file test20to38.root as _file0...
(TFile *) 0x4d01d90
root [1] ((TH1F*)plots->Get("nEventsPostPre"))->GetBinContent(1)
(double) 27046676.
$ root QCD2018-39.root
Loading FW Lite setup.
root [0]
Attaching file QCD2018-39.root as _file0...
(TFile *) 0x563fd90
root [1] ((TH1F*)plots->Get("nEventsPostPre"))->GetBinContent(1)
(double) 1183923.0

printf("TH1F ... float ... %1.11g\n", (27046676.f + 1183923.f)); // 27046676 > 16777216
printf("TH1D ... double ... %1.11g\n", (27046676. + 1183923.));

Thanks, @Wile_E_Coyote. Some additional explanation for others who may come upon this later is below.

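The issue is that, once hadd sums the files, the content of this bin exceeds 16777216 (2^24). A TH1F stores each bin content as a 32-bit float, which can represent every integer exactly only up to that value. Between 2^24 and 2^25 a float can hold only even integers, so the odd sum 28230599 cannot be stored and is rounded to the neighboring representable value 28230600. Switching to TH1D, which stores bin contents as double, resolved the issue.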

Moral of the story: make sure that the bin contents in your histograms do not exceed the precision of the histogram type. When a histogram is finely binned, it is usually OK to use TH1F, since the content of any one bin is unlikely to exceed the limit for exact integer counts (16777216). However, TH1D is the safer option, since a double represents integers exactly up to 2^53. The downside is that it requires more space. In this case, I had a single bin for counting certain events, and its content quickly exceeded what a float can represent exactly, producing the strange results above.
