Problem with merge_hadd.C or GetEffectiveEntries()

Hello,

I’m having a problem with the number of entries in my plots after the histograms have been scaled.

I’m using the script merge_hadd.C (available here: cms.pd.infn.it/software/meetings … rge_hadd.C) to combine the root files I’ve produced for different MC samples with different cross sections.

I’m looking at the number of events passing a series of increasingly tight cuts. If I look at the number of entries (->GetEntries()) in each of the histograms after combining them with this script, everything looks fine:

New Info: mcMETmono Entries = 224854
New Info: mcMETmonoNtrk Entries = 224669
New Info: mcMETmonoNGd1trk Entries = 223593
New Info: mcMETmonoNGd2trk Entries = 119362

As expected, the number of events decreases with each cut.
If I look at the effective entries things look bad:

New Info: mcMETmono Effective entries = 14485.8
New Info: mcMETmonoNtrk Effective entries = 14612.6
New Info: mcMETmonoNGd1trk Effective entries = 14686.7
New Info: mcMETmonoNGd2trk Effective entries = 7943.35

the number of effective entries seems to increase over the first three cuts. I’m very puzzled as to what may be causing this, so any suggestions would be greatly appreciated.

Many thanks,

Anna

[quote]If I look at the effective entries things look bad:[/quote]What are the ‘effective entries’? How do they differ from GetEntries()?

Philippe.

Hi Philippe,

Sorry for not being more clear. Using histo->GetEntries() on four different plots I get:

New Info: mcMETmono Entries = 224854
New Info: mcMETmonoNtrk Entries = 224669
New Info: mcMETmonoNGd1trk Entries = 223593
New Info: mcMETmonoNGd2trk Entries = 119362

Which makes sense, as the entries decrease with the tighter cuts. Using histo->GetEffectiveEntries() on the same four plots I get:

New Info: mcMETmono Effective entries = 14485.8
New Info: mcMETmonoNtrk Effective entries = 14612.6
New Info: mcMETmonoNGd1trk Effective entries = 14686.7
New Info: mcMETmonoNGd2trk Effective entries = 7943.35

Which no longer makes any sense to me.

Thanks for the help,

Anna

Hi,

Can you please post an example we can easily run, together with the ROOT file containing the histogram?
Otherwise it is difficult to find the cause of your problem.

Concerning the previous question :
effective entries = (sum of weights)^2 / (sum of squared weights)

Best Regards

Lorenzo

Hi Lorenzo,

Please find attached a test script that prints out the histo entries/effective entries.
The root file containing the scaled and merged MC samples is too big to attach so I’ve copied it to /afs/cern.ch/user/a/amayne/public/TotalMC_lumi304.root

Many thanks,

Anna
DataMCPlotterTEST.C (1.83 KB)

Hi Anna,

Thank you for the files. I see from the histograms that the errors do not make any sense. You have probably forgotten to call TH1::Sumw2(): since your histograms are weighted, their bin contents no longer represent Poisson counts.

If you still see this problem after adding the call to Sumw2() in your script before merging, I would also need the original histograms to understand it better.
Cheers,
Lorenzo

Hi Lorenzo,

Thanks for the tip. Unfortunately the problem persists. Could you possibly attach a script that demonstrates the scaling and merging of two histograms where ->GetEntries() and ->GetEffectiveEntries() return sensible values?
This would be a great help in getting to the bottom of the problem.

Many thanks,

Anna

Hi,
Which ROOT version are you using? If you are using the latest version, 5.29.02 or 5.28.00-patches, please send me the original histograms and the merged ones so that I can find the problem.

Lorenzo

Hi Lorenzo,

I have tried versions 5.22 and 5.26. I’ll try 5.28.00b and let you know if anything changes.
I’ve copied the relevant scripts to /afs/cern.ch/user/a/amayne/public/RootProb
Unfortunately the root files themselves are too big for me to copy here as I have very little disk space on lxplus.
The files I have been using are listed in the scripts. Would it be possible for you to dq2-get them?
The main offenders are the QCD multijet samples, but even the ttbar (5200 and 5204) samples don’t appear to be returning the correct values from GetEffectiveEntries after scaling and merging.

Thanks for your help,

Anna

Hi,
Can you just send me two histograms at a certain cut and a merged one? It will be easier to reproduce the problem.

Thanks, Lorenzo

Hi Lorenzo,

I’ve attached three plots: one for each of the two top samples prior to merging, and the merged plot.
The number of entries in each histogram and the scaling look fine. However, when I use GetEffectiveEntries the number returned for the merged plot is 3330.87. The scale factors applied to each top sample are less than 0.001, so this number seems way off.

Thanks,

Anna

Hi,

Thank you for the plots. Why do you say the value is far off? The effective entries depend on the histogram errors; are these correct or not?

Lorenzo

Perhaps I’m misunderstanding the function GetEffectiveEntries.
For a histogram with 100 entries that is scaled by 0.1 I would expect:
histo->GetEntries() = 100
histo->Scale(0.1)
histo->GetEffectiveEntries() = 10

or is scaling by hand (new entries = 0.1 * histo->GetEntries() ) the only way of getting this value?

Many thanks,

Anna

Hi Anna,

the number of effective entries is invariant under scaling, since both contents and errors scale by the same factor.
Rethinking about your problem: the number of effective entries of a merged histogram can increase in some cases after a selection. This can happen when your selection criteria favour the MC sample with higher statistics.

Lorenzo

Hi Lorenzo,

I see, so everything is working as it should.

Thank you for all your help,

Anna

Hi,

There can be some problems in the case of low statistics. When you merge the histograms, the bin errors are assumed to be Gaussian, which is not the case for bins with few entries. So if you are interested in the per-bin errors of the merged histogram, and a bin is the sum of several sources, some of which have zero content (while the expected value is not zero), the computed merged bin error will be under-estimated.

Lorenzo