2D Reconstruction Efficiency - Issues with TEfficiency

BLunday · January 22, 2019, 8:39pm

Greetings, all.

I’m trying to put together a 2D efficiency plot, so I have generated both an MC histogram and a reconstructed-event histogram, which I draw using the ‘colz’ option. On the surface, the two look similar, as expected:

However, when I pass these histograms to the TEfficiency constructor with the current range, CheckConsistency throws an error saying the two do not have consistent bin contents. After checking that the bin and range settings match between the histograms, I tried expanding the range and was surprised when TEfficiency spat out a result at the higher range setting:

Though promising, I would definitely prefer the efficiency graph at a higher resolution. The fact that I get a result at such a low resolution suggests to me that this is an issue with how the histogram is drawn when the range is set to 1. The only notable discrepancy between the two histos at this point is the supposedly ‘empty’ bins that colz colors on the reconstructed graph:

I’ve tried applying cuts that exclude values > 0.9 on both axes, to remove any events that could be causing issues, but I still get the same CheckConsistency error. Have I made a basic mistake somewhere, or is there a known issue with drawing TEfficiency histos with the colz option? Any help would be much appreciated.

couet · January 23, 2019, 8:15am

The plotting option you are using should not be the problem in this case. It seems more like an issue with TEfficiency. Maybe @moneta can help you with this.

moneta · January 23, 2019, 9:25am

I don’t understand the issue. If you get a CheckConsistency error, it means there is an issue with different axes and bins in the histograms, and it has nothing to do with the drawing.

Lorenzo
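A quick way to rule out the axis/binning case is to compare the number of bins and the axis range of both histograms directly. The sketch below does that in plain C++ under stated assumptions: `AxisSpec`, `sameBinning` and `sameBinning2D` are hypothetical names, and it only covers uniform binning (variable bin edges would need an edge-by-edge comparison). In ROOT the inputs would come from `GetXaxis()->GetNbins()`, `GetXmin()` and `GetXmax()`, and the same calls on `GetYaxis()`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical stand-in for one histogram axis (uniform binning assumed).
// In ROOT these numbers come from TAxis::GetNbins(), GetXmin(), GetXmax().
struct AxisSpec {
    int nbins;
    double xmin;
    double xmax;
};

// True when two axes have the same bin count and the same range
// (within a small tolerance); prints the first mismatch found.
bool sameBinning(const AxisSpec& a, const AxisSpec& b, double tol = 1e-9) {
    assert(a.nbins > 0 && b.nbins > 0);  // sanity check on the inputs
    if (a.nbins != b.nbins) {
        std::printf("bin count differs: %d vs %d\n", a.nbins, b.nbins);
        return false;
    }
    if (std::fabs(a.xmin - b.xmin) > tol || std::fabs(a.xmax - b.xmax) > tol) {
        std::printf("range differs: [%g, %g] vs [%g, %g]\n",
                    a.xmin, a.xmax, b.xmin, b.xmax);
        return false;
    }
    return true;
}

// For 2D histograms both the x and the y axes have to match.
bool sameBinning2D(const AxisSpec& ax, const AxisSpec& ay,
                   const AxisSpec& bx, const AxisSpec& by) {
    return sameBinning(ax, bx) && sameBinning(ay, by);
}
```

If this passes for both axes and TEfficiency still complains, the remaining candidate is the bin-content condition.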

kialbert · January 23, 2019, 3:41pm

Hi,

Maybe this is helpful. As mentioned in the reference manual:

If you already have two histograms filled with the number of passed and total events, you will use the constructor TEfficiency(const TH1& passed,const TH1& total) to construct the TEfficiency object. The histograms “passed” and “total” have to fulfill the conditions mentioned in TEfficiency::CheckConsistency, otherwise the construction will fail.

Clearly you are encountering problems with CheckConsistency. The documentation for the CheckConsistency function states these requirements for successfully creating a TEfficiency:

both hists have the same dimension (1d, 2d, 3d)
both hists have the same binning
pass.GetBinContent(i) <= total.GetBinContent(i) for each bin i
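The third condition is a plain bin-by-bin comparison. As a sketch only (it models a 2D histogram as a flat vector of contents instead of using ROOT's `TH2`, and `Counts2D` and `countPassAboveTotal` are hypothetical names), listing the offending bins could look like this. In ROOT itself the static `TEfficiency::CheckConsistency(passed, total)` performs the real check; note that ROOT bin numbers start at 1, with 0 and nbins + 1 reserved for underflow and overflow.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical flat stand-in for a 2D histogram: nx * ny bin contents
// stored row by row (no under/overflow bins, unlike ROOT's TH2).
struct Counts2D {
    int nx;
    int ny;
    std::vector<double> content;  // size nx * ny
    double at(int ix, int iy) const { return content[iy * nx + ix]; }
};

// Mirrors the third CheckConsistency condition: each "pass" bin must not
// exceed the matching "total" bin. Prints every offending bin and returns
// how many were found.
int countPassAboveTotal(const Counts2D& pass, const Counts2D& total) {
    assert(pass.nx == total.nx && pass.ny == total.ny);
    int bad = 0;
    for (int iy = 0; iy < pass.ny; ++iy) {
        for (int ix = 0; ix < pass.nx; ++ix) {
            if (pass.at(ix, iy) > total.at(ix, iy)) {
                std::printf("bin (%d, %d): pass = %g > total = %g\n",
                            ix, iy, pass.at(ix, iy), total.at(ix, iy));
                ++bad;
            }
        }
    }
    return bad;
}
```

Passing the MC histogram as `pass` and the reconstructed one as `total` (the swapped order) is an easy way to trigger this condition on purpose.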

Speculating slightly here: you could be hit by the last condition, e.g. if you pass your histograms as TEfficiency(montecarlo, reco) instead of TEfficiency(reco, montecarlo). TEfficiency does not support a bin efficiency above 1.0.

Another possibility, as @moneta says, is a binning mismatch. Could you post the histograms' ranges and numbers of bins?

Cheers,
Kim

BLunday · January 24, 2019, 9:03pm

All,

Thank you for the replies. @kialbert is correct that the bin content condition is what’s causing CheckConsistency to throw the error; apologies, this should have been mentioned in my initial post.

As the reconstructed events are a subset of the Monte Carlo data, the efficiency should logically be at or below 1.0 across the board. This error first led me to believe I was pulling the values incorrectly from the tuple. However, that wouldn’t explain why the efficiency histogram is drawn at higher range values: if a bug in my code were producing an efficiency greater than 1 at a low range, I would expect it to make CheckConsistency throw an error at a higher range as well.

As a sanity check, I rechecked my bin contents to ensure consistency between the three histograms. All three are defined from the same set of variables, and all three constructors have the correct variable ordering.

kialbert · January 25, 2019, 12:23pm

Could it be that the histograms are rebinned with the higher range and that the results are then consistent?

If we imagine an infinitely fine binning, there would always be discrepancies between the histograms (with the efficiency in each bin being either 0 or inf). In the other direction, if there is only one bin, one just compares the totals of the two histograms.
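A toy 1D illustration of this effect, under assumptions: the bin counts are made up for illustration and are not from this thread, and `rebin` and `passWithinTotal` are hypothetical helpers (the merge mimics what `TH1::Rebin` does to bin contents). If a reconstructed value can land in a neighboring bin relative to its generated value, a fine bin can hold more reconstructed than MC entries; merging neighboring bins sums that excess back in, so the coarse histograms can pass the check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merges groups of `factor` adjacent bins into one, in the spirit of
// TH1::Rebin (assumes the bin count is divisible by `factor`).
std::vector<double> rebin(const std::vector<double>& bins, std::size_t factor) {
    assert(factor > 0 && bins.size() % factor == 0);
    std::vector<double> out(bins.size() / factor, 0.0);
    for (std::size_t i = 0; i < bins.size(); ++i) {
        out[i / factor] += bins[i];
    }
    return out;
}

// True if pass[i] <= total[i] for every bin i (the CheckConsistency content rule).
bool passWithinTotal(const std::vector<double>& pass,
                     const std::vector<double>& total) {
    assert(pass.size() == total.size());
    for (std::size_t i = 0; i < pass.size(); ++i) {
        if (pass[i] > total[i]) return false;
    }
    return true;
}
```

With total = {3, 1, 2, 2} and pass = {1, 2, 2, 1}, the second fine bin fails (2 > 1), while the merged bins {3, 3} against {4, 4} are consistent.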

Would it make sense to loop through each bin and cap the pe_mc content so that it is never larger than pe_gen?

Cheers,
Kim
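A minimal sketch of the capping loop suggested above, assuming the bin contents are available as flat arrays of equal length (`capPassToTotal` is a hypothetical name). In ROOT the equivalent would loop over the `(i, j)` bins of the 2D histograms with `GetBinContent(i, j)` and `SetBinContent(i, j, value)`, treating `pe_gen` as the total and capping `pe_mc`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Clamps each "pass" bin to the matching "total" bin so that
// pass[i] <= total[i] holds everywhere; returns how many bins changed.
// This makes the input acceptable to the consistency check, but it
// hides bin migration rather than correcting for it.
std::size_t capPassToTotal(std::vector<double>& pass,
                           const std::vector<double>& total) {
    assert(pass.size() == total.size());
    std::size_t changed = 0;
    for (std::size_t i = 0; i < pass.size(); ++i) {
        if (pass[i] > total[i]) {
            pass[i] = total[i];
            ++changed;
        }
    }
    return changed;
}
```

Capping sets the efficiency to exactly 1 in the affected bins without moving the migrated events back to the bins they came from. One common alternative, if the tuple stores both quantities, is to fill the passed histogram with the generated (MC) coordinates of each reconstructed event, so that pass <= total holds bin by bin by construction.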