TEfficiency, force histograms to pass TEfficiency::CheckConsistency()

yassine · May 31, 2021, 11:06am

Hi,

I would like to compute an efficiency using TEfficiency. The problem is that my input histograms do not pass the TEfficiency::CheckConsistency() criteria: it turns out that in some bins pass.GetBinContent(i) > total.GetBinContent(i).
The only idea I have to overcome this is to rebin the two histograms and force the pass histogram to have, in each bin, a content smaller than that of the total one.

I was wondering whether this is an appropriate (clever) way to do it. Otherwise, I’d be happy to hear your suggestions.

Thanks a lot
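
For reference, the bin-wise requirement described above can be checked before a TEfficiency is built. The following is only a minimal sketch: plain std::vector<double> objects stand in for the bin contents, and the helper name inconsistentBins is an illustrative assumption, not part of ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the indices of all bins where the "pass" content exceeds the
// "total" content, i.e. the bins for which an efficiency is ill-defined.
// The vectors stand in for histogram bin contents (hypothetical helper).
std::vector<std::size_t> inconsistentBins(const std::vector<double>& pass,
                                          const std::vector<double>& total) {
    assert(pass.size() == total.size());  // both inputs need the same binning
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < pass.size(); ++i) {
        if (pass[i] > total[i]) bad.push_back(i);
    }
    return bad;
}
```

With real histograms the vectors would be filled from GetBinContent(i) for i = 1..GetNbinsX(). A non-empty result can occur even when the pass histogram has fewer entries overall than the total one, which is a sign that pass is not a subset of total.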

yassine · June 2, 2021, 8:43am

Hey @Rooters, would you please provide some guidance? Sorry if my question seems silly.

Thank you so much for any advice.

yus · June 2, 2021, 9:58am

Hi,

How do you expect the efficiency to be calculated in this case then?

I personally don’t think it’ll work. The rebinned histogram will have the exact same issue.

Sorry I don’t have a solution for you.

oshadura · June 2, 2021, 2:07pm

@couet maybe you have some suggestions here? Thank you in advance!

couet · June 2, 2021, 5:39pm

Maybe @moneta has some suggestions. Also, a small macro reproducing the problem might be useful.

moneta · June 4, 2021, 6:40am

Hi,

If you have two histograms with independent Poisson counts and you want to compute their ratio, i.e. in the case that is not binomial, I remind you that you can use TGraphAsymmErrors::Divide with the option pois. See ROOT: TGraphAsymmErrors Class Reference.
Otherwise, you cannot use TEfficiency if ntotal < npass.
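
In ROOT this corresponds to a call such as graph.Divide(h_pass, h_tot, "pois") on a TGraphAsymmErrors object. To make the idea of a ratio of two independent counts concrete without ROOT, here is a minimal sketch using first-order (Gaussian) error propagation. This is a cruder stand-in and not the interval that TGraphAsymmErrors::Divide computes; the names Ratio and poissonRatio are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Ratio of two independent Poisson counts with a symmetric uncertainty
// from first-order error propagation:
//   r = n1 / n2,  sigma_r = r * sqrt(1/n1 + 1/n2)
// Simplified stand-in for illustration, not the ROOT algorithm.
struct Ratio {
    double value;
    double error;
};

Ratio poissonRatio(double n1, double n2) {
    assert(n1 >= 0.0 && n2 >= 0.0);  // counts cannot be negative
    const double nan = std::numeric_limits<double>::quiet_NaN();
    // The propagation formula breaks down for empty bins, so flag them.
    if (n1 == 0.0 || n2 == 0.0) return {nan, nan};
    const double r = n1 / n2;
    return {r, r * std::sqrt(1.0 / n1 + 1.0 / n2)};
}
```

Unlike a binomial efficiency, nothing constrains this ratio to stay below 1: a bin with n1 > n2 simply yields a value above 100%.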

Best regards

Lorenzo

yassine · June 7, 2021, 4:31pm

Hi Lorenzo,

Thanks a lot for your suggestion. It worked, but the efficiency plots have values > 100% (which is expected, since h_pass > h_tot in some bins).
But from a physics point of view, I think it doesn’t make sense to have this. Right?

I would be grateful if you could propose an alternative solution.

Best,

yassine · June 7, 2021, 4:33pm

Hi Couet,

The histograms are filled from our analysis framework (inherited from CxAODFramework, which is for ATLAS only …)

Thanks

moneta · June 7, 2021, 5:19pm

Hi,

I don’t know what the 2 histograms represent, so I cannot say anything about what makes sense or not. For sure, if one histogram represents a subset of the counts of the other (i.e. the efficiency case), then h_pass is <= h_tot in every bin.

Lorenzo

yassine · June 7, 2021, 5:27pm

We are trying to estimate the b-tagging efficiency. For that we define the efficiency as N(b-tagged reco jets)/N(b-tagged truth jets).

h_pass->GetEntries() < h_tot->GetEntries()

But the problem is that in some bins the content of h_pass is greater than that of h_tot.

Thanks again!

yus · June 7, 2021, 8:53pm

How come? The number of reconstructed jets should always be smaller than or equal to the number of generated jets. If you generate two jets in an event and reconstruct three, what is this third one? Pile-up? If so, I’d suggest tightening your selection criteria and not accounting for such jets. Otherwise h_pass does not hold a subset of h_tot, so you can’t use the TEfficiency class.

moneta · June 8, 2021, 7:35am

Hi,

This is possible if the reconstructed jets include a contribution from fake jets.
The process is then more complicated: it is neither a binomial process nor two independent Poisson processes, because there are correlations between the reco and truth jets, I think. You would need some MC simulation or some other technique to estimate the uncertainty on their ratio.
Or, as suggested above, tighten the selection to make the contribution from additional jets negligible.

However, I am not sure how this quantity is useful for you. Normally one looks at the b-tagging efficiency defined as (number of b-tagged jets) / (number of true b jets), which does not seem to be what you are doing.
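
A minimal sketch of that usual definition, in which the numerator is a subset of the denominator by construction, could look as follows. Everything here is an illustrative assumption: the TruthJet record (including the idea that each true b jet carries a flag saying whether a matched reco jet was tagged), the Counts stand-in for a 1D histogram, and the function name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-jet record: truth flavour, whether a reconstructed jet
// was matched to this truth jet, and whether that matched jet is b-tagged.
struct TruthJet {
    double pt;
    bool isTrueB;
    bool hasRecoMatch;
    bool recoIsBTagged;
};

// Simple fixed-width counter standing in for a 1D histogram.
struct Counts {
    double lo, hi;
    std::vector<double> bins;
    Counts(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0.0) {
        assert(n > 0 && hi_ > lo_);
    }
    void fill(double x) {
        if (x < lo || x >= hi) return;  // ignore underflow and overflow
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        bins[std::min(i, bins.size() - 1)] += 1.0;  // guard against rounding at the edge
    }
};

// Fill "total" once per true b jet and "pass" only for those same jets that
// are also b-tagged, using the same variable, so pass <= total in every bin.
void fillEfficiencyInputs(const std::vector<TruthJet>& jets, Counts& pass, Counts& total) {
    for (const TruthJet& j : jets) {
        if (!j.isTrueB) continue;
        total.fill(j.pt);
        if (j.hasRecoMatch && j.recoIsBTagged) pass.fill(j.pt);
    }
}
```

The key design choice is that both histograms are filled inside the same loop and with the same x value, so a jet can never enter the numerator without also entering the same bin of the denominator.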

Lorenzo

yassine · June 8, 2021, 9:41am

The total number of entries in the reco histogram is less than the total number of entries in the true histogram.
The problem, as I said, is that in some bins the reco content is larger (h_reco->GetBinContent(1) > h_true->GetBinContent(1)).

If you believe that in all bins h_reco->GetBinContent() should be less than h_true->GetBinContent(), then the problem is in my code, and I believe the reason is that the reco is not a subset of the true.

I mean I would have to do something like this:

for (int i=0; i < truth; i++){
      if (truth_isbjet) h_true->Fill()
      for (int j=0; j<reco; j++){   
           if(reco_isbjet) h_reco->Fill()
      }
}

moneta · June 8, 2021, 12:47pm

Your code does not clarify the issue:

Is your code run for every event? Is truth the number of true jets in the event?
What are you filling the histograms with? You need to pass a double value to h->Fill.
Why are you looping over reco for every i? What exactly is reco?

yassine · June 8, 2021, 1:03pm

The code above is not what I am using; it was only a thought on how to overcome this.

What I was doing is looping over the truth and reco jets separately (I am interested in V->qq, with V = W or Z).

First I retrieve the truth information from the dedicated container and choose the leading 2 jets (with max sum pT). Then I construct the dijet system with TLorentzVector (PtEtaPhiM). To select truth b jets I use a variable in our CxAODs called HadronConeExclTruthLabelID, which should be equal to 5 for a b jet.
I fill the pT, eta, phi and mass of the dijet system into histograms,

like this:

h_truth->Fill(dijet.Pt())

For the reco I open a separate loop, where I take the reconstructed jets from my reconstructed object (which contains a lepton, a neutrino and 2 jets) and require them to be b-tagged. Then I fill the same information as for the truth ones.

Please let me know if you need further info.
Thanks

moneta · June 8, 2021, 2:53pm

Hi,
Thanks for the clarification. It seems to me that you are filling the histograms with jet quantities (for truth and reco jets) over many events. As a first step, you could then assume that the truth and reco histogram bins are uncorrelated, and therefore use the Poisson ratio case instead of TEfficiency. At least try this as a first approximation; it is still better than using TEfficiency for this case.

Lorenzo

yassine · June 8, 2021, 8:37pm

Hi Lorenzo,

Thank you so much for your suggestion.
If you are referring to TGraphAsymmErrors::Divide, I already tried it with the option pois,
as you proposed earlier, but it seems to give some strange results for
eff = N(reco b-jets) / N(truth b-jets)

I attached a plot so you can have a look (the efficiency here is computed using TGraphAsymmErrors::Divide).

Thanks again

moneta · June 9, 2021, 7:10am

Hi,
The large errors you are getting in the first and last bins are due to very low statistics, so I think there is nothing wrong with the plot. But, as I mentioned before, I am not sure about the usefulness of the ratio you are computing.
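
As a rough illustration of the low-statistics effect, under the simplifying assumption of independent Poisson counts and first-order error propagation (the exact interval used by ROOT differs), the relative uncertainty on the ratio scales as:

```latex
\frac{\sigma_r}{r} \approx \sqrt{\frac{1}{N_{\mathrm{reco}}} + \frac{1}{N_{\mathrm{truth}}}},
\qquad r = \frac{N_{\mathrm{reco}}}{N_{\mathrm{truth}}}
```

With made-up counts of 3 and 4 in an edge bin this gives about 76%, while 300 and 400 in a central bin give about 7.6%, so sparsely populated bins naturally show much larger error bars.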

Cheers

Lorenzo
