Way to filter individual columns and return significance using sig/sqrt(bkg)

Hi all,

I’m working on creating a significance vs energy plot and need help counting events in specific categories from a dataset. Here’s the workflow I’m aiming for:

1. Count the number of events that meet specific selection criteria for each column (not just return booleans).
2. Define:
    Signal: electron CC events
    Background: muon and electron NC events
3. Return the total number of events for signal and background, so I can compute a column for significance.

The ultimate goal is to plot a TGraph with significance on the y-axis and energy on the x-axis.

However, I’m struggling to define signal and background without creating separate DataFrames and applying filters for each. Below is a sample of my code for context. Any advice or suggestions would be greatly appreciated!

df=df.Define("Electron_CC","Any(truth_pdg == 12 || truth_pdg == -12) && Any(truth_pdg==11 || truth_pdg==-11)")
df=df.Define("Electron_NC","Any(truth_pdg != 12 || truth_pdg != -12) && Any(truth_pdg==11 || truth_pdg==-11)")
df=df.Define("Muon","Any(truth_pdg == 14 || truth_pdg == -14) && Any(truth_pdg==13 || truth_pdg==-13)")
df=df.Define("Electron_signal","Electron_CC")
df=df.Define("Background","Electron_NC && Muon")
df=df.Define("Significance","Electron_signal/sqrt(Background)")
graph=df.Graph("Calo_total_E_EM","Significance")
graph.Draw()




I’m not sure I understand what you’re trying to do. Maybe @vpadulan can help.

I’m trying to define a column that returns the number of electrons etc. in an event, e.g. the size of Electron_CC. In effect, I’m trying to write a definition equivalent to (Any(truth_pdg == 12 || truth_pdg == -12) && Any(truth_pdg==11 || truth_pdg==-11)).size(), if that makes sense?

Well I don’t see anything obviously wrong…

I think this may explain it better, but I still suspect there is an issue in the logic. It runs, but I’m not sure it gives the correct output:

df=df.Define("Electron_neutrino","(truth_pdg == 12 || truth_pdg==-12)")
df=df.Define("CC_elecNeutrino","(truth_pdg ==11 || truth_pdg ==-11)")
df=df.Define("NC_elecNeutrino","(truth_pdg !=11 && truth_pdg !=-11)")
df=df.Define("Muon_neutrino","(truth_pdg==14 || truth_pdg ==-14)")
df=df.Define("Elec_Nu_CC","(truth_pdg[CC_elecNeutrino].size() && truth_pdg[Electron_neutrino].size()) ") # +/- 12 in vector and +/- 11 in vector
df=df.Define("Mu_Nu","(truth_pdg[Muon_neutrino].size()) ") # +/- 14 in vector
df=df.Define("Elec_Nu_NC","(truth_pdg[NC_elecNeutrino].size() && truth_pdg[NC_elecNeutrino].size()) ") # +/- 12 in vector and +/- 11 not in vector
df=df.Define("Signal","Elec_Nu_CC")
df=df.Define("Background","Elec_Nu_NC + Mu_Nu") # add the number of muons to number of NC electrons
df=df.Define("Significance","Signal/sqrt(Background)")

Not sure. As I said, please wait for @vpadulan to confirm whether it should work or not.

Dear @seley ,

Thanks for reaching out to the forum! Let’s try to clarify the workflow a bit.

I’m basically trying to write the definition that is effectively the same as (Any(truth_pdg == 12 || truth_pdg == -12) && Any(truth_pdg==11 || truth_pdg==-11)).size(), if that makes sense?

When you write Any(truth_pdg == 12 || truth_pdg == -12), for example, it returns a single boolean value per event, as per the documentation. You therefore cannot call .size() on it, since it is only one boolean. If you want to know exactly how many particles satisfy your condition, the correct operation is a Sum over the boolean mask, e.g.

root [0] ROOT::RVec<int> truth_pdg{12, 11, -12, -12, 10, 8, 15, 12, 7};
root [1] truth_pdg == 12 || truth_pdg == -12
(ROOT::VecOps::RVec<int>) { 1, 0, 1, 1, 0, 0, 0, 1, 0 }
root [2] Sum(truth_pdg == 12 || truth_pdg == -12)
(int) 4
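
For readers without a ROOT prompt at hand, the same distinction can be sketched in plain Python. This is only an illustration of the logic, not RDataFrame code: the helper names mask and is_nu_e_with_electron are made up for the example, and the second and third toy events are hypothetical.

```python
# Toy events: each inner list plays the role of one event's truth_pdg vector.
events = [
    [12, 11, -12, -12, 10, 8, 15, 12, 7],  # the event from the ROOT prompt example
    [14, 13],                              # hypothetical nu_mu + muon event
    [12, 22, 22],                          # hypothetical nu_e event without an electron
]

def mask(pdgs, codes):
    """Element-wise test: True where the PDG code is one of `codes`."""
    return [p in codes for p in pdgs]

first = events[0]
has_nu_e = any(mask(first, {12, -12}))  # like Any(...): one boolean for the event
n_nu_e = sum(mask(first, {12, -12}))    # like Sum(...): number of matching entries

def is_nu_e_with_electron(pdgs):
    """Per-event flag: at least one +/-12 and at least one +/-11 in the event."""
    return any(mask(pdgs, {12, -12})) and any(mask(pdgs, {11, -11}))

# Summing a per-event boolean over all events counts events, not particles.
n_flagged_events = sum(is_nu_e_with_electron(ev) for ev in events)

print(has_nu_e, n_nu_e, n_flagged_events)  # True 4 1
```

The point of the last step is that a count of events comes from summing a per-event flag over the dataset, whereas Sum over the mask inside one event counts particles in that event.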

Then you could define:

df.Define("Electron_CC", "Sum(truth_pdg == 12 || truth_pdg == -12) + Sum(truth_pdg==11 || truth_pdg==-11)")

That is, if I have inferred correctly what you’re doing from your first post. Let me know if this makes sense; if something is missing, we can take a look together.

Cheers,
Vincenzo
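
A note on the original goal of a significance vs energy plot: a per-event column Signal/sqrt(Background) yields one value per event, while the first post asks for total signal and background counts. One possible reading is that the counts are wanted per energy bin. The sketch below shows that bookkeeping in plain Python under that assumption; the function name significance_per_bin, the toy events, and the bin edges are all hypothetical and not taken from the thread.

```python
import math

def significance_per_bin(events, edges):
    """Count signal and background events per energy bin and return
    (bin centers, signal counts, background counts, S/sqrt(B))."""
    n_bins = len(edges) - 1
    sig = [0] * n_bins
    bkg = [0] * n_bins
    for energy, is_signal, is_background in events:
        for i in range(n_bins):
            if edges[i] <= energy < edges[i + 1]:
                sig[i] += int(is_signal)
                bkg[i] += int(is_background)
                break
    centers = [0.5 * (edges[i] + edges[i + 1]) for i in range(n_bins)]
    # S/sqrt(B) is undefined for a bin with no background; mark it as NaN.
    signif = [s / math.sqrt(b) if b > 0 else float("nan") for s, b in zip(sig, bkg)]
    return centers, sig, bkg, signif

# Hypothetical toy sample: (energy, is_signal, is_background) per event.
toy_events = [
    (0.5, True, False), (0.7, True, False), (0.2, True, False), (0.3, True, False),
    (0.9, False, True),
    (1.1, True, False), (1.2, True, False),
    (1.5, False, True), (1.6, False, True), (1.7, False, True), (1.9, False, True),
    (2.5, True, False),
]
edges = [0.0, 1.0, 2.0, 3.0]

centers, sig, bkg, signif = significance_per_bin(toy_events, edges)
for c, s, b, z in zip(centers, sig, bkg, signif):
    print(c, s, b, z)
```

In ROOT itself, the two count arrays could plausibly come from two energy histograms with identical binning, one filled after a signal Filter and one after a background Filter, with the resulting bin centers and ratios passed to a TGraph. That mapping is a suggestion, not something confirmed in this thread.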