Applied cut string does not work as expected

Try:
(nMuon<1 || (nMuon>0 && Muon_pt>29)) || (nElectron<1 || (nElectron>0 && Electron_pt>37))

No, this does not work either. If it helps, these are multi-dimensional arrays, but I think you already know that.

@couet do you perhaps have a better idea of what is happening? I find it pretty strange that the cut does not work as expected, and I guess this is worrisome for the ROOT developers as well. In any case, please let me know of any proposed solution, as this is a serious showstopper for our analysis efforts: we really need to properly skim/draw from the trees.

I guess, you need to provide one file for inspection.

You can pick one small file from here /afs/cern.ch/user/a/alkaloge/public/smallFile.root

I think I spotted the problem.
You define “nMuon” and “nElectron” as “UInt_t” but you use them as variable size arrays’ lengths. I think this is not supported in ROOT (@pcanal even in the newest 6.26?). You must use “Int_t” variables for this purpose.
There are more such “UInt_t” variables which need to be “fixed” (e.g., “nboostedTau”, “nCorrT1METJet” and so on).

I am not sure about that… I mean, as we discussed in the first posts, using only nMuon and/or nElectron seems OK; the problem starts when the _pt arrays are used.

So, I tried instead of

( (Muon_pt>29 || Electron_pt>37 ))

to do something like

 ( (Muon_pt<1 || Muon_pt>29  || Electron_pt>37 || Electron_pt<1)) 

but I still get events when there is at least 1 muon and 1 electron… Does that mean that the problem you described also applies to the _pt arrays, which are "Float_t"?

The type of array elements is irrelevant. It’s just the type of variable that keeps its “actual” length that matters.

So, to conclude, there is currently no workaround for that?

UInt_t is only used in this way for all of CMS’s NanoAOD…

Edit:
The CMS nanoAOD-tools preskimming proceeds via TTree::Draw (to acquire an entry list) [1]
TTreeReader is used for the python loop over the data (using that entry list) later on [2, 3]

[1]

[2]

[3]

This is correct: even in the current master, the variable used to indicate the size of an array must be of type Int_t.

@pcanal @couet This is not the first time I have seen such a problem here.
Would it be possible to find a “brutal fix” for it?
The “sizeof(UInt_t)” = “sizeof(Int_t)” so maybe one could “fool” ROOT to “redefine” broken branches in RAM.
Something like this:

TTree *t; gFile->GetObject("MyTree", t);
t->GetBranch("Problematic_UInt_t_Branch")->Make_ROOT_Think_It_Is_An_Int_t();
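The size argument above can be checked without ROOT. The following is only a sketch of the bit-level reasoning, assuming UInt_t and Int_t are 32-bit unsigned and signed integers; the helper name `reinterpret_as_int32` is made up for illustration, and nothing here claims that ROOT offers such a redefinition.

```python
import ctypes

# Stand-ins for ROOT's UInt_t / Int_t (assumed to be 32-bit unsigned / signed).
assert ctypes.sizeof(ctypes.c_uint32) == ctypes.sizeof(ctypes.c_int32) == 4


def reinterpret_as_int32(value):
    """Reinterpret the bits of an unsigned 32-bit value as a signed 32-bit value."""
    return ctypes.c_int32(ctypes.c_uint32(value).value).value


# Small counters (such as a number of muons) keep their value;
# only values of 2**31 and above would turn negative.
for n in (0, 3, 2**31 - 1, 2**31, 2**32 - 1):
    print(n, "->", reinterpret_as_int32(n))
```

For realistic object counts per event, the two interpretations therefore agree, which is what makes the "fooling" idea plausible in principle.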

I’ve made a small script to check an input file using the Muon_pt > 30 || Electron_pt > 35 criterion (minus any HLT menu items). Feel free to check I haven’t made a critical mistake.

Events are processed:

  1. with RDataFrame
  2. with a python TTree event loop
  3. with a cutstring ((nMuon > 0 && Muon_pt > 30) || (nElectron > 0 && Electron_pt > 35))
  4. with a cutstring ((Muon_pt > 30) || (Electron_pt > 35))

Then repeated, after using RDF to make a snapshot with Int_t counter branches and copies of the pt arrays, which doesn’t seem to affect things…

1 and 2 are consistent with each other, as are 3 and 4.

Output on the original example file:

python test_cutstring.py
TClass::Init:0: RuntimeWarning: no dictionary for class edm::Hash<1> is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ParameterSetBlob is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ProcessHistory is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ProcessConfiguration is available
TClass::Init:0: RuntimeWarning: no dictionary for class __pair_base<edm::Hash<1>,edm::ParameterSetBlob> is available
TClass::Init:0: RuntimeWarning: no dictionary for class pair<edm::Hash<1>,edm::ParameterSetBlob> is available
processing 18790 events
processing events in RDF
making entry-list
processing 18790 events
making entry-list
processing 18790 events
processing events in RDF
making entry-list
processing 18790 events
making entry-list
processing 18790 events
rdataframe counts: 12659 rdataframe with Int_t counts: 12659 python loop counts: 12659
cutstring counts: 2527 cutstring without explicit nMuon/nElectron check: 2527
cutstring with Int_t counts: 2527 cutstring with Int_t without explicit nMuon/nElectron check: 2527

Output on another centrally produced file:

TClass::Init:0: RuntimeWarning: no dictionary for class edm::Hash<1> is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ProcessHistory is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ProcessConfiguration is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ParameterSetBlob is available
TClass::Init:0: RuntimeWarning: no dictionary for class __pair_base<edm::Hash<1>,edm::ParameterSetBlob> is available
TClass::Init:0: RuntimeWarning: no dictionary for class pair<edm::Hash<1>,edm::ParameterSetBlob> is available
processing 48000 events
processing events in RDF
making entry-list
processing 48000 events
making entry-list
processing 48000 events
processing events in RDF
making entry-list
processing 48000 events
making entry-list
processing 48000 events
rdataframe counts: 33680 rdataframe with Int_t counts: 33680 python loop counts: 33680
cutstring counts: 20400 cutstring without explicit nMuon/nElectron check: 20400
cutstring with Int_t counts: 20400 cutstring with Int_t without explicit nMuon/nElectron check: 20400

Script:

#Setup e.g. LCG release on lxplus:
#source /cvmfs/sft.cern.ch/lcg/views/LCG_101swan/x86_64-centos7-gcc10-opt/setup.sh
import ROOT
import argparse

def get_passing_python(filename, treename="Events"):
    f = ROOT.TFile.Open(filename, "read")
    events = f.Get(treename)
    counter = 0
    print("processing", events.GetEntries(), "events")
    for idx, event in enumerate(events):
        # if idx % 1000 == 0:
        #     print("\tprocessed", idx, "events")
        passing = False
        nMu = event.nMuon
        for nm in range(nMu):
            if event.Muon_pt[nm] > 30: passing = True
        nEl = event.nElectron
        for ne in range(nEl):
            if event.Electron_pt[ne] > 35: passing = True
        if passing: counter+=1
    return counter

def make_Int_t_snapshot(filename, treename="Events", output_filename=None):
    if output_filename is None:
        output_filename = "Int_t_"+filename
    rdf = ROOT.ROOT.RDataFrame(treename, filename)
    rdf_d = rdf.Define("nMuonInt", "(Int_t)nMuon").Define("MuonInt_pt", "Muon_pt").Define("nElectronInt", "(Int_t)nElectron").Define("ElectronInt_pt", "Electron_pt")
    s = rdf_d.Snapshot(treename, output_filename, ["nMuonInt", "MuonInt_pt", "nElectronInt", "ElectronInt_pt", "event"])

def get_passing_rdataframe(filename, treename="Events", mu_name="Muon", el_name="Electron"):
    rdf = ROOT.ROOT.RDataFrame(treename, filename)
    print("processing events in RDF")
    pass_node = rdf.Define("passing", "bool passing = false;"\
                           "for(auto mu_pt: "+mu_name+"_pt){if(mu_pt > 30) passing = true;}"\
                           "for(auto el_pt: "+el_name+"_pt){if(el_pt > 35) passing = true;};"\
                           "return passing;")
    counter = pass_node.Filter("passing").Count()
    return counter.GetValue()

def get_passing_cutstring(filename, treename="Events", mu_name="Muon", el_name="Electron"):
    f = ROOT.TFile.Open(filename, "read")
    events = f.Get(treename)
    print("making entry-list")
    events.Draw(">>elist", "(n"+mu_name+" > 0 && "+mu_name+"_pt > 30) || (n"+el_name+" > 0 && "+el_name+"_pt > 35)")
    events.Draw(">>etrivial", "event >= 0")
    elist = ROOT.gDirectory.Get("elist")
    etrivial = ROOT.gDirectory.Get("etrivial")
    counter = 0
    trivial_counter = 0
    print("processing", events.GetEntries(), "events")
    for idx in range(events.GetEntries()):
        if elist.Contains(idx): counter += 1
        if etrivial.Contains(idx): trivial_counter += 1
    return counter, trivial_counter

def get_passing_cutstring_no_explicit_uint(filename, treename="Events", mu_name="Muon", el_name="Electron"):
    f = ROOT.TFile.Open(filename, "read")
    events = f.Get(treename)
    print("making entry-list")
    events.Draw(">>elist_ne", "("+mu_name+"_pt > 30) || ("+el_name+"_pt > 35)")
    elist = ROOT.gDirectory.Get("elist_ne")
    counter = 0
    print("processing", events.GetEntries(), "events")
    for idx in range(events.GetEntries()):
        if elist.Contains(idx): counter += 1
    return counter

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='testing cutstrings')
    parser.add_argument('--filename', '-f', action='store', type=str, default="/afs/cern.ch/user/a/alkaloge/public/smallFile.root", help="filename of test rootfile")
    args = parser.parse_args()

    filename = args.filename
    # filename = "/afs/cern.ch/user/a/alkaloge/public/smallFile.root"
    filename_local = "Int_t_"+filename.split("/")[-1]

    py = get_passing_python(filename)
    rdf = get_passing_rdataframe(filename)

    cs, trivial = get_passing_cutstring(filename)
    cs_ne = get_passing_cutstring_no_explicit_uint(filename)


    make_Int_t_snapshot(filename, output_filename=filename_local)
    rdf_int = get_passing_rdataframe(filename_local, mu_name="MuonInt", el_name="ElectronInt")
    cs_int, trivial_int = get_passing_cutstring(filename_local, mu_name="MuonInt", el_name="ElectronInt")
    cs_ne_int = get_passing_cutstring_no_explicit_uint(filename_local, mu_name="MuonInt", el_name="ElectronInt")

    print("rdataframe counts:", rdf, "rdataframe with Int_t counts:", rdf_int, "python loop counts:", py, )
    print("cutstring counts:", cs, "cutstring without explicit nMuon/nElectron check:", cs_ne)
    print("cutstring with Int_t counts:", cs_int, "cutstring with Int_t without explicit nMuon/nElectron check:", cs_ne_int)

Take the “smallFile.root” and try:

Events->Scan("nElectron:Electron_pt", "", "", 10);
Events->Scan("nElectron:Electron_pt", "(nElectron<1) || (nElectron>0)", "", 10);
Events->Scan("nElectron:Electron_pt", "(nElectron<1) || (nElectron>0 && Electron_pt>0)", "", 10);
Events->Scan("nElectron:Electron_pt", "(nElectron<1) || (nElectron>0 && Alt$(Electron_pt,0)>0)", "", 10);

@couet Well, there may be two problems. One related to the usage of “Electron_pt” in the cut (without “Alt$”, it unconditionally requires that at least one electron is present, so it disregards the “nElectron<1” condition) and another one related to the “UInt_t” type of the “nElectron”.

@Alkass I do not know if the result will always be correct, but instead of:
"(Muon_pt>29 || Electron_pt>37)"
try to use:
"(Alt$(Muon_pt,0)>29 || Alt$(Electron_pt,0)>37)"

This is really under the expertise of @pcanal

This is, for better or worse, the TTreeFormula semantic. When writing:

(Muon_pt>29 || Electron_pt>37)

TTreeFormula expands it to:

for (Int_t index = 0; index < min(num_of_elements(Muon_pt), num_of_elements(Electron_pt)); ++index)
    use (Muon_pt[index]>29 || Electron_pt[index]>37)

The Alt$ function was introduced to work around this limitation.
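The expansion described above can be emulated in plain Python, without ROOT, to see why a per-element cut string and an explicit event loop disagree. This is only a sketch of the semantic as stated in this post: the helper names (`formula_instances`, `event_loop_passes`) and the toy events are made up for illustration and are not ROOT API.

```python
def formula_instances(muon_pt, electron_pt, mu_thr=29.0, el_thr=37.0):
    """Emulate (Muon_pt>29 || Electron_pt>37) as described above:
    one evaluation per index, up to the length of the SHORTER array."""
    n = min(len(muon_pt), len(electron_pt))
    return [muon_pt[i] > mu_thr or electron_pt[i] > el_thr for i in range(n)]


def event_loop_passes(muon_pt, electron_pt, mu_thr=29.0, el_thr=37.0):
    """What an explicit event loop checks: any muon OR any electron above threshold."""
    return any(pt > mu_thr for pt in muon_pt) or any(pt > el_thr for pt in electron_pt)


# Toy events (hypothetical values): (Muon_pt, Electron_pt)
events = [
    ([45.0], []),            # one hard muon, no electron
    ([], [50.0]),            # no muon, one hard electron
    ([10.0, 60.0], [5.0]),   # the hard muon sits at index 1, beyond the electron array
    ([35.0], [12.0]),        # both arrays filled, muon at index 0 passes
]

for mu, el in events:
    print(mu, el,
          "formula:", any(formula_instances(mu, el)),
          "loop:", event_loop_passes(mu, el))
```

Only the last toy event is selected by the emulated formula, while the explicit loop selects all four. This mirrors the lower cut-string counts compared with the RDataFrame and Python-loop counts reported earlier in the thread.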

So implicitly,

(Muon_pt>29 || Electron_pt>37)

is equivalent to

(Muon_pt>29 || Electron_pt>37) && (nMuon > 0 && nElectron > 0 && Instance$ < nMuon && Instance$ < nElectron)

cut=(Muon_pt>29 || Electron_pt>37) ||  ( HLT_IsoMu24 || HLT_IsoMu27 || HLT_Ele35_WPTight_Gsf)

so naively, you should get events when either muons (electrons) have pt > 29 (37), and also when some HLT triggers fired in the event.

Actually, this is more complex. Technically, the cut is applied “per element” rather than “per event”, so it is used as:

(Muon_pt[index]>29 || Electron_pt[index]>37) ||  ( HLT_IsoMu24 || HLT_IsoMu27 || HLT_Ele35_WPTight_Gsf)

to get the “per event” filtering you can use:

(Sum$(Muon_pt>29) || Sum$(Electron_pt>37)) || (HLT_IsoMu24 || HLT_IsoMu27 || HLT_Ele35_WPTight_Gsf)

(assuming the HLT branches are not arrays; if they are arrays, they also need Sum$)
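A plain-Python emulation of this per-event form may help to cross-check the expected counts. It is a sketch under the assumption that Sum$ reduces each array to a single number per event (0 for an empty array) and that the HLT branches are scalar flags; `sum_dollar`, `per_event_cut` and the toy events are hypothetical names and values, not ROOT API.

```python
def sum_dollar(values, predicate):
    """Emulate Sum$(expr): add up expr over all elements of one array in one event."""
    return sum(1 for v in values if predicate(v))


def per_event_cut(muon_pt, electron_pt, hlt_flags, mu_thr=29.0, el_thr=37.0):
    """Emulate (Sum$(Muon_pt>29) || Sum$(Electron_pt>37)) || (HLT flags ...)."""
    n_mu = sum_dollar(muon_pt, lambda pt: pt > mu_thr)
    n_el = sum_dollar(electron_pt, lambda pt: pt > el_thr)
    return bool(n_mu or n_el or any(hlt_flags))


# Toy events (hypothetical values): (Muon_pt, Electron_pt, HLT flags)
events = [
    ([45.0], [], [False, False, False]),       # hard muon, no electron
    ([], [50.0], [False, False, False]),       # no muon, hard electron
    ([10.0], [5.0], [False, True, False]),     # soft leptons, one trigger fired
    ([10.0], [5.0], [False, False, False]),    # soft leptons, no trigger
]

for mu, el, hlt in events:
    print(mu, el, hlt, "->", per_event_cut(mu, el, hlt))
```

Because each array is reduced on its own, an empty Muon_pt array no longer hides a passing electron, and vice versa.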