Applied cut string does not work as expected

Hello

While doing a simple Events->Scan in a tree, I require the following cutString

cut=(Muon_pt>29 || Electron_pt>37) ||  ( HLT_IsoMu24 || HLT_IsoMu27 || HLT_Ele35_WPTight_Gsf)```

so the print out is like that

root [6] Events->Scan("run:event:luminosityBlock:nMuon:Muon_pt:HLT_IsoMu24:HLT_IsoMu27:nElectron:Electron_pt:HLT_Ele35_WPTight_Gsf","   (Muon_pt>29 || Electron_pt>37) ||  ( HLT_IsoMu24 || HLT_IsoMu27 || HLT_Ele35_WPTight_Gsf) ")
***********************************************************************************************************************************************
*    Row   * Instance *       run *     event * luminosit *     nMuon *   Muon_pt * HLT_IsoMu * HLT_IsoMu * nElectron * Electron_ * HLT_Ele35 *
***********************************************************************************************************************************************
*       27 *        0 *         1 *  20688595 *      8360 *         1 * 30.359632 *         1 *         1 *         1 * 7.6053214 *         0 *
*       48 *        0 *         1 *  20688645 *      8360 *         1 * 25.083187 *         0 *         0 *         1 * 43.771064 *         1 *
*       72 *        0 *         1 *  20688694 *      8360 *         1 * 32.037674 *         1 *         1 *         1 * 5.7447304 *         0 *
*      108 *        0 *         1 *  20688667 *      8360 *         1 * 3.4408552 *         0 *         0 *         1 * 37.886581 *         0 *
*      137 *        0 *         1 *  20688803 *      8360 *         3 * 22.084796 *         0 *         0 *         2 * 39.982471 *         1 *
*      137 *        1 *         1 *  20688803 *      8360 *         3 * 12.733344 *         0 *         0 *         2 * 31.555551 *         1 *

so naively, you should get events when either muon (electrons) have pt> 29 (37) and also when some HLT triggers exist in the event. However, if one does now (just omitting the muon_pt || electron_pt part)

root [7] Events->Scan("run:event:luminosityBlock:nMuon:Muon_pt:HLT_IsoMu24:HLT_IsoMu27:nElectron:Electron_pt:HLT_Ele35_WPTight_Gsf","    ( HLT_IsoMu24 || HLT_IsoMu27 || HLT_Ele35_WPTight_Gsf) ")
***********************************************************************************************************************************************
*    Row   * Instance *       run *     event * luminosit *     nMuon *   Muon_pt * HLT_IsoMu * HLT_IsoMu * nElectron * Electron_ * HLT_Ele35 *
***********************************************************************************************************************************************
*        2 *        0 *         1 *  20688579 *      8360 *         0 *           *         0 *         0 *         1 * 40.771743 *         1 *
*       14 *        0 *         1 *  20688578 *      8360 *         1 * 36.654624 *         1 *         1 *         0 *           *         0 *
*       16 *        0 *         1 *  20688577 *      8360 *         1 * 26.556858 *         1 *         0 *         0 *           *         0 *
*       17 *        0 *         1 *  20688610 *      8360 *         1 * 42.732452 *         1 *         1 *         0 *           *         0 *
*       20 *        0 *         1 *  20688537 *      8360 *         1 * 41.883316 *         1 *         1 *         0 *           *         0 *
*       21 *        0 *         1 *  20688582 *      8360 *         1 * 30.107580 *         1 *         1 *         0 *           *         0 *
*       23 *        0 *         1 *  20688589 *      8360 *         0 *           *         0 *         0 *         1 * 37.945220 *         1 *
*       26 *        0 *         1 *  20688588 *      8360 *         1 * 36.903709 *         1 *         1 *         0 *           *         0 *
*       27 *        0 *         1 *  20688595 *      8360 *         1 * 30.359632 *         1 *         1 *         1 * 7.6053214 *         0 *
*       33 *        0 *         1 *  20688553 *      8360 *         1 * 27.617397 *         1 *         1 *         0 *           *         0 *
*       43 *        0 *         1 *  20688649 *      8360 *         1 * 31.803113 *         1 *         1 *         0 *           *         0 *
*       48 *        0 *         1 *  20688645 *      8360 *         1 * 25.083187 *         0 *         0 *         1 * 43.771064 *         1 *

you can clearly see that the second example returns more lines, but I would have expected that for instance also all the events like from the second example ie Row 2

* 2 * 0 * 1 * 20688579 * 8360 * 0 * * 0 * 0 * 1 * 40.771743 * 1 *

should have been returned with the first syntax as the condition is either muon_pt> OR electron_pt> which is clearly satisfied in Row2

In other words, the
"(Muon_pt>29 || Electron_pt>37) || ( HLT_IsoMu24 || HLT_IsoMu27 || HLT_Ele35_WPTight_Gsf)"

does not satisfy the OR conditions from the first parentheses…unless if I am missing something.

Any ideas?

@couet My guess is that as soon as you use “Muon_pt” ("Electron_pt") in the condition, it automatically requires that at least one “muon” (“electron”) is present, i.e., “nMuon>0” ("nElectron>0").
Try: (nMuon==0 || Muon_pt>29) || (nElectron==0 || Electron_pt>37)

I will need to look closely tomorrow. May be @pcanal can also help.

I ve tried the proposed syntax and it returns the same lines, unfortunately…

Well, just for test purposes … see what you get from the second “Scan” when you use:
(nMuon>0 && nElectron>0) && (...)

That returns only the entries when there is at least a muon and an electron, but as I said, I want the OR of that basically. Even that one

(nMuon>0 || nElectron>0) && (Muon_pt>29 || Electron_pt>37) && ( HLT_IsoMu24 || HLT_IsoMu27 || HLT_Ele35_WPTight_Gsf)

does not work as expected and returns entries when there is always at least one muon and at least an electron.

Just to add that the same happens if I try to actually Events->Draw ie the cut string is not interpreted as expected…

Try:
(nMuon<1 || (nMuon>0 && Muon_pt>29)) || (nElectron<1 || (nElectron>0 && Electron_pt>37))

nop, this does not work either…if that helps, these are multi-dimension arrays, but I think you already know that.

@couet do you have maybe a better idea of what is happening? I find it pretty strange that the cut does not work as expected and I guess this is also worrisome for ROOT developers as well. In any case, please let me know of the proposed solution as this is really a serious showstopper for our analysis efforts, as we really need to properly skim/draw from the trees.

I guess, you need to provide one file for inspection.

You can pick one small file from here /afs/cern.ch/user/a/alkaloge/public/smallFile.root

I think I spotted the problem.
You define “nMuon” and “nElectron” as “UInt_t” but you use them as variable size arrays’ lengths. I think this is not supported in ROOT (@pcanal even in the newest 6.26?). You must use “Int_t” variables for this purpose.
There are more such “UInt_t” variables which need to be “fixed” (e.g., “nboostedTau”, “nCorrT1METJet” and so on).

I am not sure about that… I mean ,as we discussed in the first posts, using only nMuon and/or nElectron seems ok, the problem starts when the _pt are called.

So, I tried instead of

( (Muon_pt>29 || Electron_pt>37 ))

to do something like

 ( (Muon_pt<1 || Muon_pt>29  || Electron_pt>37 || Electron_pt<1)) 

but I still get events when there is at least 1 muon and 1 electron… Does that mean that the same problem that you described is true also for "Float_t" that the _pt arrays are? A

The type of array elements is irrelevant. It’s just the type of variable that keeps its “actual” length that matters.

so, to conclude, there is no any workaround solution for that currently?

UInt_t is only used in this way for all of CMS’s NanoAOD…

Edit:
The CMS nanoAOD-tools preskimming proceeds via TTree::Draw (to acquire an entry list) [1]
TTreeReader is used for the python loop over the data (using that entry list) later on [2, 3]

[1]

[2]

[3]

This correct, even in the current master, the variable used to indicate the size an array must be of type Int_t.

@pcanal @couet This is not the first time I have seen such a problem here.
Would it be possible to find a “brutal fix” for it?
The “sizeof(UInt_t)” = “sizeof(Int_t)” so maybe one could “fool” ROOT to “redefine” broken branches in RAM.
Something like this:

TTree *t; gFile->GetObject("MyTree", t);
t->GetBranch("Problematic_UInt_t_Branch")->Make_ROOT_Think_It_Is_An_Int_t();

I’ve made a small script to check an input file using the Muon_pt > 30 || Electron_pt > 35 criterion (minus any HLT menu items). Feel free to check I haven’t made a critical mistake.

Events are processed:

  1. with RDataFrame
  2. with a python TTree event loop
  3. with a cutstring ((nMuon > 0 && Muon_pt > 30) || (nElectron > 0 && Electron_pt > 35))
  4. with a cutstring ((Muon_pt > 30) || (Electron_pt > 35))

Then repeated, after using RDF to make a snapshot with Int_t counter branches and copies of the pt arrays, which doesn’t seem to affect things…

1 and 2 are consistent with eachother, 3 and 4 with eachother

Output on the original example file:

python test_cutstring.py                                                           CentOS7.9 03-09 13:42:34
TClass::Init:0: RuntimeWarning: no dictionary for class edm::Hash<1> is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ParameterSetBlob is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ProcessHistory is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ProcessConfiguration is available
TClass::Init:0: RuntimeWarning: no dictionary for class __pair_base<edm::Hash<1>,edm::ParameterSetBlob> is available
TClass::Init:0: RuntimeWarning: no dictionary for class pair<edm::Hash<1>,edm::ParameterSetBlob> is available
processing 18790 events
processing events in RDF
making entry-list
processing 18790 events
making entry-list
processing 18790 events
processing events in RDF
making entry-list
processing 18790 events
making entry-list
processing 18790 events
rdataframe counts: 12659 rdataframe with Int_t counts: 12659 python loop counts: 12659
cutstring counts: 2527 cutstring without explicit nMuon/nElectron check: 2527
cutstring with Int_t counts: 2527 cutstring with Int_t without explicit nMuon/nElectron check: 2527

Output on another centrally produced file:

TClass::Init:0: RuntimeWarning: no dictionary for class edm::Hash<1> is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ProcessHistory is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ProcessConfiguration is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ParameterSetBlob is available
TClass::Init:0: RuntimeWarning: no dictionary for class __pair_base<edm::Hash<1>,edm::ParameterSetBlob> is available
TClass::Init:0: RuntimeWarning: no dictionary for class pair<edm::Hash<1>,edm::ParameterSetBlob> is available
processing 48000 events
processing events in RDF
making entry-list
processing 48000 events
making entry-list
processing 48000 events
processing events in RDF
making entry-list
processing 48000 events
making entry-list
processing 48000 events
rdataframe counts: 33680 rdataframe with Int_t counts: 33680 python loop counts: 33680
cutstring counts: 20400 cutstring without explicit nMuon/nElectron check: 20400
cutstring with Int_t counts: 20400 cutstring with Int_t without explicit nMuon/nElectron check: 20400

Script:

#Setup e.g. LCG release on lxplus:
#source /cvmfs/sft.cern.ch/lcg/views/LCG_101swan/x86_64-centos7-gcc10-opt/setup.sh
import ROOT
import argparse

def get_passing_python(filename, treename="Events"):
    f = ROOT.TFile.Open(filename, "read")
    events = f.Get(treename)
    counter = 0
    print("processing", events.GetEntries(), "events")
    for idx, event in enumerate(events):
        # if (idx//1000) == 0:
        #     print("\tprocessed", idx, "events", idx//1000)
        passing = False
        nMu = event.nMuon
        for nm in range(nMu):
            if event.Muon_pt[nm] > 30: passing = True
        nEl = event.nElectron
        for ne in range(nEl):
            if event.Electron_pt[ne] > 35: passing = True
        if passing: counter+=1
    return counter

def make_Int_t_snapshot(filename, treename="Events", output_filename=None):
    if output_filename is None:
        output_filename = "Int_t_"+filename
    rdf = ROOT.ROOT.RDataFrame(treename, filename)
    rdf_d = rdf.Define("nMuonInt", "(Int_t)nMuon").Define("MuonInt_pt", "Muon_pt").Define("nElectronInt", "(Int_t)nElectron").Define("ElectronInt_pt", "Electron_pt")
    s = rdf_d.Snapshot(treename, output_filename, ["nMuonInt", "MuonInt_pt", "nElectronInt", "ElectronInt_pt", "event"])

def get_passing_rdataframe(filename, treename="Events", mu_name="Muon", el_name="Electron"):
    rdf = ROOT.ROOT.RDataFrame(treename, filename)
    print("processing events in RDF")
    pass_node = rdf.Define("passing", "bool passing = false;"\
                           "for(auto mu_pt: "+mu_name+"_pt){if(mu_pt > 30) passing = true;}"\
                           "for(auto el_pt: "+el_name+"_pt){if(el_pt > 35) passing = true;};"\
                           "return passing;")
    counter = pass_node.Filter("passing").Count()
    return counter.GetValue()

def get_passing_cutstring(filename, treename="Events", mu_name="Muon", el_name="Electron"):
    f = ROOT.TFile.Open(filename, "read")
    events = f.Get(treename)
    print("making entry-list")
    events.Draw(">>elist", "(n"+mu_name+" > 0 && "+mu_name+"_pt > 30) || (n"+el_name+" > 0 && "+el_name+"_pt > 35)")
    events.Draw(">>etrivial", "event >= 0")
    elist = ROOT.gDirectory.Get("elist")
    etrivial = ROOT.gDirectory.Get("etrivial")
    counter = 0
    trivial_counter = 0
    print("processing", events.GetEntries(), "events")
    for idx in range(events.GetEntries()):
        if elist.Contains(idx): counter += 1
        if etrivial.Contains(idx): trivial_counter += 1
    return counter, trivial_counter

def get_passing_cutstring_no_explicit_uint(filename, treename="Events", mu_name="Muon", el_name="Electron"):
    f = ROOT.TFile.Open(filename, "read")
    events = f.Get(treename)
    print("making entry-list")
    events.Draw(">>elist_ne", "("+mu_name+"_pt > 30) || ("+el_name+"_pt > 35)")
    elist = ROOT.gDirectory.Get("elist_ne")
    counter = 0
    print("processing", events.GetEntries(), "events")
    for idx in range(events.GetEntries()):
        if elist.Contains(idx): counter += 1
    return counter

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='testing cutstrings')
    parser.add_argument('--filename', '-f', action='store', type=str, default="/afs/cern.ch/user/a/alkaloge/public/smallFile.root", help="filename of test rootfile")
    args = parser.parse_args()

    filename = args.filename
    # filename = "/afs/cern.ch/user/a/alkaloge/public/smallFile.root"
    filename_local = "Int_t_"+filename.split("/")[-1]

    py = get_passing_python(filename)
    rdf = get_passing_rdataframe(filename)

    cs, trivial = get_passing_cutstring(filename)
    cs_ne = get_passing_cutstring_no_explicit_uint(filename)


    make_Int_t_snapshot(filename, output_filename=filename_local)
    rdf_int = get_passing_rdataframe(filename_local, mu_name="MuonInt", el_name="ElectronInt")
    cs_int, trivial_int = get_passing_cutstring(filename_local, mu_name="MuonInt", el_name="ElectronInt")
    cs_ne_int = get_passing_cutstring_no_explicit_uint(filename_local, mu_name="MuonInt", el_name="ElectronInt")

    print("rdataframe counts:", rdf, "rdataframe with Int_t counts:", rdf_int, "python loop counts:", py, )
    print("cutstring counts:", cs, "cutstring without explicit nMuon/nElectron check:", cs_ne)
    print("cutstring with Int_t counts:", cs_int, "cutstring with Int_t without explicit nMuon/nElectron check:", cs_ne_int)