Use RDataFrame to skim a tree

Hi,
I am trying to use RDataFrame class to skim a tree. I also want to select only the first 30 events from the input tree. This is the code snippet I wrote for python:

fileOut     =  "skimTree.root"
treeOut   =  "outputTree"
treeIn      = "ttbar_NoSys"
treeInFile = "ttbar_merged_processed.root"
RDF = ROOT.ROOT.RDataFrame
d   = RDF(treeIn, treeInFile)
dSmall = d.Filter("trigMatch_electronPlusMuonTrig!=0")
dCut   = dSmall.Filter("nLep_signal >= 2")
dOut =  dCut.Range(30);
dOut =  dCut.Define("met", "met_Et") \
            .Define("metPhi", "met_Phi") \
            .Define("ht", "Ht30") \
            .Define("Mll", "mll") \
            .Define("dPhi1", "DPhiJ1Met") \
            .Define("dPhi2", "DPhiJ2Met") \
            .Define("zPt", "Ptll") \
            .Define("mt2w", "mt2leplsp_0") \
            .Define("jetN", "nJet30") \
            .Define("bjetN", "nBJet30_MV2c10_FixedCutBEff_77") \
            .Define("lepN", "nLep_signal") \
            .Define("leadLepPt","lepPt[0]") \
            .Define("trailLepPt", "lepPt[1]") \
            .Define("leadLepEta", "lepEta[0]") \
            .Define("trailLepEta", "lepEta[1]") \
            .Define("leadLepPhi", "lepPhi[0]") \
            .Define("trailLepPhi", "lepPhi[1]") \
            .Define("runNumber", "RunNumber") \
            .Define("eventNumber", "EventNumber") \
            .Define("eeTrig", "trigWeight_1L2LTrig") \
            .Define("eeTrigMatch", "trigMatch_1L2LTrig") \
            .Define("mmTrig", "trigWeight_diMuonTrig") \
            .Define("mmTrigMatch", "trigMatch_diMuonTrig") \
            .Define("emTrig", "trigWeight_electronPlusMuonTrig")\
            .Define("emTrigMatch", "trigMatch_diMuonTrig") \
            .Define("isOS", "int OS = -1; (lepCharge[0]!=lepCharge[1] ? OS=1:OS=0); return OS;")
        
        
dOut.Snapshot(treeOut, fileOut)

But the code is running on the total number of events of the input tree. How can I only run on the first 30 events? Can anyone point me my mistake?
Thank you,
Arka


_ROOT Version: 6.14.04
_Platform: sl6 on lxplus
Compiler: Not Provided


Hi,
it should be

dOut =  dCut.Range(30);
dOut =  dOut.Define("met", "met_Et")

where the second line should apply all the defines right after the Range, i.e. on dOut, not dCut.

Cheers,
Enrico

1 Like

Hi Enrico,
That helped. Thanks a lot. I have two more question regarding RDataFrame:

  1. I see that the output tree contained all the branches of the input tree, in addition to what I defined in the above code. How can I create an output tree which will ONLY have the branches I defined in the code?
  2. I want to add another variable to the tree named ‘channel’. This is how I want to define it:
   if(lepFlavor[0] == 1 and lepFlavor[1] == 1):
       channel    = 1  ### ee
   elif(lepFlavor[0] == 2 and lepFlavor[1] == 2):
       channel    = 0  ### mm
   elif(lepFlavor[0] == 1 and lepFlavor[1] == 2):
       channel    = 2  ### em
   elif(lepFlavor[0] == 2 and lepFlavor[1] == 1):
       channel    = 3  ### me

How can I add this variable ‘channel’ to my tree ‘dOut’ using RDataFrame?
Thanks,
Arka

Hi Arka,

  1. I see that the output tree contained all the branches of the input tree, in addition to what I defined in the above code. How can I create an output tree which will ONLY have the branches I defined in the code?

In order to achieve this, you can either use a regular expression to match your branches or provide them as a vector. To take your code as an example:

in master (and forthcoming ROOT 6.16/00 version)

[...]
dOut.Snapshot(treeOut, fileOut, dOut.GetDefinedColumnNames())

in 6.14

[...]
colNames = ROOT.std.vector("string")()
colNames.push_back("met")
# ...
colNames.push_back("trailLepPt")
dOut.Snapshot(treeOut, fileOut, colNames)
  1. I want to add another variable to the tree named ‘channel’. This is how I want to define it: …

Inside Defines in Python you can use C++. Therefore, that condition shall be coded as:

quick option: we exploit the fact that true is cast to 1 and false to 0

myFlavourDefine = '''
(lepFlavor[0] == 1 and lepFlavor[1] == 1) * 0 + (lepFlavor[0] == 2 and lepFlavor[1] == 2) * 1 + (lepFlavor[0] == 1 and lepFlavor[1] == 2) * 2 + (lepFlavor[0] == 2 and lepFlavor[1] == 1) *3
'''
[...]
dOut =  dOut.Define("channel", myFlavourDefine)

more elaborate:

myFunc = '''
int myFunc(const ROOT::VecOps::RVec<int> &lepFlavor)
{
if (lepFlavor[0] == 1 and lepFlavor[1] == 1) return  0;
else if (lepFlavor[0] == 2 and lepFlavor[1] == 2) return 1;
else if (lepFlavor[0] == 1 and lepFlavor[1] == 2) return 2;
else if (lepFlavor[0] == 2 and lepFlavor[1] == 1) return 3;
return -1; //fallback?
}
'''
ROOT.gInterpreter.Declare(myFunc)
dOut =  dOut.Define("channel", "myFunc(lepFlavor)")

Cheers,
D

2 Likes

Hi Danilo,
Thank you very much for your help. Now I get the skimmed tree with fewer branches, which I wanted.

I am trying to rewrite my code so that I migrate from the conventional TTree class to RDataFrame class. The conventional code is here.

This code tries to generate an event based weight and save that to the output tree. This is the snippet (Line 366 to Line 429):
(find_weight and find_trigger_weight are defined here)

# Retrieve the FS weights
    pt0            = event.lepPt[0]
    pt1            = event.lepPt[1]
    eta0           = event.lepEta[0]
    eta1           = event.lepEta[1]
    HT             = event.Ht30
    MET            = event.met_Et

    # Find the correct k and alpha factors
    if ((era=='data15-16') or (era=='MC16a')):
        weights_lep0    = weights_lep0_2015
        weights_lep1    = weights_lep1_2015
        trigger_weights = trigger_weights_2015
    elif ((era=='data17') or (era=='MC16cd')):
        weights_lep0    = weights_lep0_2016
        weights_lep1    = weights_lep1_2016
        trigger_weights = trigger_weights_2016
    
    ### getting the weights
    if channel > 1:
        if not event.trigMatch_electronPlusMuonTrig: continue
        nSaveEvents    += 1

        ee_weight      = .5
        mm_weight      = .5
        ee_weight_up   = .5
        mm_weight_up   = .5
        ee_weight_down = .5
        mm_weight_down = .5

        if channel == 2:
            ee_weight      *= find_weight(pt1, fabs(eta1), weights_lep1, do_err_up, do_err_down)
            mm_weight      /= find_weight(pt0, fabs(eta0), weights_lep0, do_err_up, do_err_down)
            ee_weight_up   *= find_weight(pt1, fabs(eta1), weights_lep1, True, do_err_down)
            mm_weight_up   /= find_weight(pt0, fabs(eta0), weights_lep0, True, do_err_down)
            ee_weight_down *= find_weight(pt1, fabs(eta1), weights_lep1, do_err_up, True)
            mm_weight_down /= find_weight(pt0, fabs(eta0), weights_lep0, do_err_up, True)

        if channel == 3:
            ee_weight      *= find_weight(pt0, fabs(eta0), weights_lep0, do_err_up, do_err_down)
            mm_weight      /= find_weight(pt1, fabs(eta1), weights_lep1, do_err_up, do_err_down)
            ee_weight_up   *= find_weight(pt0, fabs(eta0), weights_lep0, True, do_err_down)
            mm_weight_up   /= find_weight(pt1, fabs(eta1), weights_lep1, True, do_err_down)
            ee_weight_down *= find_weight(pt0, fabs(eta0), weights_lep0, do_err_up, True)
            mm_weight_down /= find_weight(pt1, fabs(eta1), weights_lep1, do_err_up, True)
        
        fsKLep0[0]  =  find_weight(pt0, fabs(eta0), weights_lep0, do_err_up, do_err_down)
        fsKLep1[0]  =  find_weight(pt1, fabs(eta1), weights_lep1, do_err_up, do_err_down)

        if find_trigger_weight(channel,pt0,fabs(eta0),trigger_weights, pt1) == 0:
            trigger_weight = 1
            fsAlpha[0]     = 1
        else: 
            trigger_weight      = sqrt( find_trigger_weight(0,pt0,fabs(eta0),trigger_weights) * find_trigger_weight(1,pt0,fabs(eta0),trigger_weights) ) / find_trigger_weight(channel,pt0,fabs(eta0),trigger_weights,pt1)
            trigger_weight_up   = sqrt( find_trigger_weight(0,pt0,fabs(eta0),trigger_weights, up_or_down=1) * find_trigger_weight(1,pt0,fabs(eta0),trigger_weights, up_or_down=1) ) / find_trigger_weight(channel,pt0,fabs(eta0),trigger_weights,pt1, up_or_down=-1)
            trigger_weight_down = sqrt( find_trigger_weight(0,pt0,fabs(eta0),trigger_weights, up_or_down=-1) * find_trigger_weight(1,pt0,fabs(eta0),trigger_weights, up_or_down=-1) ) / find_trigger_weight(channel,pt0,fabs(eta0),trigger_weights,pt1, up_or_down=1)
            fsAlpha[0]          = sqrt( find_trigger_weight(0,pt0,fabs(eta0),trigger_weights) * find_trigger_weight(1,pt0,fabs(eta0),trigger_weights) ) / find_trigger_weight(channel,pt0,fabs(eta0),trigger_weights,pt1)

        ee_weight      *= trigger_weight
        mm_weight      *= trigger_weight
        ee_weight_up   *= trigger_weight_up
        mm_weight_up   *= trigger_weight_up
        ee_weight_down *= trigger_weight_down
        mm_weight_down *= trigger_weight_down

I want to save these branches to the output tree using RDataFrame:

ee_weight
mm_weight
ee_weight_up
mm_weight_up
ee_weight_down
mm_weight_down

How can I do that?
Thanks,
Arka

1 Like

Hi Arka,

I think you can convert your code to C++ and proceed as we saw in the previous post of mine.

Cheers,
D

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.