Creating new variable with RDataFrame and saving to root file


I am using an RDataFrame to access a ROOT file, create a new variable, and then save that new variable to a TTree along with the original variables in the TTree.

I have managed to do this; however, upon opening the ROOT file there are two TTrees. One contains the correct number of events and the other has more than the original number.

EDIT: Of the two TTrees, one shows the correct number of entries and the other fewer. This is consistent with the solution.

Below is the code, trimmed to keep only the pertinent information:

import glob
import ROOT

mcGenerator = ["Nominal"]
DSID = ["410470"]
nJets = ["2jet"]
flavours = ["bb"]

rootDir = './1lep/'

# Build the sample directory names, e.g. "Nominal_410470_2jet_bb"
files = []
for gen, dsid in zip(mcGenerator, DSID):
    for jet in nJets:
        for flav in flavours:
            fileDir = gen + "_" + dsid + "_" + jet + "_" + flav
            files.append(fileDir)

for filePath in glob.glob(rootDir):
    for sample in files:
        fileName = glob.glob(filePath + sample + "/*.root")
        file = ...  # the call that opens the input file was cut from this excerpt
        tree = file[str(file.keys(0))[2:-2]]
        fileName = str(newString)  # newString is defined in code removed from this excerpt
        treeName = str(tree)
        treeBranches = list(tree.keys())
        d = ROOT.RDataFrame(treeName, fileName)
        # Add the new column: EventWeight scaled by (pTV / 75)^2
        d2 = d.Define("EventWeightpTVWeighted", "EventWeight * ((pTV * pTV)/(75*75))")
        # Columns to write out: the original branches plus the new one
        branchList = ROOT.vector('string')()
        for branchName in ["EventNumber", "EventNumberModNfold", "EventWeight", "FlavourLabel",
                           "MET", "dPhiLBmin", "dPhiVBB", "dRBB", "mBB", "mBBJ", "nJ",
                           "pTB1", "pTB2", "pTV", "pTJ3", "mTop", "dYWH", "mTW", "FoldType",
                           "EventWeightpTVWeighted"]:
            branchList.push_back(branchName)

        d2.Snapshot(treeName, sample + "_mod.root", branchList)

Hi @amytee,

I can’t tell what is going wrong there. However, maybe @eguiraud has some suspicions.


That code snippet looks a bit odd, because each Snapshot call in the inner for loop seems to re-create the same file, sample + "_mod.root".
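One way to keep repeated Snapshot calls from overwriting each other is to give each input file its own output name. The following is only a minimal sketch of that idea: the helper output_name and the input file names part1.root and part2.root are hypothetical and do not come from the original post.

```python
import os


def output_name(sample, input_path, suffix="_mod.root"):
    """Build an output file name that is unique per (sample, input file).

    Hypothetical helper: it appends the input file's base name (without
    extension) to the sample name, so two inputs never share an output.
    """
    stem = os.path.splitext(os.path.basename(input_path))[0]
    return f"{sample}_{stem}{suffix}"


# Hypothetical input files inside one sample directory.
sample = "Nominal_410470_2jet_bb"
inputs = [
    "./1lep/Nominal_410470_2jet_bb/part1.root",
    "./1lep/Nominal_410470_2jet_bb/part2.root",
]

names = [output_name(sample, path) for path in inputs]
print(names)
```

Each resulting name could then be passed as the second argument of Snapshot in place of sample + "_mod.root". If a single merged output is what is wanted instead, this approach does not apply.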

Also, Snapshot cannot create more entries than there are in the original tree, so there is probably something there that needs to be better understood. What does rootls -t <outputfile>.root print?


Also, the lines that build treeBranches do not seem to serve any purpose?


I’ve been playing around a little with the code, and the treeBranches line was used in a part of the code I removed (I clearly forgot to remove this line too).

These are the results:

TFile**		Nominal_410470_2jet_bb_mod.root	
TFile*		Nominal_410470_2jet_bb_mod.root	
KEY: TTree	MVA_Var_Tree_Sig;3	MVA_Var_Tree_Sig
KEY: TTree	MVA_Var_Tree_Sig;2	MVA_Var_Tree_Sig

root [2]

OK, those are “cycles” of the same object. When you write out big TTrees, ROOT makes regular “snapshots” during writing so that not all data is lost in case of a crash. The only cycle you should care about is the last/highest one (3 in this case), which is also what is returned when you just call file.Get("MVA_Var_Tree_Sig").
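To make the "highest cycle wins" rule concrete, here is a small sketch in plain Python that picks the highest cycle for each object from key strings in the name;cycle form shown in the listing above. The function highest_cycles is a hypothetical helper written for illustration, not part of ROOT.

```python
def highest_cycles(keys):
    """Map each object name to its highest cycle number.

    Assumes every key is a string in the "name;cycle" form,
    e.g. "MVA_Var_Tree_Sig;3".
    """
    best = {}
    for key in keys:
        # Split on the last ';' so the name itself is left untouched.
        name, _, cycle = key.rpartition(";")
        cycle = int(cycle)
        if name not in best or cycle > best[name]:
            best[name] = cycle
    return best


# The two keys from the listing above.
keys = ["MVA_Var_Tree_Sig;3", "MVA_Var_Tree_Sig;2"]
latest = highest_cycles(keys)
print(latest)  # {'MVA_Var_Tree_Sig': 3}
```

In ROOT itself this bookkeeping is not needed: Get with the bare name returns the highest cycle, and a specific cycle can be requested explicitly with the name;cycle form, e.g. file.Get("MVA_Var_Tree_Sig;2").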

So, considering only the highest cycle, is everything OK (apart from the fact that you have many Snapshots that write the same file)?

Yes, thank you :smiley:
