Dear experts,
I’m using multiple ROOT files for signal and background samples generated by Delphes. For each process, I read its files once, create a corresponding TTree, and store the variables of interest in branches of that TTree. This is repeated for every process, until I have a single ROOT file that contains one TTree per process with the data I need for machine learning. Maybe the code below explains better what I’m doing:
# output ROOT file: one TTree per process, each TTree has branches holding the variables of that process
output_file = ROOT.TFile("delphes_events.root", "RECREATE")
# The part of the code below is repeated for every process considered in the analysis,
# hence the ROOT file contains one TTree per process
## ---------------------##
process_tree_name = "WW_Tree"  # must match the name passed to input_file.Get() when loading into TMVA
tree = ROOT.TTree(process_tree_name, "Process Tree")
# prepare the branches
fj1_PT = array('f', [0.0]); tree.Branch("fj1_PT", fj1_PT, "fj1_PT/F")
fj1_Mass = array('f', [0.0]); tree.Branch("fj1_Mass", fj1_Mass, "fj1_Mass/F")
fj2_PT = array('f', [0.0]); tree.Branch("fj2_PT", fj2_PT, "fj2_PT/F")
fj2_Mass = array('f', [0.0]); tree.Branch("fj2_Mass", fj2_Mass, "fj2_Mass/F")
Chain = ROOT.TChain("Delphes")
for file in input_files:
    Chain.Add(file)
TreeReader = ROOT.ExRootTreeReader(Chain)
FatJet_branch = TreeReader.UseBranch("FatJet")
TotalEntries = TreeReader.GetEntries()
for entry in track(range(TotalEntries)):
    TreeReader.ReadEntry(entry)
    NumFatJet = FatJet_branch.GetEntries()
    if NumFatJet >= 1:
        Cand_FatJet1 = FatJet_branch.At(0)
        fj1_PT[0] = Cand_FatJet1.PT
        fj1_Mass[0] = Cand_FatJet1.Mass
    if NumFatJet >= 2:
        Cand_FatJet2 = FatJet_branch.At(1)
        fj2_PT[0] = Cand_FatJet2.PT
        fj2_Mass[0] = Cand_FatJet2.Mass
    # filled once per event, whether or not the fat jets above were found
    tree.Fill()
## ---------------------##
output_file.cd()
tree.Write()
output_file.Close()
After producing the ROOT file, I set up TMVA as follows:
# open the ROOT file that holds one TTree per process
input_file = ROOT.TFile.Open("delphes_events.root", "READ")
# add the trees created above directly
dataloader.AddSignalTree(input_file.Get("sig_Tree"))
dataloader.AddBackgroundTree(input_file.Get("WW_Tree"))
dataloader.AddBackgroundTree(input_file.Get("ZZ_Tree"))
dataloader.AddBackgroundTree(input_file.Get("tt_Tree"))
# add branch names of the trees as variables directly
dataloader.AddVariable("fj1_PT", "", "", "F")
dataloader.AddVariable("fj1_Mass", "", "", "F")
dataloader.AddVariable("fj2_PT", "", "", "F")
dataloader.AddVariable("fj2_Mass", "", "", "F")
The problem now is that not every event in every process contains two fat jets, or even one. In that case I’m not sure what gets filled into the tree branches, and what TMVA then reads. So I think my method is wrong, and the way I feed the data into TMVA is wrong too. Could someone please help me identify the problem?
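To make the concern concrete, here is a minimal ROOT-free sketch of the fill loop. It assumes that each fill simply stores whatever currently sits in the array buffers, which is how I understand tree.Fill() to behave. The `events` list, the `fill_rows` helper and the -999 sentinel are all hypothetical and only serve the illustration; the `rows` list stands in for the tree.

```python
from array import array

# Hypothetical stand-in for Delphes events: each inner list holds the
# fat-jet PTs found in one event (2, 1, 0 and 2 jets respectively).
events = [[250.0, 180.0], [300.0], [], [410.0, 220.0]]


def fill_rows(events, reset_value=None):
    """Mimic the fill loop; appending to `rows` stands in for tree.Fill()."""
    fj1_PT = array('f', [0.0])
    fj2_PT = array('f', [0.0])
    rows = []
    for jets in events:
        if reset_value is not None:
            # Reset the buffers at the start of every event so that a
            # missing jet ends up as a recognisable sentinel value.
            fj1_PT[0] = reset_value
            fj2_PT[0] = reset_value
        if len(jets) >= 1:
            fj1_PT[0] = jets[0]
        if len(jets) >= 2:
            fj2_PT[0] = jets[1]
        # The buffers are stored as they are, updated or not.
        rows.append((fj1_PT[0], fj2_PT[0]))
    return rows


stale = fill_rows(events)                      # no reset, as in my code
clean = fill_rows(events, reset_value=-999.0)  # reset before every event

for s, c in zip(stale, clean):
    print(s, c)
```

Without the reset, the events with fewer than two fat jets repeat the values of an earlier event, so they look like real jets. With the reset they carry the sentinel instead, and could then be excluded with a cut such as fj2_PT > 0 before training, if dropping those events is acceptable for the analysis.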
I also noticed that if, while creating the trees, I include a branch called fj1_PT_to_fj2_PT that stores Cand_FatJet1.PT/Cand_FatJet2.PT, the histogram of this branch in the ROOT file looks a little different from the one I get when I compute and plot the ratio myself using
tree.Draw("fj1_PT/fj2_PT >> hist_ratio", "fj2_PT != 0", "goff")
which again convinces me that something is wrong in my setup…
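A small ROOT-free sketch of how the two ratios could drift apart. It assumes the ratio branch is only updated inside the `NumFatJet >= 2` block and that the buffers are never reset between events; the `events` list is made up, and the `stored` and `recomputed` lists stand in for the stored branch and for the tree.Draw expression.

```python
from array import array

# Hypothetical fat-jet PTs per event: two jets, one jet, two jets.
events = [[400.0, 200.0], [300.0], [600.0, 150.0]]

fj1_PT = array('f', [0.0])
fj2_PT = array('f', [0.0])
ratio = array('f', [0.0])  # stands in for the fj1_PT_to_fj2_PT branch

stored = []      # what the ratio branch would hold
recomputed = []  # what fj1_PT/fj2_PT with the cut fj2_PT != 0 would give

for jets in events:
    if len(jets) >= 1:
        fj1_PT[0] = jets[0]
    if len(jets) >= 2:
        fj2_PT[0] = jets[1]
        ratio[0] = jets[0] / jets[1]  # only updated when both jets exist
    stored.append(ratio[0])
    if fj2_PT[0] != 0:
        recomputed.append(fj1_PT[0] / fj2_PT[0])

print(stored)
print(recomputed)
```

In the one-jet event the stored branch repeats the previous ratio, while the recomputed value mixes the new fj1_PT with the old fj2_PT. Under these assumptions the two histograms would differ exactly in the events that have fewer than two fat jets.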