Feeding TMVA with TTrees for signal and backgrounds

Dear experts,

I’m using multiple ROOT files for signal and background samples generated by Delphes. For each process, I read the files once, create a corresponding TTree, and store the variables of interest in branches of that tree. This is repeated for every process, so I end up with one ROOT file that contains a TTree per process with the data I need for machine learning. Maybe the code below explains better what I’m doing:

import ROOT
from array import array

# ROOT file that stores one TTree per process; each TTree holds branches with that process's variables
output_file = ROOT.TFile("delphes_events.root", "RECREATE")


# The part of the code below is looped over for every process considered in the analysis,
# hence the ROOT file contains multiple TTrees, one per process
## ---------------------##
process_tree_name = "WW_Tree"  # must match the name passed to input_file.Get(...) later
tree = ROOT.TTree(process_tree_name, "Process Tree")

# prepare the branches
fj1_PT = array('f', [0.0]); tree.Branch("fj1_PT", fj1_PT, "fj1_PT/F")
fj1_Mass = array('f', [0.0]); tree.Branch("fj1_Mass", fj1_Mass, "fj1_Mass/F")
fj2_PT = array('f', [0.0]); tree.Branch("fj2_PT", fj2_PT, "fj2_PT/F")
fj2_Mass = array('f', [0.0]); tree.Branch("fj2_Mass", fj2_Mass, "fj2_Mass/F")

Chain = ROOT.TChain("Delphes")
for file in input_files:
    Chain.Add(file)
TreeReader = ROOT.ExRootTreeReader(Chain)

FatJet_branch   = TreeReader.UseBranch("FatJet")
TotalEntries = TreeReader.GetEntries()
for entry in track(range(TotalEntries)):
    TreeReader.ReadEntry(entry)
    NumFatJet = FatJet_branch.GetEntries()

    if NumFatJet >= 1:
        Cand_FatJet1 = FatJet_branch.At(0)
        fj1_PT[0] = Cand_FatJet1.PT
        fj1_Mass[0] = Cand_FatJet1.Mass

    if NumFatJet >= 2:
        Cand_FatJet2 = FatJet_branch.At(1)
        fj2_PT[0] = Cand_FatJet2.PT
        fj2_Mass[0] = Cand_FatJet2.Mass

    tree.Fill()

## ---------------------##

output_file.cd()
tree.Write()
output_file.Close()

After getting the ROOT file, I prepare TMVA as follows:

# open the ROOT file that contains one TTree per process
input_file = ROOT.TFile.Open("delphes_events.root", "READ")

# add the trees I just created directly
dataloader.AddSignalTree(input_file.Get("sig_Tree"))
dataloader.AddBackgroundTree(input_file.Get("WW_Tree"))
dataloader.AddBackgroundTree(input_file.Get("ZZ_Tree"))
dataloader.AddBackgroundTree(input_file.Get("tt_Tree"))

# add branch names of the trees as variables directly
dataloader.AddVariable("fj1_PT", "", "", "F")
dataloader.AddVariable("fj1_Mass", "", "", "F")
dataloader.AddVariable("fj2_PT", "", "", "F")
dataloader.AddVariable("fj2_Mass", "", "", "F")
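
TMVA uses every entry of the added trees unless a preselection cut is passed to `PrepareTrainingAndTestTree`. As a minimal sketch, assuming missing fat jets were stored with a placeholder value of -999 (a hypothetical convention, not something the code above does), a cut string that keeps only events with all variables filled could be built like this; `build_preselection` is an illustrative helper name, not a TMVA function.

```python
# Variables registered with dataloader.AddVariable above.
variables = ["fj1_PT", "fj1_Mass", "fj2_PT", "fj2_Mass"]

# ASSUMPTION: missing jets are written as this placeholder when filling the trees.
SENTINEL = -999.0


def build_preselection(names, sentinel=SENTINEL):
    """Join one 'variable > sentinel' condition per variable with &&."""
    return " && ".join(f"{name} > {sentinel}" for name in names)


cut = build_preselection(variables)
print(cut)

# With ROOT available, the same cut could be applied to signal and background:
# dataloader.PrepareTrainingAndTestTree(
#     ROOT.TCut(cut), ROOT.TCut(cut), "SplitMode=Random:NormMode=NumEvents:!V"
# )
```

Note that such a cut discards every event with fewer than two fat jets, so it changes the composition of the training sample and should be a deliberate analysis choice.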

The problem now is that not every event in every process contains two fat jets, or even one. In that case I’m not sure what gets filled into the tree branch and what TMVA then reads. So I think my method is wrong, and the way I feed the data into TMVA is wrong as well. Could someone please help me identify the problem?
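
A possible explanation, based only on the loop shown above: the array buffers are overwritten only inside the `if NumFatJet >= ...` blocks, so for an event with fewer fat jets `tree.Fill()` would store whatever the buffers still hold from an earlier event (or the initial 0.0). The pure-Python sketch below mimics that loop without ROOT. The event list, the `fill_rows` helper, and the -999 placeholder are assumptions made for illustration.

```python
from array import array

# Hypothetical per-event fat-jet pT lists standing in for the Delphes FatJet branch.
events = [[250.0, 180.0], [300.0], []]


def fill_rows(events, reset_value=None):
    """Return the (fj1_PT, fj2_PT) pairs that a tree.Fill() would store per event."""
    fj1_PT = array('f', [0.0])
    fj2_PT = array('f', [0.0])
    rows = []
    for jets in events:
        if reset_value is not None:
            # Reset the buffers so values from the previous event cannot leak through.
            fj1_PT[0] = reset_value
            fj2_PT[0] = reset_value
        if len(jets) >= 1:
            fj1_PT[0] = jets[0]
        if len(jets) >= 2:
            fj2_PT[0] = jets[1]
        rows.append((fj1_PT[0], fj2_PT[0]))  # stands in for tree.Fill()
    return rows


stale = fill_rows(events)                      # same logic as the loop above
clean = fill_rows(events, reset_value=-999.0)  # with a per-event reset

print(stale)
print(clean)
```

Without the reset, the second and third events repeat values from earlier events. With the reset, missing jets become recognisable, so such events can later be removed with a cut or handled explicitly.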

I also noticed the following: if, while creating the trees, I include a branch called fj1_PT_to_fj2_PT that stores Cand_FatJet1.PT/Cand_FatJet2.PT, then the histogram of this branch in the ROOT file looks slightly different from the one I get when I calculate the ratio and plot it myself using

tree.Draw("fj1_PT/fj2_PT >> hist_ratio", "fj2_PT != 0", "goff")

which, again, convinces me that something is wrong in my setup…
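
One way such a difference could arise, assuming the ratio branch is only updated inside the `if NumFatJet >= 2` block: for an event with a single fat jet, the stored ratio keeps its previous value, while `tree.Draw` recomputes it from the new fj1_PT and the stale fj2_PT. The sketch below reproduces this with plain Python arrays; the event values are made up for illustration.

```python
from array import array

# Hypothetical events: the second one has only one fat jet.
events = [[200.0, 100.0], [300.0], [400.0, 100.0]]

fj1_PT = array('f', [0.0])
fj2_PT = array('f', [0.0])
ratio = array('f', [0.0])  # stands in for the fj1_PT_to_fj2_PT branch

stored = []      # what the ratio branch would contain
recomputed = []  # what Draw("fj1_PT/fj2_PT", "fj2_PT != 0") would histogram

for jets in events:
    if len(jets) >= 1:
        fj1_PT[0] = jets[0]
    if len(jets) >= 2:
        fj2_PT[0] = jets[1]
        ratio[0] = fj1_PT[0] / fj2_PT[0]  # ASSUMPTION: ratio filled only here
    stored.append(ratio[0])
    if fj2_PT[0] != 0:
        recomputed.append(fj1_PT[0] / fj2_PT[0])

print(stored)
print(recomputed)
```

The two lists disagree only for the single-jet event, and the `fj2_PT != 0` cut does not remove that event because fj2_PT still holds the value from the previous one.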

Hi @ammelsayed,

Thank you for explaining your use case and the issue you encountered. I will bring @moneta into the loop so he can give you some further tips.

Cheers,

Marta
