Using specific numbers of samples from each input tree ROOT::TMVA

Hi all, I have a few trees I am using to train my models.

Instead of giving everything to the dataloader, I would like to specify, e.g., 10000 events from one tree and 1000 events from another tree to make up my signal sample.

Is this possible at all with the current tools? I could of course create new ROOT files that pull 10000 events from one tree and 1000 from the other. However, I want to sweep the percentage of samples coming from the second tree to see the effect of introducing this sample, and creating a new ROOT file each time is not ideal.

Cheers



Hi,

I am adding the expert, @moneta, to the loop.

Cheers,
D

Hi,

This should be possible by adding each tree individually to the data loader and specifying the corresponding selection as a TCut. TMVA then applies the event selection for each tree internally, using the TTree::CopyTree function to copy the selected events from the TTree.
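As a rough sketch of what such a per-tree selection could look like: the helper below (firstEntriesCut is a hypothetical name, not part of ROOT) builds a selection string that keeps only the first N entries of a tree. It assumes the Entry$ keyword of TTree formulas is accepted in the cut that TMVA passes on to CopyTree, which should be checked on your ROOT version. The AddTree call in the comment is likewise only an illustration of where the cut would go.

```cpp
#include <string>

// Build a selection string that keeps only the first maxEvents entries.
// Entry$ is the TTree formula keyword for the current (0-based) entry number.
std::string firstEntriesCut(long long maxEvents) {
    return "Entry$ < " + std::to_string(maxEvents);
}

// Hypothetical usage in a ROOT macro (tree1, tree2 and dataloader as in this thread;
// assumes the cut argument of AddTree behaves as described above):
//   dataloader->AddTree(tree1, "Signal", 1.0, TCut(firstEntriesCut(10000).c_str()));
//   dataloader->AddTree(tree2, "Signal", 1.0, TCut(firstEntriesCut(1000).c_str()));
```

Because the cut is just a string, the number of events can be changed from one training to the next without writing a new file.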

Lorenzo

Hi Moneta,

Thanks for the response. Just to clarify, do you mean that I can add a cut in the dataloader->PrepareTrainingAndTestTree step? How do I address the individual trees within this cut when selecting events (I am already adding each of them to the data loader individually)? As I understood it, the cut can only act on variables, i.e. mycut = "var1>0".

I’ve just realised I could do:

TTree *newTree1 = tree1->CloneTree(10000);
TTree *newTree2 = tree2->CloneTree(1000);

and then

dataloader->AddSignalTree(newTree1, 1.0);
dataloader->AddSignalTree(newTree2, 1.0);

to achieve the desired effect, right?

If there is a way to do this with the TCut, please let me know, because I am unaware of how that would work. However, for now this should be fine (untested).
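For the sweep mentioned in the first post, the number of events to clone from the second tree can be computed from the target fraction. This is a minimal sketch in plain C++; eventsForFraction and its parameters are hypothetical names, and it assumes "fraction" means the share of the combined signal sample that comes from the second tree.

```cpp
#include <algorithm>
#include <cmath>

// Number of events to take from the new sample so that it makes up
// `fraction` of the combined signal sample, given nBase events from the
// existing sample. The result is capped at nAvailable.
long long eventsForFraction(long long nBase, double fraction, long long nAvailable) {
    if (fraction <= 0.0) return 0;           // nothing from the new sample
    if (fraction >= 1.0) return nAvailable;  // cannot exceed what the tree holds
    // Solve n / (nBase + n) = fraction for n.
    const long long wanted = std::llround(fraction * nBase / (1.0 - fraction));
    return std::min(wanted, nAvailable);
}

// Hypothetical usage with the CloneTree approach above:
//   long long n = eventsForFraction(10000, 0.1, tree2->GetEntries());
//   if (n > 0) dataloader->AddSignalTree(tree2->CloneTree(n), 1.0);
```

The n > 0 guard in the usage comment is there because CloneTree(0) copies only the tree structure, and an empty tree is probably not something the dataloader will accept.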