How to use a specific number of events for BDT training in TMVA?

Hello,

I have over 64 million events, but I don’t want to train on all of them. I plan to use only 5 million events for training. I’ve looked into the TMVA user guide, but I’m not sure which option to use. Any help is appreciated. Thanks!


Hi,

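You can set nTrain_Signal=5000000 and nTest_Signal=0. This takes 5 million signal events for training and uses the rest for testing. The corresponding options for background are nTrain_Background and nTest_Background.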

FYI, I am copying the relevant lines from page 20 of the TMVAUserManual below:

For classification, the numbers of signal and background events used for training and testing are specified in the configuration string by the variables nTrain_Signal, nTrain_Background, nTest_Signal and nTest_Background (for example, “nTrain_Signal=5000:nTrain_Background=5000:nTest_Signal=4000:nTest_Background=5000”). The default value (zero) signifies that all available events are taken, e.g., if nTrain_Signal=5000 and nTest_Signal=0, and if the total signal sample has 15000 events, then 5000 signal events are used for training and the remaining 10000 events are used for testing. If nTrain_Signal=0 and nTest_Signal=0, the signal sample is split in half for training and testing. The same rules apply to background. Since zero is default, not specifying anything corresponds to splitting the samples in two halves.
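To make the rules in that passage concrete, here is a minimal Python sketch that reproduces the arithmetic for one class (signal or background) and builds the colon-separated option string. The function names `split_counts` and `split_options` are hypothetical helpers, not part of TMVA. The case nTrain=0 with nTest>0, the handling of odd totals, and the error on over-subscription are assumptions of this sketch, since the quoted text does not spell them out.

```python
def split_counts(total, n_train=0, n_test=0):
    """Return (train, test) event counts for one class, per the quoted rules.

    A value of zero means "take what is available".
    """
    if n_train + n_test > total:
        # Choice made by this sketch only; TMVA's own behaviour may differ.
        raise ValueError("requested more events than the sample contains")
    if n_train == 0 and n_test == 0:
        # Both at default: the sample is split in half.
        # Assumption: for an odd total the extra event goes to testing.
        train = total // 2
        return train, total - train
    if n_test == 0:
        # Fixed training size, the remainder is used for testing.
        return n_train, total - n_train
    if n_train == 0:
        # Assumed mirror case: fixed test size, the remainder is used for training.
        return total - n_test, n_test
    # Both given explicitly: use exactly the requested numbers.
    return n_train, n_test


def split_options(**counts):
    """Join name=value pairs with ':' as in the manual's example string."""
    return ":".join(f"{name}={value}" for name, value in counts.items())


# The manual's example: 15000 signal events, 5000 requested for training.
print(split_counts(15000, n_train=5000))  # (5000, 10000)

# Option string for the case discussed in this thread.
print(split_options(nTrain_Signal=5000000, nTest_Signal=0))
```

The resulting string is what would go into the configuration string mentioned in the passage above.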

with regards,
Ram


Ramkrishna’s answer is indeed the way to do this in TMVA :slight_smile:

You can also consider pre-processing your trees to keep only the events you want to train on. The event selection process can incur a significant overhead if you repeat your trainings many times.
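One possible way to do such pre-processing is sketched below with PyROOT's RDataFrame, writing the first N entries of a tree to a new file. This is only an illustration under assumptions: the function name `make_minitree` and the file and tree names in the usage comment are hypothetical, and taking the first N entries may not give a representative sample if the input is ordered in some way.

```python
def make_minitree(in_path, out_path, tree_name, n_events):
    """Copy the first n_events entries of tree_name from in_path to out_path."""
    # Imported here so that defining the function does not require ROOT.
    import ROOT

    df = ROOT.RDataFrame(tree_name, in_path)
    # Range(n) keeps entries [0, n); it cannot be used when implicit
    # multi-threading is enabled, so leave ROOT.EnableImplicitMT() off.
    # Snapshot writes all columns of the selected entries to the new file.
    df.Range(n_events).Snapshot(tree_name, out_path)


# Hypothetical usage (file and tree names are placeholders):
# make_minitree("all_events.root", "minitree.root", "events", 5000000)
```

The smaller file can then be used as the training input, so the selection is not repeated for every training.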

Cheers,
Kim

Hi all, thanks for your suggestions. As for my case, I have decided to just create a minitree of my dataset. :slight_smile:
