Switch the testing and training events

amytee · April 18, 2019, 1:16pm

Hi,

I am using a BDT, and currently half of the events in my signal and background samples are used for training and half for evaluation. My results therefore contain only the evaluated half of my events.

I would therefore like to run the BDT twice: first training on one half of the events and evaluating on the other, then running it again with the training and evaluation halves swapped. I can then combine the evaluated results.

I am not sure how best to do this. I figure I need to adapt the following line somehow:

      dataloader->PrepareTrainingAndTestTree("","NormMode=EqualNumEvents:SplitMode=Random:!V");   //For MVA Tree input mode

Einsiedler · April 18, 2019, 1:48pm

Hi Amytee,

What you can do is specify two different input files for your training and testing samples. I asked about this a few days ago; you can check the thread here to see how to do it: How to specify different input files for training and testing events?

If you need a bit of help implementing it, I can also show you how I made it work following the reply in that thread.

amytee · April 18, 2019, 1:51pm

Hi,

So I spoke to someone about how to solve the problem, and they suggested that I do the following:

Split the events on the parity of the event number:
EventNumber%2 == 0 (testing)
EventNumber%2 == 1 (training)

Then all you have to do is switch this requirement and repeat.

However, I am not sure how best to implement this in the code. It seems the most straightforward method, though.
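The parity rule above can be written down independently of TMVA. The following is only a minimal sketch of the bookkeeping, in plain C++; `Role`, `roleInPass` and `timesTested` are hypothetical names introduced for illustration, and it assumes the event number is available as an integer.

```cpp
#include <cassert>
#include <cstdint>

// Role of an event in one pass of the two-pass scheme.
enum class Role { Training, Testing };

// Pass 0: even event numbers are tested, odd ones are trained on.
// Pass 1: the requirement is switched.
// (Hypothetical helper for illustration, not a TMVA function.)
Role roleInPass(std::uint64_t eventNumber, int pass) {
    assert(pass == 0 || pass == 1);
    const bool even = (eventNumber % 2 == 0);
    const bool testing = (pass == 0) ? even : !even;
    return testing ? Role::Testing : Role::Training;
}

// Count in how many of the two passes an event lands in the testing set.
int timesTested(std::uint64_t eventNumber) {
    int n = 0;
    for (int pass = 0; pass < 2; ++pass) {
        if (roleInPass(eventNumber, pass) == Role::Testing) ++n;
    }
    return n;
}
```

Because every event is tested in exactly one of the two passes, the evaluated outputs of the two runs can be combined without counting any event twice.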

Einsiedler · April 18, 2019, 1:56pm

I am not quite sure how you would extract the event numbers and then assign them to the training or testing samples one by one. It would require some extra lines of code that I don’t know of. Honestly, specifying different input files for training and testing and then switching them for a second run seems more straightforward to me. Let me know if you choose to try that option and need help.

kialbert · April 23, 2019, 10:03am

Hi,

You can do this by:

// for older root versions:
// auto sigtree = static_cast<TTree *>(input->Get("name_of_sigtree"));
// auto bkgtree = static_cast<TTree *>(input->Get("name_of_bkgtree"));
// In 6.18 and onwards:
auto sigtree = input->Get<TTree>("name_of_sigtree");
auto bkgtree = input->Get<TTree>("name_of_bkgtree");

TMVA::DataLoader *dataloader=new TMVA::DataLoader("dataset");
dataloader->AddVariable("x");
dataloader->AddVariable("y");
// ...

// Add a selection of events to training and testing sets
// NOTE: I did not try this, so it might be the case that you need to put
//       `(int(EventNumber)%2)==0` instead.
dataloader->AddTree(sigtree, "Signal", 1.0, "(EventNumber%2)==0", TMVA::Types::kTraining);
dataloader->AddTree(sigtree, "Signal", 1.0, "(EventNumber%2)==1", TMVA::Types::kTesting);

// Do the same for background
dataloader->AddTree(bkgtree, "Background", ...);
dataloader->AddTree(bkgtree, "Background", ...);

dataloader->PrepareTrainingAndTestTree("", "", "<options for prepare>");

This will give you correct results only if your training is very robust. The proper way would be to evaluate the trained BDT using the TMVA::Reader, e.g. by looping through the events saved in the output file (by default TMVA.root) under dataset/TrainTree.

Cheers,
Kim

amytee · April 23, 2019, 10:10am

Thank you so much for this. I was actually trying to implement something along these lines today, as I came across this:

SourgeForgeTMVA

However, your answer is very clear! So thank you :D. I now need to add the event numbers to my tree…

kialbert · April 25, 2019, 5:08pm

Just as a further addendum: If you wish to have your BDT evaluated using all data as test data, then consider using cross validation. This is precisely the use case it was designed for.
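To illustrate the idea behind cross validation (this is not the TMVA::CrossValidation interface itself, only a sketch of the splitting logic in plain C++): each event is assigned to exactly one fold, in which it is held out for testing, and it is used for training in all other folds. `testFold` and `testCountsPerFold` are hypothetical helper names, and splitting on the event number modulo the number of folds is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fold in which the event is held out for testing; it trains in all others.
// (Hypothetical helper for illustration, not part of TMVA.)
unsigned testFold(std::uint64_t eventNumber, unsigned numFolds) {
    assert(numFolds >= 2);
    return static_cast<unsigned>(eventNumber % numFolds);
}

// Number of events held out for testing in each fold.
std::vector<std::size_t> testCountsPerFold(
    const std::vector<std::uint64_t>& eventNumbers, unsigned numFolds) {
    std::vector<std::size_t> counts(numFolds, 0);
    for (std::uint64_t ev : eventNumbers) {
        ++counts[testFold(ev, numFolds)];
    }
    return counts;
}
```

With `numFolds` set to 2 this reduces to the even/odd split discussed earlier in the thread.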

If you really care about the results on the training data, that is of course another matter, and you should proceed as you have done.

Cheers,
Kim