TMVA with different signal variables from different files

Dear Sir,

I have several variables for a TMVA study.
I added NLO effects to some of the variables, and I want to train TMVA with both LO-sample and NLO-sample variables.
That is:
variables 1 and 2 come from the LO sample,
variables 3 and 4 come from the NLO sample.

The LO and NLO samples have different numbers of events.
I used AddFriend as follows:
TTree *tree1 = (TTree*)inputsig->Get("T");
tree1->AddFriend("T2", "sample2.root");

I checked that the distributions of the same variable at LO and NLO are similar.
However, training with all LO variables and training with part LO and part NLO variables give very different results.

I suspect the event ordering may be causing the problem.

May I ask how to solve this?

Thanks.

Best,
Jung

Hi!

You could check the histograms of the variables in question in the output file. Do they look as you expect? They show you what TMVA sees :slight_smile:

Indeed, as you say, I think the ordering of events might be the culprit here.
How do you know how the events in the two files correspond to each other? If you add 4 variables, TMVA assumes that for the first entry of the tree, all 4 variables belong to the first event. How do you guarantee this when the two samples have different numbers of events?
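One way to check this is to compare, entry by entry, an event identifier stored in both samples. The sketch below is plain C++ (no ROOT) and assumes, hypothetically, that each sample provides such an identifier; here the identifiers are simply passed in as vectors, whereas in a ROOT macro you would fill them by looping over the two trees.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical diagnostic: returns the index of the first entry at which the
// two samples stop describing the same event, or -1 if they are aligned.
// Reading two trees side by side pairs them by entry index, so any mismatch
// reported here means variables from different events share a training row.
long firstMisalignedEntry(const std::vector<std::uint64_t>& eventNumbersA,
                          const std::vector<std::uint64_t>& eventNumbersB) {
    const std::size_t common = std::min(eventNumbersA.size(), eventNumbersB.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (eventNumbersA[i] != eventNumbersB[i]) {
            return static_cast<long>(i);
        }
    }
    // Entries past the end of the shorter sample have no partner at all,
    // so a length difference is also reported as a misalignment.
    if (eventNumbersA.size() != eventNumbersB.size()) {
        return static_cast<long>(common);
    }
    return -1;
}
```

With samples of different sizes, as in this thread, the function can never return -1, which is exactly the situation where entry-wise pairing cannot be trusted.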

Would it make sense to run the training with only NLO variables?

Cheers,
Kim

Dear Kim,

Thanks a lot for the reply. I checked the output histograms, and they look as I expected. So I think the main problem comes from the ordering of the data. Due to some technical problems, I can only use some of the distributions at NLO and the rest at LO.

May I ask, is it possible to train a BDT based on another trained classifier?

That is, I first train with the LO variables, then train again with the NLO variables, starting from the cuts obtained in the previous LO training?

Thanks!

Best,

Justine

Hi,

First, a clarification: when using both LO and NLO data, do you give TMVA 4 input variables or 2?

In the former case, if you want to use both sets of input variables at the same time, select only those events for which you have data from both samples. This should be straightforward if, for example, each event carries an event number that is the same in both samples.
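Selecting only the events present in both samples amounts to a join on the event number. Below is a minimal plain-C++ sketch of that idea; the record types, the field names (`event`, `var1` to `var4`) and the function name are hypothetical, and it assumes the event number identifies the same physics event in both samples.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-sample records: an event number plus the two
// variables taken from that sample (names are illustrative only).
struct LoEntry  { std::uint64_t event; double var1, var2; };
struct NloEntry { std::uint64_t event; double var3, var4; };

// One merged training row holding all four input variables.
struct MergedEntry { std::uint64_t event; double var1, var2, var3, var4; };

// Keep only events present in both samples, pairing them by event number
// instead of by position in the file. Output follows the LO sample order.
// If an event number appears twice in the NLO sample, the last one wins.
std::vector<MergedEntry> matchByEventNumber(const std::vector<LoEntry>& lo,
                                            const std::vector<NloEntry>& nlo) {
    // Index the NLO sample by event number for constant-time lookups.
    std::unordered_map<std::uint64_t, const NloEntry*> nloByEvent;
    nloByEvent.reserve(nlo.size());
    for (const NloEntry& e : nlo) {
        nloByEvent[e.event] = &e;
    }

    std::vector<MergedEntry> merged;
    for (const LoEntry& l : lo) {
        auto it = nloByEvent.find(l.event);
        if (it == nloByEvent.end()) {
            continue;  // no NLO partner for this event: drop it
        }
        const NloEntry& n = *it->second;
        merged.push_back({l.event, l.var1, l.var2, n.var3, n.var4});
    }
    return merged;
}
```

In a ROOT macro you would fill the two input vectors by looping over the LO and NLO trees and then write the merged rows to a new tree for TMVA, so that every entry TMVA reads is guaranteed to describe a single event.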

In the latter case, I don’t think there should be a problem in principle. But as always, things depend on your particular data setup.

It sounds like you want to do a partial training with dataset 1 and then continue the training with dataset 2. Unfortunately, this is not possible with TMVA.

Cheers,
Kim

Dear Kim,

Thanks a lot for the reply! I got it.

Best,

Jung

Dear Kim,

I have another question. Is it possible to turn off the data correlation in TMVA and use only the data pattern (i.e. the output histograms) to do the training?

Thanks!

Best,

Jung

Hi,

For separate questions, please create a new topic (to help others having similar questions!).

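I’m not sure I understand what you mean by “turning off data correlation … to do training”. Many methods exploit correlations between input features (input variables) to increase separation. So one way to “turn it off” would be to use a method that ignores them, such as the projective likelihood method (Likelihood), which combines only the one-dimensional distribution of each variable. Linear methods such as LDA still use the linear correlations between the variables.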
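For comparison, a projective likelihood builds its response only from one-dimensional signal and background PDFs, y = L_S / (L_S + L_B) with each L the product of the per-variable PDFs, so correlations between variables never enter. The plain-C++ sketch below illustrates that idea under simple assumptions: uniformly binned, already-normalized histograms on a fixed range, and hypothetical names. It is not TMVA's implementation, which can also smooth the PDFs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binned 1D PDF on [lo, hi) with uniform bins.
// Bin contents are assumed to be normalized densities and non-empty.
struct BinnedPdf {
    double lo, hi;
    std::vector<double> density;

    double operator()(double x) const {
        // Values outside the range are assigned to the first or last bin.
        if (x <= lo) return density.front();
        if (x >= hi) return density.back();
        const double width = (hi - lo) / density.size();
        std::size_t bin = static_cast<std::size_t>((x - lo) / width);
        if (bin >= density.size()) bin = density.size() - 1;  // rounding guard
        return density[bin];
    }
};

// Projective likelihood response y = L_S / (L_S + L_B), where L_S and L_B are
// products of the per-variable 1D PDFs evaluated at x. Only 1D PDFs enter,
// so correlations between the variables are ignored by construction.
double projectiveLikelihood(const std::vector<double>& x,
                            const std::vector<BinnedPdf>& signalPdfs,
                            const std::vector<BinnedPdf>& backgroundPdfs) {
    assert(x.size() == signalPdfs.size() && x.size() == backgroundPdfs.size());
    double likS = 1.0;
    double likB = 1.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        likS *= signalPdfs[k](x[k]);
        likB *= backgroundPdfs[k](x[k]);
    }
    const double sum = likS + likB;
    return sum > 0.0 ? likS / sum : 0.5;  // both PDFs vanish: no preference
}
```

Because the response depends only on the individual histograms, two samples with identical per-variable distributions give the same classifier here, even if the correlations between their variables differ.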

Cheers,
Kim

Dear Kim,

Thanks for the reply! I got it.

Best,
Justine