What is the use of weight when adding Trees

haolinli · June 21, 2017, 10:37pm

Dear experts,

I have a question about the principle for setting the weight of the training Trees in TMVA.
In TMVA we have factory->AddBackgroundTree(*Tree, weight).
I was wondering whether the following logic and usage are correct:

Suppose we have ttbar and multijet backgrounds whose cross-sections before passing through the MVA are A and B, respectively. Assuming the number of entries in each background tree is equal, I can then use:

factory->AddBackgroundTree(ttbar, A);
factory->AddBackgroundTree(multijet, B);

If the above is correct, is the following statement also correct?

After calling:
factory->PrepareTrainingAndTestTree("…:nTrain_Background=1000:…");
the number of ttbar events used in training is 1000A/(A+B) and the number of multijet events used is 1000B/(A+B).

I would really appreciate it if anyone could help me with this problem.

-Haolin

kialbert · June 22, 2017, 11:39am

Hi!

The weights of the trees will scale all event weights in that tree uniformly. However, the splitting of events does not take weights into account, only the raw number.

This means your first assumption is correct while the second one isn’t. Assuming you use random splitting, ~500 raw ttbar events and ~500 raw multijet events will be selected.

Cheers,
Kim
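
To make the distinction above concrete, here is a minimal C++ sketch of the expected composition of the training sample. It is not TMVA code: `TreeInfo`, `SplitExpectation` and `expectedTrainingSplit` are hypothetical names, and the sketch assumes that random splitting draws events from the merged background pool in proportion to raw entry counts only, with the tree weight applied afterwards as a uniform scale on each event.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one background tree: its raw entry count and
// the global weight given as the second argument of AddBackgroundTree.
struct TreeInfo {
    double rawEntries;
    double treeWeight;
};

// Expected contribution of one tree to the training sample.
struct SplitExpectation {
    double rawEvents;     // expected number of raw events drawn from the tree
    double sumOfWeights;  // expected weighted yield of those events
};

// Assumption: the split looks only at raw counts, so each tree contributes
// nTrain * rawEntries / totalRawEntries events, whatever its weight is.
std::vector<SplitExpectation> expectedTrainingSplit(
    const std::vector<TreeInfo>& trees, double nTrain) {
    double totalRaw = 0.0;
    for (const TreeInfo& t : trees) totalRaw += t.rawEntries;
    assert(totalRaw > 0.0 && nTrain <= totalRaw);

    std::vector<SplitExpectation> result;
    for (const TreeInfo& t : trees) {
        const double raw = nTrain * t.rawEntries / totalRaw;
        // The tree weight scales every selected event uniformly.
        result.push_back({raw, raw * t.treeWeight});
    }
    return result;
}
```

With two trees of equal size and weights A and B, `expectedTrainingSplit(trees, 1000)` gives about 500 raw events from each tree, while the weighted yields are in the ratio A:B. The weights therefore change how much each event counts in training, not how many events are picked.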

haolinli · June 22, 2017, 2:17pm

Thanks a lot. I have another question about the TMVA output: there is a curve showing the background cut efficiency versus the cut on the MVA output. How do the different backgrounds scale on this curve?

For example, I apply the trained MVA to the test samples of the ttbar and multijet backgrounds and obtain the efficiencies “a” and “b”, respectively, for a certain MVA cut value. Is the efficiency “e” on the curve at the same MVA cut given by (or at least close to):

(a*A + b*B)/(A+B) ?

I did some tests, and it seems that the above equation does not hold, or else I made a mistake.

Thanks,
Haolin

kialbert · June 22, 2017, 5:13pm

We will select only the events from a and b that pass the selection cut, so something along the lines of

(a_pass / a_tot)*A + (b_pass / b_tot)*B

should be more accurate.

haolinli · June 22, 2017, 7:01pm

Hi,

I am confused: are A and B the cross-sections of ttbar and multijet? If they are, how is this expression normalized?

I was thinking of it as:
((a_pass/a_tot)*A + (b_pass/b_tot)*B)/(A+B)

kialbert · June 23, 2017, 9:36am

Yes, indeed, this is what I should have written.
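
For anyone who wants to check the numbers, below is a small C++ sketch of the normalized expression. It is plain arithmetic, not TMVA code: `SampleCounts`, `combinedEfficiency` and `combinedEfficiencyFromWeights` are hypothetical names. The second function rests on the assumption that the curve is built from sums of event weights, with every event in a tree carrying only that tree's weight. Under that assumption the two functions agree when both test samples have the same number of entries, as assumed in the original question, and can differ otherwise.

```cpp
#include <cassert>

// Hypothetical per-sample bookkeeping for one MVA cut value.
struct SampleCounts {
    double passed;      // raw test events passing the cut
    double total;       // raw test events in the sample
    double treeWeight;  // weight given to AddBackgroundTree (A or B)
};

// Normalized expression from the thread:
// ((a_pass/a_tot)*A + (b_pass/b_tot)*B) / (A+B)
double combinedEfficiency(const SampleCounts& a, const SampleCounts& b) {
    assert(a.total > 0.0 && b.total > 0.0);
    assert(a.treeWeight + b.treeWeight > 0.0);
    const double effA = a.passed / a.total;
    const double effB = b.passed / b.total;
    return (effA * a.treeWeight + effB * b.treeWeight) /
           (a.treeWeight + b.treeWeight);
}

// Assumption: efficiency as a ratio of summed event weights, where each
// event carries only its tree weight (no additional per-event weights).
double combinedEfficiencyFromWeights(const SampleCounts& a,
                                     const SampleCounts& b) {
    const double passed = a.passed * a.treeWeight + b.passed * b.treeWeight;
    const double total = a.total * a.treeWeight + b.total * b.treeWeight;
    assert(total > 0.0);
    return passed / total;
}
```

For example, with 200 of 1000 ttbar events and 50 of 1000 multijet events passing, and weights A = 3 and B = 1, both functions give 0.1625. If the two test samples have different sizes, the weighted-sum version is probably the closer match to what a weighted efficiency curve would show, which may explain a mismatch in a manual check.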