What is the use of the weight when adding trees?

Dear experts,

I have a question about the principle for setting the weights of the training trees in TMVA.
In TMVA we have factory->AddBackgroundTree(*Tree, weight).
I was wondering whether the following logic and usage are correct:

Suppose we have ttbar and multijet backgrounds, with cross-sections A and B respectively before passing through the MVA. Assuming the number of entries in each background tree is equal, I can then use:

factory->AddBackgroundTree(ttbar, A);
factory->AddBackgroundTree(multijet, B);

If the above is correct, is the following statement also correct?

With the call:
factory->PrepareTrainingAndTestTree("…:nTrain_Background=1000:…");
the number of ttbar events used in training is 1000A/(A+B) and the number of multijet events used is 1000B/(A+B).

I would really appreciate it if anyone could help me with this problem.

-Haolin

Hi!

The weight of a tree scales all event weights in that tree uniformly. However, the splitting of events into training and test sets does not take weights into account, only the raw number of events.

This means your first assumption is correct while the second one isn’t. Assuming you use random splitting, ~500 raw ttbar events and ~500 raw multijet events will be selected.
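To illustrate the point, here is a minimal standalone sketch (plain C++, no ROOT or TMVA calls) of a random split that looks only at raw events. The names `SplitResult` and `randomSplit` are hypothetical and are not part of TMVA; the sketch assumes both trees have the same number of entries, as in the question, and simply attaches the tree weight A or B to each selected event.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper types, not part of TMVA.
struct SplitResult {
    std::size_t nTtbar;     // raw ttbar events selected for training
    std::size_t nMultijet;  // raw multijet events selected for training
    double weightTtbar;     // summed weight of the selected ttbar events
    double weightMultijet;  // summed weight of the selected multijet events
};

// Draw nTrain events at random from a pool holding entriesPerTree events
// of each background. The draw ignores the weights A and B entirely;
// they are only attached to the events after selection.
SplitResult randomSplit(std::size_t entriesPerTree, std::size_t nTrain,
                        double A, double B, unsigned seed) {
    assert(nTrain <= 2 * entriesPerTree);

    // Label each raw event: 0 = ttbar, 1 = multijet.
    std::vector<int> pool(entriesPerTree, 0);
    pool.insert(pool.end(), entriesPerTree, 1);

    std::mt19937 rng(seed);
    std::shuffle(pool.begin(), pool.end(), rng);

    SplitResult r{0, 0, 0.0, 0.0};
    for (std::size_t i = 0; i < nTrain; ++i) {
        if (pool[i] == 0) {
            ++r.nTtbar;
            r.weightTtbar += A;     // every ttbar event carries weight A
        } else {
            ++r.nMultijet;
            r.weightMultijet += B;  // every multijet event carries weight B
        }
    }
    return r;
}
```

Calling, for example, `randomSplit(10000, 1000, A, B, 42)` should give raw counts close to 500 each whatever A and B are, while the summed weights come out in a ratio close to A:B. So the weights change how much each event counts in training, not how many events are picked.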

Cheers,
Kim

Thanks a lot. I have another question, about the TMVA output: there is a curve showing the background cut efficiency versus the cut on the MVA output. How do the different backgrounds scale on this curve?

For example, I applied the trained MVA to the ttbar and multijet test samples and obtained the efficiencies “a” and “b” respectively for a certain MVA cut value. Is the efficiency “e” on the curve at the same MVA cut given (at least approximately) by:

(a*A + b*B)/(A+B) ?

I did some tests, and it seems that the above equation does not hold, unless I made a mistake.

Thanks,
Haolin

Only the events from each background sample that pass the selection cut are counted, so something along the lines of

(a_pass / a_tot)*A + (b_pass / b_tot)*B

should be more accurate.

Hi,

I am confused: are A and B the cross-sections of ttbar and multijet? If they are, how is the result normalized?

I was thinking of it as:
((a_pass/a_tot)*A + (b_pass/b_tot)*B)/(A+B)

Yes, indeed this is what I should have written :)
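For reference, the normalized formula can be written as a small standalone function. This is only a sketch: `combinedEfficiency` is a hypothetical name, not a TMVA function, and it assumes, as in the original question, that both background trees have the same number of entries, so that the tree weights A and B alone set the relative normalization of the two samples.

```cpp
#include <cassert>

// Hypothetical helper, not part of TMVA: weighted average of the
// per-sample efficiencies, using the tree weights A and B.
//   aPass, aTot : ttbar test events passing the cut / in total
//   bPass, bTot : multijet test events passing the cut / in total
double combinedEfficiency(double aPass, double aTot, double A,
                          double bPass, double bTot, double B) {
    assert(aTot > 0 && bTot > 0 && A + B > 0);
    const double effA = aPass / aTot;  // efficiency on ttbar alone
    const double effB = bPass / bTot;  // efficiency on multijet alone
    return (effA * A + effB * B) / (A + B);
}
```

For instance, with 50 of 100 ttbar events and 10 of 100 multijet events passing, and weights A = 3 and B = 1, the result is (0.5*3 + 0.1*1)/4 = 0.4. If the trees had different numbers of entries, the total weight of each sample would also depend on its entry count, and this simple form would need adjusting.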