Can I use TMVA with only a few events?

Li_Huang · April 3, 2016, 5:10pm

Hi,
Can I use TMVA with only a few events, for example 100 events? Can I simulate many events to reduce the statistical error?

If so few events would lead to a large error, can I simulate many events, for example 10^8? Taking BDT as an example: I would use these simulated events to train the BDT and then use the result (the tree) to classify the 100 events.

In detail:
If the LHC will produce around 100 signal and 100 background events, can I simulate just 100 of each and use a BDT to distinguish signal from background?
If I can’t, can I simulate 10^8 signal and 10^8 background events and then train the BDT on those? Is the result valid for separating the experimental data?
I am worried that with so few events, even if I simulate many events to train the BDT, the statistical error will be large.

hvoss · April 19, 2016, 9:29am

Hi,

Sorry for the late answer. I typically don’t read ‘root-talk’, and there seems to be no option to get notified automatically when there is a TMVA-related question.

I guess you’ve solved your question by now… but anyway:

Of course… you have to specify whether you want to ‘train’ a BDT (or any multivariate classifier) on 100 events (i.e. a small number), or whether you want to apply a trained BDT (trained with many more events, i.e. your 10^8) to such a small number of events.

You need to understand what an MVA algorithm does first! (Well, in the end it is the same as cutting.)
If you train your MVA on a small number of events, you get an MVA classifier that is not necessarily the best, as you obviously do not have enough information (events) available to do better. That’s a kind of ‘statistical fluctuation’. Of course, the more training events you have, the better: you get a better classifier, i.e. one that ON AVERAGE gives the best performance on a data sample (assuming, of course, that there are no systematic differences between your training events and your data).
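To make the training-size point concrete, here is a minimal toy sketch. It is not TMVA code: the "classifier" is a single cut on one variable, standing in for a BDT, and the Gaussian toy distributions, sample sizes, and function names (`make_sample`, `train_cut`, `accuracy`) are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)
GRID = np.linspace(-2.0, 3.0, 251)  # candidate cut values (assumed range)

def make_sample(n):
    """Toy events with one variable: signal ~ N(1, 1), background ~ N(0, 1)."""
    return rng.normal(1.0, 1.0, n), rng.normal(0.0, 1.0, n)

def train_cut(sig, bkg):
    """'Train' by choosing the cut x > t that classifies most training events correctly."""
    bkg_below = np.searchsorted(np.sort(bkg), GRID)             # background below each cut
    sig_above = len(sig) - np.searchsorted(np.sort(sig), GRID)  # signal at or above each cut
    return GRID[np.argmax(bkg_below + sig_above)]

def accuracy(t, sig, bkg):
    """Fraction of events classified correctly by the cut x > t."""
    return (np.sum(sig > t) + np.sum(bkg <= t)) / (len(sig) + len(bkg))

# Large independent sample used only to evaluate the trained cuts.
test_sig, test_bkg = make_sample(200_000)

# One training with many events per class.
thr_large = train_cut(*make_sample(100_000))
acc_large = accuracy(thr_large, test_sig, test_bkg)

# Twenty independent trainings with only 100 events per class.
thr_small = np.array([train_cut(*make_sample(100)) for _ in range(20)])
acc_small = np.array([accuracy(t, test_sig, test_bkg) for t in thr_small])

print(f"large training: cut = {thr_large:.2f}, accuracy = {acc_large:.4f}")
print(f"small trainings: cut spread = {thr_small.std():.2f}, "
      f"mean accuracy = {acc_small.mean():.4f}, worst = {acc_small.min():.4f}")
```

In this toy the best possible cut sits at 0.5. The cuts trained on 100 events scatter around that value, so each of them performs at best as well as the cut trained on the large sample, and on average somewhat worse.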

THEN, of course, you use the classifier on your ‘data’, and again you have statistical fluctuations on the number of selected events (whether they are selected via a BDT algorithm, simple cuts or whatever). That’s simply ‘statistics’ and there is not much you can do about it. What you need to understand, though, is that it has nothing to do with BDTs etc.; it’s the same for any kind of selection algorithm you apply.
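As a rough illustration of that last point, the sketch below estimates the fluctuation on the number of selected events for the 100 signal + 100 background case from the question. The selection efficiencies (0.8 for signal, 0.1 for background) are made-up numbers, the produced event counts are assumed to be Poisson distributed, and `selected_yield` is a hypothetical helper, not a TMVA function.

```python
import math
import numpy as np

def selected_yield(n_sig, n_bkg, eff_sig, eff_bkg):
    """Expected number of selected events and its Poisson standard deviation.

    Assumes the produced counts are Poisson distributed, so the selected
    count is Poisson too, with mean n_sig*eff_sig + n_bkg*eff_bkg.
    """
    mean = n_sig * eff_sig + n_bkg * eff_bkg
    return mean, math.sqrt(mean)

# Assumed numbers: 100 expected signal, 100 expected background,
# and a selection keeping 80% of signal and 10% of background.
mean, sigma = selected_yield(100, 100, 0.8, 0.1)

# Cross-check with pseudo-experiments: fluctuate the produced counts,
# then apply the selection event by event (binomial).
rng = np.random.default_rng(2016)
n_toys = 100_000
sel_sig = rng.binomial(rng.poisson(100, n_toys), 0.8)
sel_bkg = rng.binomial(rng.poisson(100, n_toys), 0.1)
toys = sel_sig + sel_bkg

print(f"analytic: {mean:.1f} +- {sigma:.2f}")
print(f"toys:     {toys.mean():.1f} +- {toys.std():.2f}")
```

With these assumed efficiencies the selected yield fluctuates by about 10% from one experiment to the next, however the selection was obtained and however many simulated events were used to train it.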

helge

Li_Huang · May 4, 2016, 5:57pm

Hi Helge,
Sorry that I haven’t logged in for a long time. I agree with you that it’s a statistical fluctuation problem and has nothing to do with BDTs or any other classification algorithm. I need to take care of this fluctuation. Thanks a lot!

Best,
Li