MC@NLO and Boosted Decision Trees

JohannesE · March 26, 2008, 9:48am

Greetings,

I am using TMVA 3.8.14 to separate H->WW->mumu decays from ttbar decays. The ttbar sample is simulated with MC@NLO, an MC generator that produces “signed” events (weighted with ±1).
To obtain correct distributions, events must be summed up respecting that sign.

Now, if I pass the weighted events (essentially using factory->SetWeightExpression("eventWeightMCatNLO")), this seems to work fine with, for example, the MLP method, but it breaks the boosted decision trees. Even small amounts of negative weight (for example (eventWeightMCatNLO+0.9)) break the algorithm and cause almost all of the generated trees to be degenerate (no splits).

Another approach I tried is to redistribute the events: the negative-weight signal events are added to the background, and the negative-weight background events are added to the signal. Yet this fails because it seems impossible to specify such a condition without copying the tree, which is prohibitive because of its size (12 GB in total).
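
The redistribution described above amounts to a simple per-event mapping, sketched here in plain C++ without ROOT or TMVA. The Event struct, the Label enum and the function redistribute are hypothetical names chosen for illustration; they are not part of TMVA, and the sketch does not solve the problem of having to copy a large tree.

```cpp
#include <cmath>
#include <vector>

// Hypothetical event record: a class label and a signed MC@NLO-style weight.
enum class Label { Signal, Background };

struct Event {
    Label label;
    double weight;  // e.g. +1 or -1
};

// Move every negative-weight event to the opposite class with the
// absolute value of its weight; positive-weight events are unchanged.
std::vector<Event> redistribute(const std::vector<Event>& events) {
    std::vector<Event> out;
    out.reserve(events.size());
    for (const Event& ev : events) {
        if (ev.weight < 0.0) {
            Label flipped = (ev.label == Label::Signal) ? Label::Background
                                                        : Label::Signal;
            out.push_back({flipped, std::fabs(ev.weight)});
        } else {
            out.push_back(ev);
        }
    }
    return out;
}
```

Note that this mapping keeps the difference between the signal and background weight sums unchanged, but not each sum separately.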

Is there a solution out there, or would the only way be to fix the BDT code?

Thanks,
Johannes Ebke

hvoss · March 26, 2008, 1:43pm

Dear Johannes,

I’m a bit surprised about your problem with the BDTs and need to understand this. Unfortunately I never had an MC sample with negative weights for testing, but in principle it should work as long as you don’t specify too small a minimum number of events per leaf node (the default might be fairly small, so please try setting nEventsMin=100 or 500).

What happens is that the tree keeps splitting nodes as long as the number of UNWEIGHTED events is at least nEventsMin. If you choose this number too small, a node might cover a very small region of phase space where the MC gives, even on average, a negative number of events, and this cannot be handled in a reasonable way. From a physics point of view this doesn’t make sense either, so nEventsMin should be large enough that it never happens. Well, that was my idea so far.
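
A toy illustration of this point, in plain C++ rather than TMVA code: a node can pass a cut on the number of unweighted events while its summed weight is still negative. The NodeStats struct and the summarize and purity functions are hypothetical helpers, and the purity formula s/(s+b) is the usual textbook definition, assumed here rather than taken from the TMVA source.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of the events that fall into one tree node.
struct NodeStats {
    int nUnweighted;       // plain event count, the quantity compared to nEventsMin
    double sumSignal;      // sum of signed weights of signal events
    double sumBackground;  // sum of signed weights of background events
};

// Weights are signed (+1 / -1 for MC@NLO-style samples).
NodeStats summarize(const std::vector<double>& signalWeights,
                    const std::vector<double>& backgroundWeights) {
    NodeStats s{0, 0.0, 0.0};
    for (double w : signalWeights)     { ++s.nUnweighted; s.sumSignal += w; }
    for (double w : backgroundWeights) { ++s.nUnweighted; s.sumBackground += w; }
    return s;
}

// Textbook purity s / (s + b); only meaningful when both sums are >= 0.
double purity(const NodeStats& s) {
    double total = s.sumSignal + s.sumBackground;
    assert(total != 0.0);  // the sketch does not handle a vanishing total
    return s.sumSignal / total;
}
```

For two signal events with weight +1 and five background events with weights +1, -1, -1, -1, -1, the node holds 7 unweighted events, yet the background sum is -3 and the purity evaluates to -2, outside the range [0, 1].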

HOWEVER, now you say that your trees don’t even split but are completely degenerate? Right from the start or only after it once hit a node with negative overall weight? Anyway, I’d be very interested to test this out myself. Do you think it would be possible for you to provide me with a Test Tree and the TMVAnalysis.C ?? (Helge.Voss@cern.ch)

                  Ciao,

                      Helge

JohannesE · March 26, 2008, 2:36pm

I have now tried it out with nEventsMin=100 and 500, and the problem persists. In the case of nEventsMin=500 the first 11 decision trees are nondegenerate - perhaps this is a problem with assumptions in the boosting algorithm?

As soon as I have reduced the ntuples to some manageable size, I’ll put them and the analysis on AFS.

Thanks,
Johannes

hvoss · March 26, 2008, 3:59pm

Thanks. Yes, you are right, that must be something in the boosting that I don’t see yet. I’ll have a look once I have your data samples.

            ciao,

               Helge

JohannesE · March 27, 2008, 2:08pm

The files and an example analysis are now in /afs/cern.ch/user/e/ebke/public/bdtnw
Thanks for any help.
Johannes

hvoss · March 30, 2008, 9:23pm

Hi Johannes,

it turned out to be quite a bit more tricky than I thought with those negative-weight events, as they tended to get boosted more and more. This is now prevented, although I cannot say it is a 100% correct treatment of negative weights. (NOTE: all you lose when they are not treated correctly is performance; you DO NOT get any kind of wrong selection or bias or the like.) Please try again (using the CVS head version, or wait for the next release, which should come very soon) and see the result. Let me know if you find any other problems.
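
To see how negative-weight events can get boosted more and more, here is a toy AdaBoost-style reweighting step in plain C++. It is a sketch under assumptions, not the TMVA implementation: the update rule (multiply the weight of each misclassified event by (1 - err)/err) is the textbook AdaBoost form, and the skipNegative switch is a hypothetical guard, not necessarily what the CVS fix does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One textbook AdaBoost reweighting step.
// weights:        signed event weights, modified in place
// misclassified:  misclassified[i] is true if event i was misclassified
// skipNegative:   hypothetical guard; if true, negative weights are left alone
// Returns the boost factor (1 - err) / err.
double boostStep(std::vector<double>& weights,
                 const std::vector<bool>& misclassified,
                 bool skipNegative) {
    assert(weights.size() == misclassified.size());
    double sumAll = 0.0, sumWrong = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        sumAll += weights[i];
        if (misclassified[i]) sumWrong += weights[i];
    }
    double err = sumWrong / sumAll;   // weighted misclassification rate
    assert(err > 0.0 && err < 1.0);   // the sketch only handles this case
    double factor = (1.0 - err) / err;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (!misclassified[i]) continue;
        if (skipNegative && weights[i] < 0.0) continue;
        // For err < 0.5 the factor exceeds 1, so a negative weight
        // grows in magnitude just like a positive one.
        weights[i] *= factor;
    }
    return factor;
}
```

With weights {1, 1, 1, 1, 1, -1} and events 0, 1 and 5 misclassified, err is 0.25 and the factor is 3, so the -1 becomes -3; repeated over many trees, such weights can drift further and further from zero. With skipNegative set, the -1 is left unchanged.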