BaggedSampleFraction vs GradBaggingFraction

Hi,

I’ve a fairly simple question. I’m currently training a BDT with the following settings:
`NTrees=200:MaxDepth=4:nCuts=20:UseBaggedBoost=True:Shrinkage=0.1:BoostType=Grad:GradBaggingFraction=0.5:IgnoreNegWeightsInTraining=True`

I noticed in the TMVA manual that GradBaggingFraction is deprecated and that I should switch to BaggedSampleFraction instead. After making that change, however, I see a noticeable increase in overtraining for my signal.
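For reference, this is the option string I switched to; the only intended change is the renamed bagging-fraction key:

```
NTrees=200:MaxDepth=4:nCuts=20:UseBaggedBoost=True:Shrinkage=0.1:BoostType=Grad:BaggedSampleFraction=0.5:IgnoreNegWeightsInTraining=True
```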

For this to happen, I imagine there must be some fairly important differences between these two options. What are they exactly? Sorry if this has already been asked; I've not managed to find much information on the topic.

Hi,

That's weird; they should be exactly the same. (Really, the two options set the same internal variable.) Might it be that something else changed between your runs?
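In other words, switching over should amount to nothing more than renaming one key in the option string. A minimal sketch of that rename (the `migrate_options` helper below is just for illustration, not part of TMVA):

```python
# Rename the deprecated GradBaggingFraction key to BaggedSampleFraction
# in a TMVA-style option string. Every other option is left untouched,
# since the two names set the same internal variable.

def migrate_options(opts: str) -> str:
    """Return `opts` with the deprecated bagging-fraction key renamed."""
    tokens = []
    for token in opts.split(":"):
        if token.startswith("GradBaggingFraction="):
            token = "BaggedSampleFraction=" + token.split("=", 1)[1]
        tokens.append(token)
    return ":".join(tokens)

old = ("NTrees=200:MaxDepth=4:nCuts=20:UseBaggedBoost=True:"
       "Shrinkage=0.1:BoostType=Grad:GradBaggingFraction=0.5:"
       "IgnoreNegWeightsInTraining=True")

print(migrate_options(old))
```

Booking the BDT with either string should then produce identical trainings, all else being equal.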

If it is indeed persistent, a small script that reproduces the error would be helpful :slight_smile:

Cheers,
Kim

Hi Kim,

So, in trying to construct a simple script that demonstrates the issue, I've managed to prove to myself that the two options do in fact give the same output. All is well on the TMVA end; I must have accidentally changed another setting.

Sorry for wasting your time. I suppose while I'm here I might as well ask: if the two options give the same results, why was BaggedSampleFraction written to replace GradBaggingFraction?

Cheers,
Antony

Glad that you worked your problem out :slight_smile:

One of the options (I think it was GradBaggingFraction) was implemented first; later there was a spring cleaning to make the option names clearer, and the old name was kept as an alias for backwards compatibility.

Cheers,
Kim