BaggedSampleFraction vs GradBaggingFraction

Hi,

I’ve a fairly simple question. I’m currently training a BDT with the following settings:
`NTrees=200:MaxDepth=4:nCuts=20:UseBaggedBoost=True:Shrinkage=0.1:BoostType=Grad:GradBaggingFraction=0.5:IgnoreNegWeightsInTraining=True`

I noticed in the TMVA manual that GradBaggingFraction is deprecated and that I should switch to BaggedSampleFraction instead. After making that change, however, I see a noticeable increase in overtraining for my signal.
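For reference, this is the option string I switched to; the only intended change is the renamed bagging-fraction key:

```
NTrees=200:MaxDepth=4:nCuts=20:UseBaggedBoost=True:Shrinkage=0.1:BoostType=Grad:BaggedSampleFraction=0.5:IgnoreNegWeightsInTraining=True
```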

For this to happen, I imagine there must be some fairly important differences between these two options. What are they exactly? Sorry if this has already been asked; I've not managed to find much information on the topic.

Hi,

That's weird; they should be exactly the same. (Really, the two options set the same internal variable.) Might it be that something else changed between your runs?
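In other words, switching over should amount to nothing more than renaming one key in the option string. A minimal sketch of that rename (the `migrate_options` helper below is just for illustration, not part of TMVA):

```python
# Rename the deprecated GradBaggingFraction key to BaggedSampleFraction
# in a TMVA-style option string. Every other option is left untouched,
# since the two names set the same internal variable.

def migrate_options(opts: str) -> str:
    """Return `opts` with the deprecated bagging-fraction key renamed."""
    tokens = []
    for token in opts.split(":"):
        if token.startswith("GradBaggingFraction="):
            token = "BaggedSampleFraction=" + token.split("=", 1)[1]
        tokens.append(token)
    return ":".join(tokens)

old = ("NTrees=200:MaxDepth=4:nCuts=20:UseBaggedBoost=True:"
       "Shrinkage=0.1:BoostType=Grad:GradBaggingFraction=0.5:"
       "IgnoreNegWeightsInTraining=True")

print(migrate_options(old))
```

Booking the BDT with either string should then produce identical trainings, all else being equal.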

If it is indeed persistent, a small script that reproduces the error would be helpful :slight_smile:

Cheers,
Kim

Hi Kim,

So, in trying to construct a simple script that demonstrates the issue, I've managed to prove to myself that the two options do in fact give the same output. All is well on the TMVA end; I must have accidentally changed another setting.

Sorry for wasting your time. I suppose while I'm here I might as well ask: if the two options give the same results, why was BaggedSampleFraction written to replace GradBaggingFraction?

Cheers,
Antony

Glad that you worked your problem out :slight_smile:

One of the options (I think it was GradBaggingFraction) was implemented first; later there was a spring cleaning to make the option names clearer, and the old name was kept as an alias for backwards compatibility.

Cheers,
Kim