I ran a TMVA BDT training with the following configuration:
NTrees=800:MinNodeSize=2.5%:MaxDepth=4:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20
I got different BDT response shapes from two operating systems.
Is this normal? And how can we estimate the resulting uncertainty?
Small differences are definitely possible, since there are many things that can differ between OSs: math function implementations in the standard C library, the rounding mode for floating-point numbers, the system compiler (GCC / Clang), default compilation flags (in particular optimization options), etc. However, since the background test sample looks different in the two plots, I think the difference could be due to something like a different random seed being used somewhere in the calculations, for example when selecting which events go into the test and training samples. Cheers,
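To make the seed point concrete, here is a minimal stdlib-Python sketch (not TMVA code; the function name and logic are mine) showing why a different seed alone changes which events end up in the test sample, and hence the shape of any histogram filled from it:

```python
import random

def split_events(n_events, seed, train_fraction=0.5):
    """Randomly split event indices into training and test samples.

    A toy stand-in for what a random splitter does: the composition
    of the test sample depends entirely on the seed.
    """
    rng = random.Random(seed)
    indices = list(range(n_events))
    rng.shuffle(indices)
    n_train = int(n_events * train_fraction)
    return set(indices[:n_train]), set(indices[n_train:])

# The same seed always reproduces the same split; two machines that
# seed differently will in general get different test samples.
train_a, test_a = split_events(1000, seed=100)
train_b, test_b = split_events(1000, seed=101)
print(test_a == split_events(1000, seed=100)[1])  # → True (reproducible)
```

If the two OSs end up with different effective seeds (or different RNG behavior), the test samples, and therefore the plotted response distributions, will differ even though the method itself is unchanged.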
As @amadio says, SplitMode=Random activates random splitting of the data. The seed of the splitting is controlled with the SplitSeed option (the default is SplitSeed=100).
To determine whether it’s the random splitting causing the difference I would ask you to rerun the training with SplitMode=Alternate or SplitMode=Block. These two modes are not recommended for trainings used in production, but should be good for determining the cause here.
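For intuition, here is a toy stdlib-Python sketch of the three splitting strategies. The mode names are borrowed from TMVA's SplitMode option, but the logic is my own guess at the idea, not TMVA's actual implementation:

```python
import random

def split_indices(n_events, mode, seed=100, train_fraction=0.5):
    """Toy illustration of three splitting strategies."""
    indices = list(range(n_events))
    n_train = int(n_events * train_fraction)
    if mode == "Random":
        # Seed-dependent: a different seed gives a different split.
        rng = random.Random(seed)
        rng.shuffle(indices)
        return indices[:n_train], indices[n_train:]
    if mode == "Alternate":
        # Deterministic: even-indexed events train, odd-indexed test.
        return indices[0::2], indices[1::2]
    if mode == "Block":
        # Deterministic: first block trains, the rest tests.
        return indices[:n_train], indices[n_train:]
    raise ValueError(f"unknown mode: {mode}")

print(split_indices(10, "Alternate"))  # → ([0, 2, 4, 6, 8], [1, 3, 5, 7, 9])
print(split_indices(10, "Block"))      # → ([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])
```

Because Alternate and Block are deterministic, they give the same split on any machine, which is exactly why they help isolate whether the seed-dependent random split is the source of the discrepancy.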
The BDT training is sensitive to the random seed, yes. In the application phase the results should be identical, barring subtle differences in the floating-point implementation etc.
To determine the uncertainty of the BDT response one can use cross validation. The idea is then to train the classifier repeatedly using (slightly) different training and test samples. This will give several reasonably independent distributions of the response with which one can derive some measures of uncertainty.
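A minimal stdlib-Python sketch of the idea (the fold helper and the toy "response" are mine; in practice the per-fold quantity would come from evaluating your trained classifier on that fold):

```python
import random
import statistics

def kfold_indices(n_events, k, seed=100):
    """Partition shuffled event indices into k cross-validation folds."""
    rng = random.Random(seed)
    indices = list(range(n_events))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fold_response(scores, fold):
    """Hypothetical stand-in for 'mean classifier response on a fold'."""
    return statistics.mean(scores[i] for i in fold)

# Toy per-event scores just to keep the sketch runnable end to end.
scores = [random.Random(i).random() for i in range(1000)]
folds = kfold_indices(len(scores), k=5)
responses = [fold_response(scores, f) for f in folds]

# The spread across folds is a simple uncertainty estimate.
print(statistics.mean(responses), statistics.stdev(responses))
```

The standard deviation across the k fold responses quantifies how much the response distribution moves when the training/test composition changes, which is the uncertainty being asked about.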
We tested the split modes as you suggested; see the attached file. Left is the Linux system, right is the OS X system.
Only the Random mode shows a difference. Could this be because the number of events is not large enough, so the statistical fluctuation is high?
I’m not sure I exactly understand the question, but variance across trainings on different machines is basically not a major issue. As long as the application phase produces identical values across different machines, that is enough (train once, and then you should be able to trust that output).
If I understand your question correctly, then yes the differences should go away as your data sample size is increased.
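A quick stdlib-Python sketch of why this happens (toy Gaussian data, my own helper, not TMVA): the difference between two random halves of a sample shrinks roughly like 1/√N, so split-dependent fluctuations fade as the sample grows.

```python
import random
import statistics

def half_split_difference(n_events, seed):
    """Absolute difference of means between two random halves of a
    toy Gaussian sample; mimics how much the training/test
    composition can fluctuate for a given sample size."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n_events)]
    rng.shuffle(sample)
    half = n_events // 2
    return abs(statistics.mean(sample[:half]) - statistics.mean(sample[half:]))

# Average the fluctuation over many seeds for a small and a large sample.
d_small = statistics.mean(half_split_difference(100, s) for s in range(50))
d_large = statistics.mean(half_split_difference(10000, s) for s in range(50))
print(d_small, d_large)  # the large-sample fluctuation is much smaller
```

With 100× more events the typical half-vs-half difference drops by about a factor of 10, matching the 1/√N expectation.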
Maybe @moneta wants to add something to this discussion.