Different application results from identical back-to-back runs

Hi dear ROOTers,

I train a classification BDTG and then apply it to a number of different datasets, each with about 300 events. (It has to be trained and applied separately for each of them.)

While doing so, I noticed that sometimes the application gives different results on the same dataset. That is, I run the application script, get the classification results, exit to the command line, run the same script again with no changes, and get different classification results.

If I keep rerunning, I always get either one result or the other (i.e. there is never a third set of results).

Also, almost every time this happens, one of the two classifications looks reasonable while the other classifies all events close to one of the two extremes (i.e. all signal or all background).

Do you have any advice?
(TMVA v4.2.0)

Is the random seed set to a fixed number?

Let me check. But I wouldn’t expect the seed to affect the application, certainly not to such an extent.

(Let me clarify that the training is not re-run between the identical application runs that I described.)

I think the seed matters when splitting the training/testing sample, and also if you use bagged boosting or random dropping of features.
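For reference, a rough sketch of how the seeds can be pinned on the training side (Factory-style API as in TMVA 4.2; the file names, tree names, variables and BDTG options below are just placeholders, not anything from this thread):

```cpp
// Sketch only: pinning the random seeds on the training side.
// File names, tree names, variables and BDTG options are placeholders.
#include "TFile.h"
#include "TTree.h"
#include "TCut.h"
#include "TRandom.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void train_fixed_seed()
{
   gRandom->SetSeed(12345);   // global ROOT generator, used e.g. for bagging/resampling

   TFile* input  = TFile::Open("input.root");
   TFile* output = TFile::Open("tmva_output.root", "RECREATE");

   TMVA::Factory factory("TMVAClassification", output, "!V:AnalysisType=Classification");
   factory.AddVariable("var1", 'F');
   factory.AddVariable("var2", 'F');
   factory.AddSignalTree((TTree*)input->Get("sig_tree"), 1.0);
   factory.AddBackgroundTree((TTree*)input->Get("bkg_tree"), 1.0);

   // Fix the seed of the random train/test split so repeated trainings agree:
   factory.PrepareTrainingAndTestTree(TCut(""),
      "SplitMode=Random:SplitSeed=100:NormMode=NumEvents:!V");

   factory.BookMethod(TMVA::Types::kBDT, "BDTG",
      "!H:!V:NTrees=500:BoostType=Grad:Shrinkage=0.10:MaxDepth=3");

   factory.TrainAllMethods();
   factory.TestAllMethods();
   factory.EvaluateAllMethods();
   output->Close();
}
```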

I think what I wrote was not clearly understood: I am not talking about testing. The problem appears in the application of the trained BDTG.

@moneta @kialbert I earnestly hope that you don’t mind the tagging.

Unfortunately I feel that this is a serious problem for my analysis, as it implies that the application results can’t be trusted… So I’m trying to find possible hints that could lead somewhere. Any suggestions will be very welcome.

Hi,

What you are seeing looks really weird. Which ROOT version are you using?
I would need to reproduce it to debug and understand. Can you please post your application script, your trained model as an XML file, and the data sample on which evaluating the model gives the two different results?

Lorenzo

As an update: With Lorenzo’s help we reproduced the validation results but not the problem.

At this point I think this looks more like a Linux issue than a ROOT issue. But even if there is a memory problem somewhere, I keep wondering how it could have affected the validation…
Any ideas are welcome.

A couple of things to try: what if you do ‘fresh’ runs of the application, i.e. close the terminal (I suppose you run it in a terminal) between runs, or run it in different terminals (simultaneously or not)? Do you still get alternating results, or always the same one? The instability could be due to a problem in the code (or even in the data) causing memory corruption, a crash, etc. somewhere that for some reason is not being reported.
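One concrete place where this kind of bistable behaviour often creeps into a Reader-based application is an input variable that is booked but never actually filled. Below is a generic sketch (placeholders throughout, not the actual script from this thread) of a typical application loop, with a comment marking the spot where a missing or misspelled SetBranchAddress would leave the BDTG evaluating on uninitialised memory, which could look a lot like "sometimes sensible, sometimes everything at one extreme":

```cpp
// Generic sketch of a Reader-based application script (all names are placeholders).
#include <iostream>
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Reader.h"

void apply_bdtg()
{
   Float_t var1 = 0.f, var2 = 0.f;   // initialise explicitly

   TMVA::Reader reader("!Color:!Silent");
   // The reader caches the addresses of these floats:
   reader.AddVariable("var1", &var1);
   reader.AddVariable("var2", &var2);
   reader.BookMVA("BDTG", "weights/TMVAClassification_BDTG.weights.xml");

   TFile* input = TFile::Open("apply_me.root");
   TTree* tree  = (TTree*)input->Get("events");

   // If one of these SetBranchAddress calls is missing, misspelled, or points
   // at a variable of the wrong type, the corresponding float is never (or
   // wrongly) updated and EvaluateMVA runs on stale memory -- the output then
   // depends on the memory state of the process rather than on the data.
   tree->SetBranchAddress("var1", &var1);
   tree->SetBranchAddress("var2", &var2);

   for (Long64_t i = 0; i < tree->GetEntries(); ++i) {
      tree->GetEntry(i);
      double bdtg = reader.EvaluateMVA("BDTG");
      std::cout << "event " << i << ": BDTG = " << bdtg << std::endl;
   }
   input->Close();
}
```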