RDataFrame snapshots and RooFit deterministic behaviour

RENATO_QUAGLIANI · October 27, 2020, 6:32pm

Hi experts,
I have a question/remark about the procedure for making a RooDataSet from a TTree produced by an RDataFrame Snapshot.

We observed some strange behaviour in our analysis, which we traced back to how Snapshots are written by RDataFrame. As we understand it, a Snapshot in MT mode can shuffle the order of entries in the output TTree. If we build a RooDataSet from the shuffled and from the unshuffled TTree and then fit each of them, we observe different final results.
The fit becomes completely deterministic when exactly the same TTree index order is used.
Therefore the question:
Assuming a fitting routine takes an input TTree, applies a cut to it and makes a RooDataSet out of it, is it known behaviour that the order in which a RooDataSet gets filled can modify the final results?
I do expect this to happen because of the Strategy used:

where the strategy chunks the data into n equal slots, but if one shuffles the order of the dataset entries, the division into chunks is no longer deterministic.

I admittedly didn’t know that fit results can depend on the order in which a dataset is filled, but I wonder if there is a more robust and deterministic approach that always gives the same result, even when the dataset is filled in a shuffled order.

Thanks in advance,
Renato
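
A minimal sketch of the chunking effect described above, in plain Python rather than ROOT. The `event_id` key, the chunk count and the per-event values are all hypothetical, and the slicing is only a stand-in for whatever partitioning the fitter actually uses. It shows that contiguous chunks of a shuffled list contain different events, and that sorting on a unique key before filling restores the original chunks.

```python
import random

def chunk_sums(values, n_chunks):
    """Split values into contiguous slots and sum each slot in entry order."""
    size = (len(values) + n_chunks - 1) // n_chunks
    sums = []
    for start in range(0, len(values), size):
        total = 0.0
        for v in values[start:start + size]:
            total += v
        sums.append(total)
    return sums

rng = random.Random(42)  # fixed seed so the toy rows are reproducible
# Hypothetical rows: (event_id, per-event NLL term); event_id is assumed unique.
rows = [(event_id, rng.uniform(0.5, 3.0)) for event_id in range(10_000)]

shuffled = rows[:]
random.Random(1).shuffle(shuffled)  # stand-in for a reordered Snapshot output

original_chunks = chunk_sums([term for _, term in rows], 4)
shuffled_chunks = chunk_sums([term for _, term in shuffled], 4)

# Sorting on the unique key gives back one canonical entry order.
restored = sorted(shuffled, key=lambda row: row[0])
restored_chunks = chunk_sums([term for _, term in restored], 4)

print(original_chunks == shuffled_chunks)  # chunk contents differ after shuffling
print(original_chunks == restored_chunks)  # True
```

Whether a unique key such as a run and event number is available depends on the ntuple. Without one, a single-threaded Snapshot is the other obvious way to keep the entry order fixed.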

jalopezg · October 27, 2020, 8:12pm

Hello Renato,

I am involving @StephanH, as he might know the answer to this question.

Cheers,
J.

eguiraud · November 2, 2020, 4:23pm

This is one for @moneta .

moneta · November 6, 2020, 4:10pm

Hi

The results should differ only within some numerical error, and this difference should be much smaller than the reported parameter errors.
If this is not the case, it is an indication that the fit is unstable, which may be due to some other problem in the way the fitting model is built.

Lorenzo
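
To give a feeling for the size of the "numerical error" mentioned above, here is a small sketch in plain Python (not RooFit). It assumes a toy unit-width Gaussian model and made-up data. The same per-event NLL terms are summed in two different entry orders with a plain accumulation loop, and the result is compared with `math.fsum`, which returns a correctly rounded sum and therefore does not depend on the order.

```python
import math
import random

def naive_sum(values):
    """Plain left-to-right accumulation, sensitive to the order of the terms."""
    total = 0.0
    for v in values:
        total += v
    return total

def gaussian_nll_terms(data, mu, sigma):
    """Per-event negative log-likelihood terms for a Gaussian model."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return [0.5 * ((x - mu) / sigma) ** 2 + norm for x in data]

rng = random.Random(2020)  # fixed seed so the toy data set is reproducible
data = [rng.gauss(5.0, 1.0) for _ in range(100_000)]

shuffled = data[:]
random.Random(7).shuffle(shuffled)  # same events, different entry order

nll_ordered = naive_sum(gaussian_nll_terms(data, 5.0, 1.0))
nll_shuffled = naive_sum(gaussian_nll_terms(shuffled, 5.0, 1.0))

# Correctly rounded reference sums: identical for any ordering of the terms.
exact_ordered = math.fsum(gaussian_nll_terms(data, 5.0, 1.0))
exact_shuffled = math.fsum(gaussian_nll_terms(shuffled, 5.0, 1.0))

print(nll_ordered - nll_shuffled)       # round-off level difference, if any
print(exact_ordered == exact_shuffled)  # True
```

Any difference between the two naive sums is many orders of magnitude below the NLL itself. A well-behaved fit should not be affected by it, whereas a fit sitting close to a convergence threshold can be.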

RENATO_QUAGLIANI · November 7, 2020, 10:25am

Hi @moneta, the funny thing is that even the minNLL value is the same, but in one case covQual is 1 or 2 and in the other it is 3. Our fitter tries strategies repeatedly until it converges, and we give up after 4 trials. Toys fail in 5 percent of the cases, and our fits use Gaussian constraints. Do you think it is reasonable that we observe this behaviour?

RENATO_QUAGLIANI · November 7, 2020, 10:26am

Also, we use Ipatia functions for our models, which internally use numerical integrators. Could it also be linked to this?

moneta · November 9, 2020, 8:59am

Hi,
I think what you observe can happen. One thing that helps to mitigate it is the RooFit option RooFit::Offset(true) when fitting, which improves the NLL calculation in RooFit.

Lorenzo
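
In a fit, the option would be passed along the lines of `model.fitTo(data, RooFit::Offset(true))`, where `model` and `data` are placeholder names. The plain-Python sketch below does not reproduce RooFit's internal bookkeeping. It only illustrates, with made-up numbers, why the magnitude of the NLL matters: a double near 1e8 cannot resolve a change of 1e-9, while a value near zero can.

```python
import math

# Hypothetical numbers, chosen only to illustrate floating-point resolution.
nll_start = 1.0e8     # large absolute NLL value at the starting point
true_change = 1.0e-9  # small change that the minimizer would need to see

# Without an offset, the change is added to a huge number and rounded away.
seen_without_offset = (nll_start + true_change) - nll_start

# With an offset, the minimizer works with values near zero,
# where doubles resolve much smaller differences.
offset = nll_start
shifted_start = nll_start - offset
seen_with_offset = (shifted_start + true_change) - shifted_start

print(math.ulp(nll_start))   # spacing of doubles near 1e8, about 1.5e-8
print(seen_without_offset)   # 0.0
print(seen_with_offset)      # 1e-09
```

With the likelihood shifted close to zero, the small NLL differences that the minimizer relies on for its derivatives and covariance estimate are much less exposed to round-off.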
