Model a PDF (probability density) from data

I’ve read the TMVA documentation and slides and looked at the tutorials, but couldn’t find an example of how to model a PDF.

I have a background (mostly uniform in the detector, but with strange regions that I don’t want to model parametrically). On top of that known background (measured in a run without a source) there is a calibration signal that I want to model parametrically (that’s why I want the background PDF to be fast to evaluate). I read that the k-nearest-neighbours method is fast enough, but couldn’t find an example of how to get the density out of it.

There is a method

Double_t TMVA::MethodKNN::GetMvaValue(Double_t *err = 0, Double_t *errUpper = 0),

but I’m not sure what it returns (its documentation doesn’t say).

There is also a PDEFoam method GetCellValue(), but I don’t understand its arguments and return value.
If my background decreases exponentially, will PDEFoam model it well? I read that it models the PDF with cells of constant density.

Could someone please give an example of how to initialize and train TMVA and get the PDF value at a given point (maybe just give some method names)? Or is there something else in ROOT for that? Thank you.

I guess @moneta can help you.

I hope this topic doesn’t get closed, because I still need a way to do that. Thanks.

@moneta ping

Hi,

If you want to model a PDF given some data, you can use either a histogram (TH1 class) or a kernel density estimator; see the TKDE class (ROOT: TKDE Class Reference).
You don’t need to use TMVA for this.
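
To illustrate what a kernel density estimator computes, here is a minimal standalone C++ sketch of a 1D Gaussian KDE. It is not the TKDE implementation and does not use ROOT; the function name kde1d and the fixed, user-chosen bandwidth are assumptions made only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Gaussian kernel density estimate at point x.
// `bandwidth` is a hypothetical fixed smoothing width chosen by the caller.
double kde1d(const std::vector<double>& sample, double x, double bandwidth) {
    assert(!sample.empty() && bandwidth > 0.0);
    const double kPi = 3.14159265358979323846;
    // Normalisation so that the estimate integrates to 1 over x.
    const double norm = 1.0 / (sample.size() * bandwidth * std::sqrt(2.0 * kPi));
    double sum = 0.0;
    for (double xi : sample) {
        const double u = (x - xi) / bandwidth;  // scaled distance to the data point
        sum += std::exp(-0.5 * u * u);          // Gaussian kernel contribution
    }
    return norm * sum;
}
```

For example, kde1d(data, 0.5, 0.1) would give the estimated density at x = 0.5 with a bandwidth of 0.1. A smaller bandwidth follows the data more closely but gives a noisier estimate.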

Cheers

Lorenzo

Thanks for the reply, Lorenzo. Unfortunately I need to model a 3-dimensional PDF. Maybe I don’t need to get its value, though, because in fact I just want to use it in a fit with RooFit.
TH1 and TKDE aren’t suitable for 3-dimensional data.
I read in the tutorials that TMVA is good for such non-parametric fits. If not, what should I use?

Hi,

For the multi-dimensional case you can use the multi-dimensional KDE class that is available in RooFit;
see ROOT: RooNDKeysPdf Class Reference.
In TMVA, there are some methods that rely on estimating the multi-dimensional likelihood density, but those are designed for classification. It is not trivial to use them just for estimating the density.
These include PDEFoam and KNN (k nearest neighbours).
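
The idea behind a multi-dimensional KDE can be sketched in plain C++ as follows. This is only an illustration using a product of Gaussian kernels with a fixed per-axis bandwidth, and it should not be read as what RooNDKeysPdf does internally; Point3, kde3d and the bandwidth argument h are hypothetical names for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Point3 = std::array<double, 3>;

// 3D density estimate at x from a sample, using a product Gaussian kernel.
// h holds one (assumed, user-chosen) bandwidth per axis.
double kde3d(const std::vector<Point3>& sample, const Point3& x, const Point3& h) {
    assert(!sample.empty());
    const double kTwoPi = 6.283185307179586;
    // Normalisation: N * (2*pi)^(3/2) * h0 * h1 * h2.
    double norm = sample.size() * std::pow(kTwoPi, 1.5);
    for (double hd : h) {
        assert(hd > 0.0);
        norm *= hd;
    }
    double sum = 0.0;
    for (const Point3& p : sample) {
        double q = 0.0;  // squared scaled distance between x and the data point
        for (std::size_t d = 0; d < 3; ++d) {
            const double u = (x[d] - p[d]) / h[d];
            q += u * u;
        }
        sum += std::exp(-0.5 * q);
    }
    return sum / norm;
}
```

Each evaluation loops over the whole sample, so the cost per point grows linearly with the number of events. That matters when the density is evaluated many times inside a fit.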

Lorenzo

Thank you. Now I understand. I’ll try that. I think this answers the question.