Adding "PDFInterpol=logspline" to TMVA::Factory?

jwruss · November 24, 2020, 11:47pm

Without knowing definitively the underlying distribution of a TMVA response, I have been considering fitting the PDF histograms of the response using PDFInterpol=KDE as opposed to one of the PDFInterpol=Spline{0-5} options. However, one of my colleagues has argued that the KDE would produce tails that are too small when considering the Rarity/CDF of the fitted PDF. It looks like this dilemma has been posted elsewhere, where it was suggested that a logspline could yield the desired behavior of heavier tails. Figure 2 in these lecture notes has raised my hopes that a logspline fit to the histogram PDFs may be worth pursuing.
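To make the tail concern concrete, here is a minimal sketch in plain Python. It assumes a fixed-bandwidth Gaussian kernel and a normal-reference rule-of-thumb bandwidth; TMVA's own KDE options may behave differently. The helper names `rule_of_thumb_bandwidth` and `kde_survival` and the Laplace toy sample are illustrative assumptions, not part of TMVA.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth h = 1.06 * sigma * n^(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def kde_survival(samples, bandwidth):
    """Return S(t) = 1 - CDF(t) for a fixed-bandwidth Gaussian KDE."""
    scale = bandwidth * math.sqrt(2.0)
    n = len(samples)

    def survival(t):
        # Each kernel contributes the upper tail of a normal centred on its sample;
        # erfc keeps precision when that tail is very small.
        return sum(0.5 * math.erfc((t - s) / scale) for s in samples) / n

    return survival


if __name__ == "__main__":
    random.seed(0)
    # Toy response with exponential (Laplace) tails, standing in for a classifier output.
    xs = [random.expovariate(1.0) * random.choice((-1.0, 1.0)) for _ in range(300)]
    tail = kde_survival(xs, rule_of_thumb_bandwidth(xs))
    for t in (2.0, max(xs), max(xs) + 3.0):
        true_tail = 0.5 * math.exp(-t)  # exact Laplace(0, 1) upper tail for t >= 0
        print(f"t={t:6.2f}  KDE tail={tail(t):.3e}  true tail={true_tail:.3e}")
```

Beyond the largest sample, the KDE tail falls off like a Gaussian of width h, which is faster than an exponential tail. A logspline fit models the log-density with splines instead, which as far as I understand gives roughly exponential tails outside the outermost knots.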

Has this been considered before? Could it be implemented?

oshadura · November 25, 2020, 10:19am

@moneta @swunsch could you help here please? Thanks in advance!

jwruss · November 29, 2020, 12:21am

As an interim solution, I was looking into rebuilding ROOT with the R bindings enabled so that I could use R’s logspline library. My plan was to read the evaluated response values from the TrainTree in the file written for the single TMVA::Factory object that I create. I would use the response values corresponding to classID==0 to create a vector “x”, which I would then feed into R to get a fit using fit <- logspline(x).

I found this method won’t work, though, because it seems the TrainTree object within the file has only saved the response values, their probabilities, their classID, and their className for the first method booked. With a single Factory object I book multiple kFisher methods, each on its own TMVA::DataLoader object, to try different combinations of signal and background samples while using the same overall data sample for each DataLoader object. A TDirectoryFile object is created for each training within the Method_Fisher TDirectoryFile object, though. Could additional leaves for each training be added to the TrainTree and TestTree?

jwruss · February 4, 2022, 8:58pm

I was eventually able to find a workaround. From the TrainTree and TestTree objects, I filled the vector “x” with the Fisher discriminant values of events containing “Signal” under the “className” TLeaf.
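The selection step can be sketched roughly as follows, in plain Python. Each tree is assumed to have already been read into a list of dict-like rows (for example via PyROOT or uproot, not shown here). The function name `collect_response` and the branch name "Fisher" are hypothetical; the actual response branch is named after the booked method title.

```python
from itertools import chain


def collect_response(trees, class_name="Signal", branch="Fisher"):
    """Gather the response values of all events whose className matches.

    `trees` is an iterable of row collections, e.g. [train_rows, test_rows],
    where each row maps leaf names to values (an assumed in-memory layout).
    """
    return [
        row[branch]
        for row in chain.from_iterable(trees)
        if row["className"] == class_name
    ]


if __name__ == "__main__":
    # Tiny stand-ins for TrainTree and TestTree entries.
    train_rows = [
        {"className": "Signal", "Fisher": 0.4},
        {"className": "Background", "Fisher": -0.7},
    ]
    test_rows = [{"className": "Signal", "Fisher": 0.1}]
    x = collect_response([train_rows, test_rows])
    print(x)
```

The resulting list plays the role of the vector “x” that is then passed to logspline(x) in R.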

Axel · February 7, 2022, 8:47am

Thanks, @jwruss, and apologies for not providing an answer. And even more thanks for sharing your solution here, publicly: much appreciated!