RooHistPdf and derived variables

AndreasAlbert · January 25, 2021, 6:59pm

Hi,

I’d like to ask for help to better understand how to construct RooHistPdf in a specific case. This question might be similar to [1], but I was not able to translate the solution from there to my specific problem.

I have a TH1 describing some distribution of a derived quantity (e.g. “pt”). In my implementation, the underlying real free parameters are not pt, but px and py. I would like to be able to define a PDF based on my TH1 and have it depend on pt. Naively, I tried the following implementation using RooDataHist as an intermediate step (in python):

import ROOT
ROOT.gSystem.Load('libRooFit')
from ROOT import RooFormulaVar, RooRealVar, RooDataHist, RooHistPdf, RooArgList

# Dummy histogram
th1 = ROOT.TH1D("test","test",10,0,100)
th1.Fill(5, 3)
th1.Fill(15, 1)
th1.Fill(25, 0.1)

# Elementary variables
px = RooRealVar("px","px",10,0,100)
py = RooRealVar("py","py",10,0,100)

# Derived variables
pt = RooFormulaVar("pt","sqrt(px**2 + py**2)", RooArgList(px,py))

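# This does not work: RooDataHist only accepts fundamental variables
# (e.g. RooRealVar), and pt is a derived RooFormulaVar.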
rdh = RooDataHist("test","test", RooArgList(pt), th1)

# Ultimately, I want to make a RooHistPdf
pdf = RooHistPdf("pdf",
                "pdf",
                RooArgList(pt),
                RooArgList(pt),
                rdh
            )

This fails, with the following error message:

[#0] ERROR:InputArguments -- RooAbsDataStore::initialize(test): Data set cannot contain non-fundamental types, ignoring pt

The message is very clear, so I have no doubt about what I am doing wrong, but I cannot figure out what I should be doing instead. What better ways are there to achieve what I am trying to do?

The final goal of my implementation is to be able to minimize some negative log-likelihood function by finding optimal values of (px,py). This PDF would be a part of the likelihood. I cannot just switch from (px,py)->pt as the basic variable because other parts of the likelihood depend on (px,py) separately, so I would run into the same problem in reverse.
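To make the intended setup concrete, here is a ROOT-free toy sketch of such a likelihood in plain Python. It is only an illustration under stated assumptions: the bin contents are copied from the dummy histogram above, while the two Gaussian terms in px and py (with their means and widths) and the grid scan are hypothetical stand-ins for the "other parts of the likelihood" and for a real minimizer.

```python
import math

# Bin contents copied from the dummy TH1: 10 bins of width 10 on [0, 100).
CONTENTS = [3.0, 1.0, 0.1] + [0.0] * 7
BIN_WIDTH = 10.0


def hist_density(pt):
    """Normalized histogram density at pt; 0 outside [0, 100)."""
    if not (0.0 <= pt < 100.0):
        return 0.0
    total = sum(CONTENTS) * BIN_WIDTH
    return CONTENTS[int(pt / BIN_WIDTH)] / total


def gauss_nll(x, mean, sigma):
    """Negative log of a Gaussian density, dropping the constant term."""
    return 0.5 * ((x - mean) / sigma) ** 2


def nll(px, py):
    """Toy NLL: histogram term in pt plus separate terms in px and py."""
    density = hist_density(math.hypot(px, py))
    if density <= 0.0:
        return math.inf
    # Hypothetical terms that depend on px and py individually.
    return -math.log(density) + gauss_nll(px, 12.0, 3.0) + gauss_nll(py, 4.0, 3.0)


# Coarse grid scan over (px, py) in steps of 0.5 between 0 and 30.
grid = [0.5 * i for i in range(61)]
best = min((nll(x, y), x, y) for x in grid for y in grid)
print("min NLL = %.3f at px = %.1f, py = %.1f" % best)
```

A grid scan is used here because the histogram term is piecewise constant in pt, so its gradient is zero almost everywhere and carries no information for a gradient-based minimizer.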

Any advice would be appreciated!

Thanks

Andreas

[1] RooFit, formulas in RooHistPdf

jonas · February 4, 2021, 2:53pm

Dear Andreas,

I think the approach in the post by Wouter that you linked is also applicable here.

I took your Python code, converted it to C++ for my convenience and implemented that approach. The source is attached, with the command to compile it in the first line:

RooHistPdf_and_derived_variables_1.cc (1.9 KB)

One thing that’s different compared to your example is that I invented some values for px and py to fill the dataset.

Let me know if this helps or if you have further questions!

Cheers,
Jonas

AndreasAlbert · February 12, 2021, 1:19pm

Hi Jonas,

Thank you for providing this nice implementation. I think it solves a problem that is different from the one I seem to be having, though. Specifically, you start out by defining a dataset that contains a number of different px, py values and then define pt as a new derived column. You succeed in ending up with a pt-dependent pdf (good), but I do not immediately see how one would evaluate the pdf given new arbitrary values of px, py. I am trying to get to a situation where I can simply modify the values of the px and py variables and have pt, as well as the pdf, update automatically.

In the meantime, someone pointed out an alternative solution to me privately that I think solves my problem more directly:

import ROOT
ROOT.gSystem.Load('libRooFit')
from ROOT import RooFormulaVar, RooRealVar, RooDataHist, RooHistPdf, RooArgList

# Dummy histogram
th1 = ROOT.TH1D("test","test",10,0,100)
th1.Fill(5, 3)
th1.Fill(15, 1)
th1.Fill(25, 0.1)

# Elementary variables
px = RooRealVar("px","px",10,0,100)
py = RooRealVar("py","py",10,0,100)

# Derived variables
pt = RooFormulaVar("pt","sqrt(px**2 + py**2)", RooArgList(px,py))

# Create a completely useless dummy variable for RooDataHist
# We will not use this dummy variable again
dummy = RooRealVar("dummy","dummy",10,0,100)
rdh = RooDataHist("test","test", RooArgList(dummy), th1)

# Use the "pdfObs" and "histObs" arguments correctly
# pdf will be evaluated at "pt", while "dummy" is ignored
pdf = RooHistPdf("pdf",
                "pdf",
                RooArgList(pt),
                RooArgList(dummy),
                rdh
            )

# Demonstration:
for length_1d in [2,10,18]:
    px.setVal(length_1d)
    py.setVal(length_1d)
    print(pdf.getValV())

The key here is the correct use of the “pdfObs” and “histObs” arguments to the RooHistPdf constructor, which allows one to disentangle the variable used to construct the RooDataHist from the one used to evaluate the RooHistPdf. It seems to me that the constructor was written this way to solve exactly the problem I was having; I just did not understand that before.
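For readers without ROOT at hand, the idea can be mimicked in a few lines of plain Python. This is a conceptual sketch, not RooFit's implementation: the histogram is stored against its own axis (the role played by "dummy"), and the lookup is driven by a value computed from px and py (the role played by "pt"). The function names are made up for illustration, and the returned numbers are raw bin contents, so they may differ from what RooHistPdf reports by normalization or bin-width factors.

```python
import math

# Bin layout and contents copied from the dummy TH1 in the post:
# 10 bins on [0, 100); the first three bins hold 3, 1 and 0.1.
N_BINS, X_MIN, X_MAX = 10, 0.0, 100.0
CONTENTS = [3.0, 1.0, 0.1] + [0.0] * 7


def hist_value(x):
    """Content of the bin containing x (0 outside the axis range).

    The histogram only knows about its own axis, like "dummy".
    """
    if not (X_MIN <= x < X_MAX):
        return 0.0
    width = (X_MAX - X_MIN) / N_BINS
    return CONTENTS[int((x - X_MIN) / width)]


def pt(px, py):
    """Derived quantity, playing the role of the RooFormulaVar."""
    return math.hypot(px, py)


def pdf_value(px, py):
    """Evaluate the histogram at the derived value instead of the axis variable."""
    return hist_value(pt(px, py))


# Same scan as in the demonstration above: px = py = 2, 10, 18.
for length_1d in [2, 10, 18]:
    print(length_1d, pt(length_1d, length_1d), pdf_value(length_1d, length_1d))
```

Changing px or py changes pt, and therefore the bin that is looked up, without the histogram ever being rebuilt.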

Thanks again

Andreas

AndreasAlbert · February 12, 2021, 1:24pm

(Also, to clarify the starting situation slightly: the idea is not that I would have access to vectors of px, py values for the initialization of the data set. Instead, I would have a premade histogram lying around somewhere that I would load.)

jonas · February 12, 2021, 8:36pm

Okay, thanks a lot for following up!