Pythonization of RooDataSetHelper with weights not working?

Dear experts,

Here is a reproducer of the problem (basically the tutorial script, extended with a weight column for the RDataFrame → RooDataSet conversion):

import ROOT
import math
 
 
# Set up
# ------------------------
 
# We create an RDataFrame with three columns (x, y and a per-event weight), each filled with 2 million random numbers.
d = ROOT.RDataFrame(2000000)
dd = d.Define("x", "gRandom->Uniform(-5.,  5.)").Define("y", "gRandom->Gaus(1., 3.)").Define("weight", "gRandom->Uniform(0.1, 1.0)")

 
 
# We create RooFit variables that will represent the dataset.
x = ROOT.RooRealVar("x", "x", -5.0, 5.0)
y = ROOT.RooRealVar("y", "y", -50.0, 50.0)
x.setBins(10)
y.setBins(20) 
 
# Booking the creation of RooDataSet / RooDataHist in RDataFrame
# ----------------------------------------------------------------
 
# Method 1:
# ---------
# We directly book the RooDataSetHelper action.
# We need to pass
# - the RDataFrame column types as template parameters
# - the constructor arguments for RooDataSet (they follow the same syntax as the usual RooDataSet constructors)
# - the column names that RDataFrame should fill into the dataset
#
# NOTE: RDataFrame columns are matched to RooFit variables by position, *not by name*!
rooDataSet = dd.Book(
    ROOT.std.move(ROOT.RooDataSetHelper("dataset", "Title of dataset", ROOT.RooArgSet(x, y))), ("x", "y", "weight")
)

def printData(data):
    print("")
    data.Print()
    for i in range(min(data.numEntries(), 20)):
        print(
            "("
            + ", ".join(["{0:8.3f}".format(var.getVal()) for var in data.get(i)])
            + ", )  weight={0:10.3f}".format(data.weight())
        )
 
    print("mean(x) = {0:.3f}".format(data.mean(x)) + "\tsigma(x) = {0:.3f}".format(math.sqrt(data.moment(x, 2.0))))
    print("mean(y) = {0:.3f}".format(data.mean(y)) + "\tsigma(y) = {0:.3f}\n".format(math.sqrt(data.moment(y, 2.0))))
 
 
printData(rooDataSet)

I added the weighted-dataset case to the tutorial here (ROOT: tutorials/roofit/rf408_RDataFrameToRooFit.py File Reference), and the code segfaults:

rooDataSet = dd.Book(
    ROOT.std.move(ROOT.RooDataSetHelper("dataset", "Title of dataset", ROOT.RooArgSet(x, y))), ("x", "y", "weight")
)

I followed what is suggested in ROOT: RooAbsDataHelper< DataSet_t > Class Template Reference, but apparently the Book call in Python is unhappy when the number of variables in the RooArgSet does not match the number of columns.
Is there a pythonization which is not correctly loaded here?

Thanks in advance,
renato


ROOT Version: 6.34.04
Built for linuxx8664gcc on Feb 26 2025, 15:54:54
From tags/6-34-04@6-34-04

I think @jonas can help here

Thanks for the report!

Indeed, the documentation was wrong, but I’m fixing this:

If you create a weighted RooDataSet, you need to explicitly specify the WeightVar("weight") in the constructor, so that the weighted RooDataSet can be constructed correctly.

Let me know if it works now!

Cheers,
Jonas


Thanks a lot for the hint, it is much clearer now. If I got it right, this implies that you forward the RooCmdArg.

With this

rooDataSet = dd.Book(
    ROOT.std.move(ROOT.RooDataSetHelper("dataset", "Title of dataset", ROOT.RooArgSet(x, y), ROOT.RooFit.WeightVar("weight"))), ("x", "y", "weight")
)

It works. However, I guess that the name passed to WeightVar should always match the name of the weight column passed at the end, correct?

No, the variable names don’t need to match. As the docs say, the variables are mapped by position and not by name. And if there is one more variable taken from the RDF than in the RooArgSet that specifies the RooFit dataset, it is assumed to be the weight column.

Thanks @jonas

Just to be sure:
Is what I posted before correct, then?

rooDataSet = dd.Book(
    ROOT.std.move(ROOT.RooDataSetHelper("dataset", "Title of dataset", ROOT.RooArgSet(x, y), ROOT.RooFit.WeightVar("weight"))), ("x", "y", "weight")
)

I see my dataset plotted with weights later on, so I assumed it is correct. But are you saying that

rooDataSet = dd.Book(
    ROOT.std.move(ROOT.RooDataSetHelper("dataset", "Title of dataset", ROOT.RooArgSet(x, y), ROOT.RooFit.WeightVar("weight"))), ("x", "y", "weightcolumnname")
)

would work even if “weight” is not a column in the RDataFrame, i.e. it just remaps the column to “weight”?
Or do I need to supply a RooRealVar weight in the RooArgSet instead?

Yes, what you posted before is correct.

Sorry, I misunderstood your follow-up question! What I meant is that the names of the RooRealVars in the RooDataSet don’t have to match the names of the columns you take from the RDataFrame. However, the names of the columns you take from the RDataFrame must correspond to actual columns in the RDataFrame.

Thanks a lot for the clarification, this was clear. My doubt was mostly about the

RooArgSet(var1, var2), WeightVar("xxx"))), ["col1", "col2", "colweight"])

and whether, in WeightVar("xxx"), xxx must match the column name, or whether “colweight” is automatically interpreted as the weight variable and assigned the “xxx” label.


No, the names don’t have to match. I think the documentation tries to be clear about this with this warning:

Variables in the dataset and columns in RDataFrame are matched by position, not by name. This enables the easy exchanging of columns that should be filled into the dataset. 

Do you have a suggestion to make this clearer? Sorry for the confusion! 🙂


Thanks a lot. I think the best would be to actually have the example with weights in the tutorials, as I suspect it would be quite useful as a reference. Anyway, all clear now. Thank you!