Filling RooDataSet Event-by-Event

Hi everyone. I’m attempting to fill a RooDataSet in an event loop from some derived quantities in an analysis. I’m not sure how to make the proper assignment between, say, a float variable in my script and the RooRealVar to which it corresponds in the RooArgSet which comprises the RooDataSet. For example, the following does not work:

// set up the vars, set, and dataset
TString ds_vars[] = { "radius","product" };
int n_vars = (int) sizeof(ds_vars)/sizeof(TString);

auto vars = new RooArgSet("obs_vars"); 
for(int i=0;i<n_vars;i++){
  auto var = new RooRealVar(ds_vars[i].Data(),ds_vars[i].Data(),0);
  vars->add(*var);
}
auto ds = new RooDataSet("obs_data","obs_data",*vars);

float r, prod; // internal variables

// ....
// assume usual TTree is loaded from a TFile here
// ....


for (int i=0; i<tree->GetEntries(); i++){
  tree->GetEntry(i);      
  
  // calculate some derived quantities
  // (assume x and y are tree variables here)
  r = sqrt( x*x + y*y );
  prod = x*y;

  // assign values of the RooRealVars to the values calculated here
  // this does *not* work.. what is the correct way here?
  vars->find("radius") = r;
  vars->find("product") = prod;
         
  // add everything in the 'vars' ArgSet to the RooDataSet
  ds->add(*vars);          
}

Thanks in advance!

Hi,

Instead of vars->find("radius") = r; which is not allowed because find returns a pointer (RooAbsArg*) and not something a number can be assigned to, you should do for example:
vars->setRealValue("radius",r);

Lorenzo

Thanks for the help! Could you clarify how/why this is different from the example given on page 100 of the user’s manual? That example seems to suggest that assigning a numerical value to a RooRealVar would be allowed.

Hi,

You can also use the assignment operator of RooRealVar, but in that case you need a reference to the RooRealVar itself, obtained by casting the pointer returned by find, as follows:

RooRealVar &  radius = *dynamic_cast<RooRealVar*>(vars->find("radius"));
radius = r;

Lorenzo

Thanks again! As a final follow-up, I’m now trying to do the analogous thing in Python: saving the values from a NumPy array as variables in a RooDataSet to be used later. The example code attached here does not quite work; it fails with the following error:

Traceback (most recent call last):
  File "test_dataset.py", line 45, in <module>
    main()
  File "test_dataset.py", line 41, in main
    testdataset()
  File "test_dataset.py", line 24, in testdataset
    ds = ROOT.RooDataSet("obs_data","unbinned data",obsvars,0);
TypeError: none of the 8 overloaded methods succeeded.

I suppose I have the constructor for the RooDataSet wrong here, but I’m not sure how. Thanks again.

test_dataset.py (1.1 KB)

Hi,
In Python you can create a RooDataSet directly from a NumPy array; see the tutorial rf409_NumPyPandasToRooFit.py.

Lorenzo

Thanks, Lorenzo. This must have been a recently added function.

Could you still help me understand what is wrong with my example? I want to understand what I’m doing wrong there…

I think it is a problem with PyROOT sometimes not finding the right overload, especially if there are default arguments. @etejedor might know more about this.

Hello,

The issue is with the RooRealVars: you create them inside a loop and they don’t survive it, because the avar variable is continuously rebound. That causes a segfault afterwards.

You need to keep them alive e.g. by storing them in a list:

    l = []
    for i in range(n_vars):
        avar = ROOT.RooRealVar(obs_names[i],obs_names[i],0.)
        l.append(avar)
        obsvars.add(avar)

Perhaps @jonas knows whether some pythonization is planned to protect against this.


Hi!

@etejedor is right. You need to keep the RooRealVars alive in some other structure, because the RooArgSet only stores non-owning pointers. This won’t change with any future Pythonization, so as not to overcomplicate the RooArgSet behavior.
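
To illustrate the lifetime issue without needing ROOT, here is a pure-Python analogy (it is not PyROOT code). Var and NonOwningSet are hypothetical stand-ins for RooRealVar and RooArgSet; the set holds only weak references, which mimics a container of non-owning pointers. The sketch relies on CPython's reference counting, where an object is destroyed as soon as its last strong reference goes away. In PyROOT the analogous situation would leave a dangling C++ pointer rather than a dead weak reference.

```python
import weakref


class Var:
    """Hypothetical stand-in for a RooRealVar: just a named value."""

    def __init__(self, name, value=0.0):
        self.name = name
        self.value = value


class NonOwningSet:
    """Hypothetical stand-in for a RooArgSet: keeps weak (non-owning) references."""

    def __init__(self):
        self._refs = []

    def add(self, var):
        self._refs.append(weakref.ref(var))

    def alive_names(self):
        # Calling a weak reference returns None once its target is gone.
        targets = (ref() for ref in self._refs)
        return [var.name for var in targets if var is not None]


def fill_without_keeping(names):
    """Mimics the failing loop: nothing owns the variables afterwards."""
    varset = NonOwningSet()
    for name in names:
        avar = Var(name)  # rebinding avar releases the previous Var
        varset.add(avar)
    return varset  # the last Var is released when this function returns


def fill_keeping_alive(names):
    """Mimics the fix: a list holds a strong reference to every variable."""
    keep = []
    varset = NonOwningSet()
    for name in names:
        avar = Var(name)
        keep.append(avar)
        varset.add(avar)
    return varset, keep
```

With names = ["radius", "product"], the set returned by fill_without_keeping(names) has no live entries left, while the one returned by fill_keeping_alive(names) still sees both variables for as long as the keep list exists.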

By the way, your C++ version only worked because you create the RooRealVars with the new operator, so they survive for the whole script, which causes a memory leak :) So this isn’t only a Python problem. If you write leak-free code in C++ to begin with, you won’t have these issues when translating.

But indeed, as @moneta said, with ROOT 6.26 you should prefer RooDataSet.from_numpy for this. In your example that would be:

obsvars = [ROOT.RooRealVar(obs_names[i],obs_names[i],0.) for i in range(n_vars)]
ds = ROOT.RooDataSet.from_numpy({"radius" : data[:,0],
                                 "product" : data[:,1]}, obsvars)

To learn more about this function, you can print its docstring like this for example:

print(ROOT.RooDataSet.from_numpy.__doc__)

Hope this helps!

Cheers,
Jonas

Thanks everyone, very helpful. @jonas Following up on your comment here, could I ask for an example of the ‘correct’ C++ implementation for this? I’d like to understand how this should be done, as it’s something I’ll need to do frequently in the future. Thanks!

Hi @haselsco!

A leak-free, modern C++ implementation of the first part of your code would be this:

    // set up the vars, set, and dataset
    std::vector<std::string> ds_vars{"radius","product"};
    int n_vars = ds_vars.size();

    RooArgSet vars{"obs_vars"};
    for(int i=0;i<n_vars;i++){
      auto var = new RooRealVar(ds_vars[i].c_str(),ds_vars[i].c_str(),0);
      vars.addOwned(*var);
    }
    RooDataSet ds{"obs_data","obs_data",vars};

There are 3 things I changed:

  • use a standard vector of strings instead of a C-style array of TStrings (this has nothing to do with leaks, but as you can see from the sizeof expression, C-style arrays are awkward to handle)
  • create vars and ds on the stack and not manually on the heap to fix two memory leaks
  • use vars.addOwned instead of vars.add such that the vars collection will delete the RooRealVars so they don’t leak

In general, every time you call new you need to think about who will call delete on that object. Best would be to avoid new and delete altogether :)

Hope this helps!

Cheers,
Jonas
