Best way to feed large amounts of data to RooFit?

Hi everyone,

I have been using RooFit for a while now with some great results. :smiley:
Now I’m trying to scale up to “real size” datasets, which are large: about 10^12 points (until now I have been working with sets of around 10^7 points or so).

I am wondering: what is the best way to get this data to RooFit as quickly as possible? I am using compiled C++, and have this data available as just arrays of doubles. Suppose I have only 1D data.
I could get the right answer with a tight copy loop:

for (int i = 0; i < 100000; i++) {
  rv = mydata[i];             // set the observable's current value
  histo.add(RooArgSet(rv));   // enter one event at that value
}

where rv is a RooRealVar and histo is a RooDataHist. This is what my program does now, and it works. However, it is a bit slow for larger amounts of data. Is there some way I can tell RooFit “here is a pointer to an array of this many doubles, please bin this data as quickly as you can”? Or should I be binning/preprocessing the data myself and only giving RooFit the reduced data?

Hi,

That add() interface is not designed for maximal efficiency.

You can construct a RooDataSet or RooDataHist efficiently from a ROOT TTree.
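For concreteness, here is a minimal sketch of that route, assuming the raw data sits in a double array mydata of length nPoints and the observable is called x (these names, the range, and the bin count are placeholders, not part of the original question):

#include "TTree.h"
#include "RooRealVar.h"
#include "RooArgSet.h"
#include "RooDataSet.h"
#include "RooDataHist.h"

// One-time step: copy the raw array into a TTree
// (a plain loop with no per-event RooFit overhead)
TTree tree("tree", "tree");
double x;
tree.Branch("x", &x, "x/D");
for (Long64_t i = 0; i < nPoints; i++) {
  x = mydata[i];
  tree.Fill();
}

// Import the TTree in one call; RooFit matches the
// variable name "x" to the branch of the same name
RooRealVar rv("x", "x", 0., 10.);  // placeholder range
rv.setBins(100);                   // binning used by the RooDataHist
RooDataSet data("data", "data", RooArgSet(rv), RooFit::Import(tree));
RooDataHist histo("histo", "histo", RooArgSet(rv), data);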

Note that TTrees, RooDataSets, and RooDataHists can all be stored in ROOT files, so it may be efficient to do the conversion once and then work with the dataset stored in the file from then on.
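A sketch of that round trip, with placeholder file and object names:

#include "TFile.h"

// One-time conversion: persist the binned dataset
TFile fout("binned_data.root", "RECREATE");
histo.Write("histo");
fout.Close();

// Later sessions: read the RooDataHist back instead of
// re-binning the raw points
TFile fin("binned_data.root");
RooDataHist* h = (RooDataHist*) fin.Get("histo");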

Wouter