Huge memory consumption of RooDataSet

Dear RooFit developers,

I use ROOT version 5.32/00, which comes with CMSSW_5_2_3_patch1.

In my analysis, implemented using PyROOT, I use a RooDataSet with a size of ~500 MB when stored on disk.
It contains about 15 variables.
When I load the dataset it uses about 5.3 GB of memory!!!
After adding about 15 more categories (using RooThresholdCategory objects and RooDataSet::addColumn(thrsCat)), this dataset occupies nearly all of the available 16 GB of memory!
Unsurprisingly I then run into allocation problems.
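To put the figures above in proportion, here is a rough back-of-the-envelope calculation. It uses only the numbers quoted in this post and makes two simplifying assumptions that are not from the original: memory is split evenly across columns, and the full 16 GB is treated as an upper bound on the final footprint (in practice some of it is used by other things).

```python
# Figures quoted in the post (1 GB taken as 1000 MB for simplicity).
disk_gb = 0.5      # dataset size on disk
loaded_gb = 5.3    # memory used after loading the dataset
n_vars = 15        # variables originally in the dataset
n_cats = 15        # threshold categories added via addColumn
ram_gb = 16.0      # total RAM; assumed upper bound on the final footprint

# How much larger the dataset is in memory than on disk.
inflation = loaded_gb / disk_gb

# Assumption: memory is shared evenly between the original variables.
per_var_gb = loaded_gb / n_vars

# Expected total if a category column cost the same as a variable column.
naive_total_gb = per_var_gb * (n_vars + n_cats)

# Upper bound on the cost of one category column, from the observed growth.
per_cat_gb_upper = (ram_gb - loaded_gb) / n_cats

print(f"in-memory / on-disk ratio:       {inflation:.1f}x")
print(f"memory per variable column:      {per_var_gb:.2f} GB")
print(f"naive total for 30 columns:      {naive_total_gb:.1f} GB")
print(f"upper bound per category column: {per_cat_gb_upper:.2f} GB")
```

Under these assumptions, 30 equally expensive columns would need roughly 10.6 GB. Since memory actually fills up to nearly 16 GB, a category column appears to cost up to about twice as much as a plain variable column.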

My main question is whether this huge memory consumption is the consequence of a bug or is considered necessary. If it is not a bug, is there anything I can do to avoid this annoying problem?

A second question is whether there is a way to store RooDataSets larger than 1 GB on disk.
Currently I get messages like

for datasets exceeding 1 GB.

Regards,
Wolfgang

Is there really nobody who has any advice on or experience with this problem?
It actually poses a serious problem for my analysis.
I would appreciate it if somebody (on the developer side) would at least state whether this is a known problem. From talking to other RooFit users (in person), it seems that I’m not the only one struggling with this.
Best,
Wolfgang

Hi,

This is a known problem: there is a lot of memory overhead when using multi-dimensional data sets, especially with categories.
I could try to help you, but I would need to know exactly how your data set is built.

Best Regards

Lorenzo

Sorry to enter this thread, especially after posting one of my own, but the more I think about it, the more I feel that the lack of a buffer cache for RooDataSet files is a crucial setback in RooFit. The CPU power and the integrated experimental statistics now exist to make precision unbinned ML fits attractive in HEP and elsewhere, so I can’t see why RooFit shouldn’t be the perfect tool for this job. Encountering RAM limits to data processing was like hitting a wall in my analysis, and the last thing I expected from a framework that inherits from ROOT I/O.

In researching the problem I can appreciate the complexities that necessitated the branching of RooFit methods from standard ROOT in the first place, but throwing away possibilities such as a TTreeCache implementation seems like a large waste, in my admittedly quickly formed opinion.

That said, I would be extremely interested to hear of an external ‘hook’ or any other way to stream file data into a RooAbsData-derived object!