RooDataSet memory usage

Hello forum,

I have come across some puzzling RooFit behaviour and am wondering whether it is intentional or a bug. It concerns RooDataSet memory usage. When I read a 36 MB ASCII file into a RooDataSet, memory usage shoots up to about 600 MB. The relationship between file size and memory usage appears to be linear: the process always uses around 17 times the size of the file. I find this odd, since storing numbers as ASCII text is fairly inefficient, and I would expect loading them into a RooDataSet to reduce the size, if anything.
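To make that expectation concrete, here is the back-of-envelope estimate I have in mind. It is only a sketch: it assumes each real observable takes an 8-byte double and the category index a 4-byte int, which is my guess at a bare in-memory layout and not a statement about how RooDataSet actually stores events. The function names are made up for illustration; the sample line and the event count are the ones from the example further down.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// Bytes one event occupies in the ASCII file: its characters plus a newline.
std::size_t asciiBytesPerEvent(const std::string& line) {
    assert(!line.empty());  // an empty line is not an event
    return line.size() + 1;
}

// Naive in-memory payload per event. ASSUMPTION: every real observable is a
// double and every category index an int, with no per-event overhead. This
// is not a claim about RooDataSet's internal storage.
std::size_t rawBytesPerEvent(std::size_t nReals, std::size_t nCategories) {
    return nReals * sizeof(double) + nCategories * sizeof(int);
}

// Print both estimates for the sample line and event count from this post.
void printEstimate() {
    const std::string sampleLine = "1.30975 1.99973 -2.69389 -0.634206 2";
    const std::size_t nEvents = 1000000;
    const double MB = 1e6;
    std::cout << "ASCII file:  " << asciiBytesPerEvent(sampleLine) * nEvents / MB << " MB\n"
              << "raw payload: " << rawBytesPerEvent(4, 1) * nEvents / MB << " MB\n";
}
```

With 8-byte doubles and 4-byte ints this comes to about 37 MB for the file and about 36 MB for the raw values, which is why roughly 600 MB surprises me.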

I ran the same program through valgrind and this is the output:

==4872== Memcheck, a memory error detector
==4872== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==4872== Using Valgrind-3.6.1-Debian and LibVEX; rerun with -h for copyright info
==4872== Command: bin/Debug/DSRhoFit data/dataset_1 results/output 0.275 0.57 0.936 2.84 1.79371 0.01 0.01 0.01 0 0 0 1 1
==4872==

RooFit v3.50 -- Developed by Wouter Verkerke and David Kirkby
                Copyright (C) 2000-2011 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

READING DATA
[#1] INFO:DataHandling -- RooDataSet::read: reading file data/dataset_1
==4872== Warning: set address range perms: large range [0x555cb040, 0x717cb049) (defined)
==4872== Warning: set address range perms: large range [0x717cc040, 0x8d9cc049) (defined)
[#0] ERROR:DataHandling -- RooDataSet::read(static): read error at line 1000001
[#1] INFO:DataHandling -- RooDataSet::read: read 1000000 events (ignored 0 out of range events)
DATA READING COMPLETE
RooDataSet::dataset[tht,thb,phit,dt,decType,blindState] = 1000000 entries
Real time 0:13:00, CP time 780.750
==4872==
==4872== HEAP SUMMARY:
==4872==     in use at exit: 999,744,677 bytes in 28,653 blocks
==4872==   total heap usage: 35,065,949 allocs, 35,037,296 frees, 3,739,657,178 bytes allocated
==4872==
==4872== LEAK SUMMARY:
==4872==    definitely lost: 2,506 bytes in 15 blocks
==4872==    indirectly lost: 26,284,935 bytes in 257 blocks
==4872==      possibly lost: 970,563,371 bytes in 10,558 blocks
==4872==    still reachable: 2,893,865 bytes in 17,823 blocks
==4872==         suppressed: 0 bytes in 0 blocks
==4872== Rerun with --leak-check=full to see details of leaked memory
==4872==
==4872== For counts of detected and suppressed errors, rerun with: -v
==4872== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)

Please take note of the “possibly lost” item. Valgrind’s manual states: “‘possibly lost’ means your program is leaking memory, unless you’re doing unusual things with pointers that could cause them to point into the middle of an allocated block.”

Is it possible that this is not a memory leak?

Here is a minimal-ish working example:

#include "RooRealVar.h"
#include "RooCategory.h"
#include "RooDataSet.h"
#include "RooArgSet.h"
#include "RooArgList.h"

const double PI = 3.141592;

int main(int argc, char* argv[])
{

    RooRealVar thb("thb","thb",0,PI);
    RooRealVar tht("tht","tht",0,PI);
    RooRealVar phit("phit","phit",-PI,PI);
    RooRealVar dt("dt","dt",-10,10);
    RooCategory decType("decType","decType");
    decType.defineType("a",1);
    decType.defineType("ab",2);
    decType.defineType("b",3);
    decType.defineType("bb",4);

    // RooDataSet::read returns a newly allocated dataset, so no separate "new" is needed
    // (the earlier version leaked an empty dataset by overwriting the pointer)
    RooDataSet* dataSet = RooDataSet::read("dataset",RooArgList(tht,thb,phit,dt,decType));
    return 0;
}

To get a similar dataset, just fill a file named “dataset” with one million copies of this line:

1.30975 1.99973 -2.69389 -0.634206 2
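One way to generate such a file is sketched below. The function name writeDataset is just for illustration; it writes the sample line above the requested number of times to the given path.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Write nLines copies of the sample event to the file at `path`.
// Returns true only if the file could be opened and all writes succeeded.
bool writeDataset(const std::string& path, std::size_t nLines) {
    assert(!path.empty());  // a file name is required
    std::ofstream out(path.c_str());
    if (!out) return false;
    for (std::size_t i = 0; i < nLines; ++i)
        out << "1.30975 1.99973 -2.69389 -0.634206 2\n";
    out.flush();
    return static_cast<bool>(out);
}
```

Calling writeDataset("dataset", 1000000) from a small main() produces the input file that the example above reads.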

Moreover, during the fitting procedure the memory usage doubles. That means I need about 40 times more memory than I have data.

I tested this with RooFit 3.50 and 3.52 (ROOT 5.32/00 resp. 5.32/01) on three different machines with the same results.

I would very much appreciate any help or thoughts on the problem.

Best regards,
Daniel