Save a RooDataSet: Error in <TBufferFile::WriteByteCount>: bytecount too large (more than 1073741822)

I want to save a RooDataSet within a RooWorkspace, but I get the error:

Error in <TBufferFile::WriteByteCount>: bytecount too large (more than 1073741822)

When I try to read the dataset (myWS->data("dataWithSWeights")), I get

Error in <TBufferFile::CheckByteCount>: object of class TObjArray read too many bytes: 1459806972 instead of 386065148
TBufferFile::CheckByteCount:0: RuntimeWarning: TObjArray::Streamer() not in sync with data on file <path/to/file> fix Streamer()
Error in <TBufferFile::CheckByteCount>: Byte count probably corrupted around buffer position 420057209:
	-2069560387 for a possible maximum of 1039758320
TBufferFile::CheckObject:0: RuntimeWarning: reference to object of unavailable class TObject, offset=420057211 pointer will be 0
Error in <TExMap::Remove>: key 122962529 not found at 196982
TBufferFile::CheckObject:0: RuntimeWarning: reference to object of unavailable class TList, offset=122962529 pointer will be 0

etc.

Here is the code I use to create and save the dataset:

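import ROOT
from ROOT import RooFit

# `data`, `model`, `yields`, `nsig`, `uts` and `wsSdatafilename` come from the earlier fit
# SPlot adds one "<yield>_sw" column per yield to `data`
sData = ROOT.RooStats.SPlot('sData', 'An SPlot', data, model, yields)
fout = ROOT.TFile.Open(wsSdatafilename, 'recreate')
wsout = ROOT.RooWorkspace(uts.wsname)
# use the signal sWeights as per-entry weights
data_s = ROOT.RooDataSet('dataWithSWeights', 's-weighted data', data.get(), RooFit.Import(data), RooFit.WeightVar(nsig.GetName() + '_sw'))
data_s.convertToTreeStore()
# 'import' is a Python keyword, hence getattr
getattr(wsout, 'import')(data_s)
fout.cd()
wsout.Write()
fout.Close()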

The size of any single object that you write to or read from a TFile buffer (basket) is limited to 1 GB minus 2 bytes, i.e. 1073741822 bytes.
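For reference, 1073741822 is exactly 2**30 - 2 (1 GiB minus 2 bytes). A minimal Python sketch of a size check against that limit; `fits_in_tfile_buffer` is a hypothetical helper, not a ROOT function, and estimating the serialized size of an object is left to the caller. The two example sizes are the byte counts from the error messages in the question.

```python
# Per-object limit for a TFile buffer/basket, as quoted above:
# 1 GiB (2**30 bytes) minus 2 bytes.
TBUFFER_MAX_BYTES = 2**30 - 2


def fits_in_tfile_buffer(nbytes):
    """Return True if an object of `nbytes` serialized bytes stays within the limit.

    Hypothetical helper for illustration; ROOT does not provide this function.
    """
    return nbytes <= TBUFFER_MAX_BYTES


print(TBUFFER_MAX_BYTES)                 # 1073741822
print(fits_in_tfile_buffer(386065148))   # True  (expected byte count in the log)
print(fits_in_tfile_buffer(1459806972))  # False (byte count actually read)
```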

So the recommended solution is to split the dataset into pieces and merge them in memory later?

I am not sure that the dataset (I mean the tree) is the culprit. The tree should split the data into baskets, which should be small enough.
What’s inside the dataset? Just a couple of data columns? How many events?

For the record, this is the workaround I came up with:

import ROOT
from ROOT import RooFit


def makeRooList(itm):
    'calls itm.createIterator() and loops it into a list'
    retlist = []
    itritm = itm.createIterator()
    v = itritm.Next()
    while v:
        retlist.append(v)
        v = itritm.Next()
    return retlist


# `data`, `nsig` and `wsSdatafilename` as in the first snippet
fout = ROOT.TFile.Open(wsSdatafilename, 'recreate')
data_s = ROOT.RooDataSet('dataWithSWeights', 's-weighted data', data.get(), RooFit.Import(data), RooFit.WeightVar(nsig.GetName() + '_sw'))
ndatasets = 10
print('will save as %d separate workspaces and datasets' % ndatasets)
chunk_size = data_s.numEntries() // ndatasets
assert chunk_size

# persistify
datasetlist = []
wslist = []
for ids in range(ndatasets):
    # declare ws and dataset
    wslist.append(ROOT.RooWorkspace('myWS' + str(ids)))
    datasetlist.append(data_s.emptyClone(data_s.GetName() + str(ids)))

    # decide what range of entries to look at (the last chunk takes the remainder)
    startrange = ids * chunk_size
    endrange = (startrange + chunk_size) if (ids + 1 < ndatasets) else data_s.numEntries()
    obs = datasetlist[-1].get()

    # loop entries in range; get(i) loads entry i, so weight() refers to it as well
    for i in range(startrange, endrange):
        for o, oo in zip(makeRooList(data_s.get(i)), makeRooList(obs)):
            oo.setVal(o.getVal())
        datasetlist[-1].add(obs, data_s.weight(), data_s.weightError())

    # save
    getattr(wslist[-1], 'import')(datasetlist[-1])
    fout.cd()
    wslist[-1].Write()
fout.Close()
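The chunk boundaries used in the loop above can be checked in isolation, without ROOT. A plain-Python sketch (the name `chunk_ranges` is hypothetical) that reproduces the same `startrange`/`endrange` logic for the 21,734,103 entries mentioned later in the thread and confirms that the chunks cover every entry exactly once:

```python
def chunk_ranges(n_entries, n_chunks):
    """Split [0, n_entries) into n_chunks contiguous (start, end) ranges.

    Mirrors the loop bounds in the workaround: every chunk has
    n_entries // n_chunks entries, and the last one also takes the remainder.
    """
    chunk_size = n_entries // n_chunks
    if chunk_size == 0:
        raise ValueError("more chunks than entries")
    ranges = []
    for ids in range(n_chunks):
        start = ids * chunk_size
        end = start + chunk_size if ids + 1 < n_chunks else n_entries
        ranges.append((start, end))
    return ranges


ranges = chunk_ranges(21734103, 10)
print(ranges[0])   # (0, 2173410)
print(ranges[-1])  # (19560690, 21734103)
```

The last chunk is three entries larger than the others here, which is harmless; what matters is that no entry is skipped or duplicated.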

I then extract each dataset from each workspace in turn and use RooDataSet.append to combine them.
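A sketch of that read-back step, assuming the file layout written by the workaround (workspaces `myWS0`, `myWS1`, ... each holding `dataWithSWeights0`, `dataWithSWeights1`, ...). The function name `load_and_merge` is hypothetical; it only uses `TFile.Get`, `RooWorkspace.data`, `Clone` and `RooDataSet.append`, and is written duck-typed so it can be exercised without ROOT.

```python
def load_and_merge(fin, n_chunks, ws_base="myWS", data_base="dataWithSWeights"):
    """Read back the chunked datasets and concatenate them with RooDataSet.append.

    Assumes `fin` is a ROOT.TFile opened for reading that contains workspaces
    named ws_base + "0", ws_base + "1", ..., each holding one dataset named
    data_base + str(i), as written by the workaround above.
    """
    merged = None
    for i in range(n_chunks):
        ws = fin.Get(ws_base + str(i))
        if not ws:  # a missing key comes back as a null (falsy) object in PyROOT
            raise KeyError("workspace %s%d not found" % (ws_base, i))
        chunk = ws.data(data_base + str(i))
        if not chunk:
            raise KeyError("dataset %s%d not found" % (data_base, i))
        if merged is None:
            # Clone the first chunk so the result is not the dataset owned
            # by the first workspace.
            merged = chunk.Clone(data_base)
        else:
            merged.append(chunk)
    return merged

# Usage in a PyROOT session (commented out so this block runs without ROOT):
# import ROOT
# fin = ROOT.TFile.Open(wsSdatafilename)
# data_all = load_and_merge(fin, 10)
```

To be on the safe side, keep `fin` open while you work with the merged dataset.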

The total dataset has 21,734,103 entries, 7 columns (from RooDataSet.get), and per-entry weights.
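A back-of-the-envelope estimate of why the data can overflow the limit if they end up streamed as one object, assuming every observable and the weight are stored as uncompressed 8-byte doubles (the real on-disk layout and compression will differ, so treat this only as an order-of-magnitude figure; `estimated_payload_bytes` is a hypothetical helper):

```python
TBUFFER_MAX_BYTES = 2**30 - 2  # per-object limit quoted earlier in the thread


def estimated_payload_bytes(n_entries, n_columns, bytes_per_value=8):
    """Rough uncompressed size if every value is stored as an 8-byte double."""
    return n_entries * n_columns * bytes_per_value


# 7 observable columns plus 1 weight column, from the numbers above
total = estimated_payload_bytes(21734103, 7 + 1)
# integer ceiling division: smallest number of pieces that each fit the limit
min_chunks = -(-total // TBUFFER_MAX_BYTES)

print(total)       # 1390982592
print(min_chunks)  # 2
```

Under that assumption the data come to roughly 1.39 GB, so at least two pieces are needed; the 10 chunks used in the workaround leave a wide margin at about 139 MB each.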

This is exactly what I needed, specifically the weights. Thanks for the info. Unfortunately, RooFit stores the weights separately from the data columns. For this reason, it can’t write out the weights when they take up more than 1 GB.

I opened a JIRA ticket to address this restriction, but it’s going to take a while until this can be treated properly.
https://sft.its.cern.ch/jira/browse/ROOT-10188

It’s unfortunate that you will have to use the workaround in the meantime.

Just for future reference:
ROOT 6.18.02 and 6.20.00 won’t have this limitation, at least for the tree data storage. It can be tested with the master version of ROOT or with the nightlies:
https://root.cern/nightlies
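If the same script has to run on both older and newer ROOT installations, one option is to use the chunking workaround only where it is still needed. A minimal sketch, assuming (as stated above) that the limit is lifted for tree-backed datasets from 6.18/02 onwards; `tree_store_limit_lifted` is a hypothetical helper, and the version number is compared in ROOT's integer encoding as returned by `ROOT.gROOT.GetVersionInt()` (e.g. 61802 for 6.18/02).

```python
def tree_store_limit_lifted(version_int):
    """Return True for ROOT versions that, per this thread, no longer hit the
    1 GB limit for tree-backed datasets (6.18/02 and later).

    `version_int` uses ROOT's integer encoding, e.g. 61802 for 6.18/02.
    """
    return version_int >= 61802

# Usage in a PyROOT session (commented out so this block runs without ROOT);
# call data_s.convertToTreeStore() first, since the fix concerns tree storage:
# import ROOT
# ndatasets = 1 if tree_store_limit_lifted(ROOT.gROOT.GetVersionInt()) else 10
```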

The ticket sft.its.cern.ch/jira/browse/ROOT-10188 has been closed.
