Saving a single vector of data exceeding 1 GB

Hello,

I’m processing a large ROOT tree to get my data. At the end, I have a single vector of data bigger than 1 GB. Saving this data is made difficult by a ROOT limit that disallows storing larger datasets in a single object (with good reason, I assume). I searched through this forum and found a [url=https://root-forum.cern.ch/t/tbufferfile-checkcount-warning/9259/6]post by pcanal[/url] which recommends saving the data with a TTree, as it can split the data into smaller chunks by itself, but I still get the error

with this piece of code:

[code]{
vector<double> data(150000000);
TVectorD obj_data = TVectorD(data.size(), &data[0]);

TFile *outfile = new TFile("fname.root", "recreate");
TTree *outtree = new TTree("outtree", "outtree");
outtree->Branch("data", &obj_data);
outtree->Fill();
}[/code]

I created the TVectorD because trying to save the vector itself didn’t work either, so I tried saving something that inherits from TObject instead.

I could, however, still write outtree and close the file; the resulting ROOT file is also big enough on disk to have stored a lot of data. Trying to read the contents fails, though, with

[code]Error in <TBufferFile::CheckByteCount>: object of class TVectorT<double> read too many bytes: 1134923509 instead of 61181685
Warning in <TBufferFile::CheckByteCount>: TVectorT<double>::Streamer() not in sync with data on file fname.root, fix Streamer()[/code]

This is unsurprising since TBufferFile already gave me an error beforehand, but I thought it interesting nonetheless.

What am I doing wrong? If possible, I’d like ROOT to manage splitting my data into chunks it can work with, rather than me implementing the splitting myself. I am working with ROOT 5.34, if that matters.

Thank you very much for your time and help!

Hi,

You use the expression “dataset in a single object”: is this huge vector a dataset for you?
Perhaps the [url=https://root.cern.ch/doc/master/classTNtuple.html]TNtuple[/url] class is a better choice for this?

Cheers,
Danilo

I am not entirely sure I understood you correctly, but I’m now filling my data directly into a branch of a TNtuple instead of into a vector first. This works fine, but it seems slower (by about a factor of 2); I’m guessing that’s to be expected after calling TNtuple::Fill more than a hundred million times.

Thank you for your help!