TTree Error in <TBufferFile::CheckCount> or Error in <TBufferFile::WriteByteCount>:

Hello,

I have a question about TTree file I/O. I get the following errors when writing my TTree to a TFile once my task passes a certain size threshold:
[code]
Error in <TBufferFile::CheckCount>: buffer offset too large (larger than 1073741822)
Error in <TBufferFile::CheckCount>: buffer offset too large (larger than 1073741822)
Error in <TBufferFile::WriteByteCount>: bytecount too large (more than 1073741822)
Error in <TBufferFile::WriteByteCount>: bytecount too large (more than 1073741822)
Error in <TBufferFile::WriteByteCount>: bytecount too large (more than 1073741822)
Error in <TBufferFile::WriteByteCount>: bytecount too large (more than 1073741822)
[/code]

I have read through some other posts and tried to fix the issue as far as I understood it, but so far without any progress…
As far as I understand, the data for a branch is buffered and then streamed into the TFile, and if the total data size for the branch is larger than 1073741822 bytes, you get that error. Is that correct?

Now to my task. In principle I want to save this map into a ROOT file

  std::map<Point2D, Point2DCloud> hit_map_2d;

with

struct Point2D {
  double x;
  double y;
};

struct Point2DCloud {
  std::map<Point2D, unsigned int> points;
  unsigned long total_count;
};
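
(For Point2D to be usable as a std::map key it needs a strict weak ordering, which I omitted above; something like this sketch.)

[code]
// Sketch of the comparison operator std::map needs for Point2D keys
// (the actual implementation is omitted in the snippets above).
bool operator<(const Point2D &a, const Point2D &b) {
  if (a.x != b.x) return a.x < b.x;
  return a.y < b.y;
}
[/code]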

This map carries all of the detector-resolution information I need for my smearing algorithm. Initially I had only two branches, one for the key (a Point2D container branch) and one for the value (a Point2DCloud container branch). I have often read that splitting the data should solve this problem, and I understood this as splitting the Point2DCloud into its parts, so I created separate branches for the x and y coordinates and for the counters, as shown in the following code snippet.

  data_tree = new TTree();

  // new format
  Point2D mc_point;
  const Point2D *pmc_point(&mc_point);
  std::vector<double> reco_points_x;
  const std::vector<double> *preco_points_x(&reco_points_x);
  std::vector<double> reco_points_y;
  const std::vector<double> *preco_points_y(&reco_points_y);
  std::vector<unsigned int> reco_points_count;
  const std::vector<unsigned int> *preco_points_count(&reco_points_count);
  ULong64_t total_count;

  data_tree->Branch("mc_point", &pmc_point);
  data_tree->Branch("reco_points_x", &preco_points_x);
  data_tree->Branch("reco_points_y", &preco_points_y);
  data_tree->Branch("reco_points_count", &preco_points_count);
  data_tree->Branch("total_count", &total_count, "total_count/l");

  for (auto const& entry : hit_map_2d) {
    pmc_point = &entry.first;
    total_count = entry.second.total_count;
    reco_points_x.clear();
    reco_points_y.clear();
    reco_points_count.clear();
    for (auto const& reco_point : entry.second.points) {
      reco_points_x.push_back(reco_point.first.x);
      reco_points_y.push_back(reco_point.first.y);
      reco_points_count.push_back(reco_point.second);
    }
    data_tree->Fill();
  }

But the problem persists. So I did a little analysis of how much memory my object should consume:

[code]
std::cout << "estimating size of data in memory...\n";
unsigned long bytes_unsigned_ints(0);
unsigned long bytes_doubles(0);
unsigned long total_overhead_bytes_unsigned_ints(0);
unsigned long total_overhead_bytes_doubles(0);
// sizeof of the vector object itself (pointers/bookkeeping),
// not of the heap storage it manages
unsigned short overhead_unsigned_ints(sizeof(std::vector<unsigned int>));
unsigned short overhead_doubles(sizeof(std::vector<double>));
unsigned long all_reco_points(0);

for (auto const& entry : hit_map_2d) {
  bytes_unsigned_ints += overhead_unsigned_ints;
  total_overhead_bytes_unsigned_ints += overhead_unsigned_ints;
  bytes_doubles += overhead_doubles;
  total_overhead_bytes_doubles += overhead_doubles;
  bytes_unsigned_ints += sizeof(unsigned int) * entry.second.points.size();
  bytes_doubles += sizeof(double) * entry.second.points.size();
  all_reco_points += entry.second.points.size();
}
std::cout << "memory summary:\n";
std::cout << "overhead for a single unsigned int vector: " << overhead_unsigned_ints << " bytes\n";
std::cout << "overhead for a single double vector: " << overhead_doubles << " bytes\n";
std::cout << "number of entries: " << hit_map_2d.size() << std::endl;
std::cout << "number of reco entries: " << all_reco_points << std::endl;
std::cout << "total memory consumption for all unsigned int vectors: " << bytes_unsigned_ints << " bytes\n";
std::cout << "total memory consumption for all double vectors: " << bytes_doubles << " bytes\n";
std::cout << "total overhead for the unsigned int vectors: " << total_overhead_bytes_unsigned_ints << " bytes\n";
std::cout << "total overhead for the double vectors: " << total_overhead_bytes_doubles << " bytes\n\n";
std::cout << "converting hit map to root tree...\n";
[/code]

So the largest branches should be the reco_points_x/y ones, with a size of roughly 440 MB. How much overhead does the buffer add when storing one tree entry? Or am I missing something else? Thanks in advance!

Best regards,
Stefan

Hi,

Until an expert gives a more precise answer, maybe this could give some hints?
[url=https://root-forum.cern.ch/t/saving-a-single-vector-of-data-exceeding-1-gb/21262/1]Saving a single vector of data exceeding 1 GB[/url]

Cheers, Bertrand.

Hi Bertrand,

thanks for your reply.

[quote="bellenot"]
Until an expert gives a more precise answer, maybe this could give some hints?
[url=https://root-forum.cern.ch/t/saving-a-single-vector-of-data-exceeding-1-gb/21262/1]Saving a single vector of data exceeding 1 GB[/url][/quote]
Sounds interesting, but if I understand correctly, that should be identical to what I have done right now (writing every variable into its own branch). Furthermore, I think a TNtuple won't work, as it only allows variables of the same type. I have the feeling the branches containing the vectors are causing the problem. I will try to split those a bit more and see what happens… It sure would be nice to understand why it's not possible to write out an arbitrarily large amount of data. To my understanding it should work like this: fill the buffer -> once it's full -> flush -> clear the buffer -> fill more data into the buffer -> flush… and rinse and repeat until all data is written.

As a last resort I will just chop my data into a few separate ROOT files…

Cheers,

Stefan

[quote]if the total data size for the branch is larger than 1073741822 bytes, you get that error.[/quote]More specifically, the data size for one entry in a single branch is limited to 1 GB.

[quote]As a last resort I will just chop my data into a few separate ROOT files…[/quote]If you are able to chop the data into more files, you should also be able to chop it into more branches and/or more entries and still solve the problem; see the sketch below.
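
For example, one way to chop an oversized entry into several smaller tree entries (a sketch based on your snippet; kMaxPointsPerEntry is an arbitrary example value, not a ROOT constant):

[code]
const std::size_t kMaxPointsPerEntry = 10000000; // arbitrary chunk size

reco_points_x.clear();
reco_points_y.clear();
reco_points_count.clear();
for (auto const &reco_point : entry.second.points) {
  reco_points_x.push_back(reco_point.first.x);
  reco_points_y.push_back(reco_point.first.y);
  reco_points_count.push_back(reco_point.second);
  if (reco_points_x.size() == kMaxPointsPerEntry) {
    data_tree->Fill(); // write this chunk as its own tree entry
    reco_points_x.clear();
    reco_points_y.clear();
    reco_points_count.clear();
  }
}
if (!reco_points_x.empty())
  data_tree->Fill(); // remainder
[/code]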

[quote]It sure would be nice to understand why it's not possible to write out an arbitrarily large amount of data.[/quote]It is not possible to store more than 1 GB in a single buffer/basket because the internal representation of the offsets (within the binary, platform-independent representation) uses 32-bit integers.
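
As an illustration of where the exact number in the error message comes from (which bits ROOT reserves is an assumption on my part, but the arithmetic matches the printed limit):

[code]
#include <cstdint>
#include <iostream>

int main() {
  // The offset/byte count is packed into a 32-bit word whose top bits
  // are reserved as flags, leaving roughly 30 bits for the count:
  const std::uint32_t limit = (1u << 30) - 2;
  std::cout << limit << '\n'; // prints 1073741822, the value in the error
  return 0;
}
[/code]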

[quote]data_tree = new TTree();[/quote]You must specify a name and a title for the TTree to be fully functional.

[quote]But the problem persists. So I did a little analysis of how much memory my object should consume:[/quote]Indeed, according to your numbers, the problem should be gone.

To narrow down the problem, I would:
a) keep the code (almost) the same, but run experiments where you create only one of the branches; this will tell us which of the branches is failing.
b) print, just before the Fill, the size of the content of the collections being stored; this should confirm whether or not the data is supposed to fit (see the sketch below).
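
For (b), a minimal sketch using the variable names from your snippet:

[code]
// Just before each Fill: print how many elements and roughly how much
// payload this entry carries.
std::cout << "before Fill: x=" << reco_points_x.size()
          << " y=" << reco_points_y.size()
          << " count=" << reco_points_count.size()
          << " (~"
          << ((reco_points_x.size() + reco_points_y.size()) * sizeof(double)
              + reco_points_count.size() * sizeof(unsigned int))
          << " payload bytes)\n";
data_tree->Fill();
[/code]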

Cheers,
Philippe.

PS. As a side note, when filling, using an intermediary pointer, e.g.:

[code]
std::vector<unsigned int> reco_points_count;
const std::vector<unsigned int> *preco_points_count(&reco_points_count);
data_tree->Branch("reco_points_count", &preco_points_count);
[/code]

is no longer necessary, and the following should work:

[code]
std::vector<unsigned int> reco_points_count;
data_tree->Branch("reco_points_count", &reco_points_count);
[/code]

Hi Philippe,

Thanks. OK, I got it running now; I had messed up a bit… I stored that TTree as a member of my custom object (which is saved to a TFile). So I guess the problem was that I wrapped my custom object around the TTree and thus again had one single large object being saved to the TFile. I have split the TTree from my custom object now, but it still did not work right away: I had to split the TTree into parts (simply chopping the entries) for it to work.

I have one more question about streaming, to verify that I understand completely. The buffer size for writing to a TFile is limited to 1 GB, so whenever some ROOT object or custom object is written to a file, it should not exceed 1 GB. This makes sense to me, but it seems that the TTree is buffered and written as a whole. Is that true? Or is it important for me to give the TTree a name and title? As far as I understand, those are only used by ROOT to manage objects (garbage collection or so).

Cheers,
Stefan

Hi Stefan,

[quote]Or is it important for me to give the TTree some name and title?[/quote]It is important. For historical reasons, the constructors with and without name/title are very different. Without a name/title, the TTree is not fully functional. In order to use a TTree correctly, you must construct it with the constructor that takes a name and a title.
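
For example (the name and title strings here are placeholders):

[code]
// Fully functional tree: constructed with a name and a title.
TTree *data_tree = new TTree("hit_map_tree", "2D hit map for smearing");
[/code]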

[quote]but it seems to me that the TTree is buffered and written as a whole. [/quote]This is a plausible consequence of using the default constructor to create the TTree.

Cheers,
Philippe.

Hi,

OK, it now works with one big TTree as well, once I use the constructor with a name and title. Thanks a lot, Philippe!

Cheers,
Stefan