Hi, I have two large TChains for which I need to build a TTreeIndex each, and then make one a friend of the other. I need to run the entire script several times, and it's apparent that building the indices is a major bottleneck. I'm therefore saving the indices to a file on the first pass and simply reading them back on later ones, using the following logic:
string index_file_name = "my_file_with_indices.root";
ifstream _index_file(index_file_name.c_str());
TFile* index_file;
TTreeIndex *first_index, *second_index;
if (_index_file.good()) {
    // The index file already exists: read the stored indices back.
    index_file = new TFile(index_file_name.c_str(), "READ");
    first_index = (TTreeIndex*)index_file->Get("first_index");
    second_index = (TTreeIndex*)index_file->Get("second_index");
}
else {
    // First pass: build the indices and write them to a new file.
    index_file = new TFile(index_file_name.c_str(), "RECREATE");
    first_index = new TTreeIndex(first_chain, "major", "minor");
    second_index = new TTreeIndex(second_chain, "major", "minor");
    first_index->Write("first_index");
    second_index->Write("second_index");
}
before setting my TChain indices with first_index and second_index and befriending the TTrees. The second index corresponds to a particularly large number of entries, so when writing it I get:
Error in <TBufferFile::WriteByteCount>: bytecount too large (more than 1073741822)
Writing a single object is indeed capped at 1 GB (some internal pointers/references are 32 bits wide in the binary representation). Trying to allocate a buffer larger than 1 GB has ‘undefined’ behavior :).
Do your files have non-overlapping ranges of [first_index, second_index]?
Hi @pcanal, sorry, what do you mean by “non-overlapping range”? ([first_index, second_index] provides a unique identifier across all files of the TChain, if that’s your question.)
In the second (b) case, you can generate a TTreeIndex per file and it will work.
In the first (a) case, it will not, as it would need to scan/open each file to find the correct entry.
Ah, I see what you mean. My setup is your (b), so yes, I could generate one TTreeIndex per file.
But in fact I can reproduce the problem if I consider even a single file with two trees. Specifically, one has 8546147 entries and I can save the corresponding TTreeIndex; whereas the second one has 79825000 entries and raises the <TBufferFile::WriteByteCount>: bytecount too large error above.
Is there a way I could split the writing of the TTreeIndex for a single tree in a single file? Alternatively, how can I write the corresponding information to, say, a CSV file, and read it back to create my own TTreeIndex?
You can save the result of mytree->GetTreeIndex(); as an individual key in the file (and then call mytree->SetTreeIndex(nullptr); before writing the tree). But then at read time you need to do the converse (read it individually and then re-attach it).
What we really need is a change in TTree::Streamer that makes this process automatic (and split it over several keys if needed).
You can save the result of mytree->GetTreeIndex(); as an individual key in the file (and then call mytree->SetTreeIndex(nullptr); before writing the tree).
I’m probably misunderstanding something here, because that doesn’t seem to help…
What we really need is a change in TTree::Streamer that makes this process automatic