Copy large TTree to Memory

tafoya · September 29, 2021, 3:56pm

Hi all,

Sorry if the topic is duplicated, but I could not find the solution on my own. I am trying to copy a single large TTree object from a .root file into memory, using something like

TFile *file = TFile::Open("filename.root","READ");
TTree *tree_file = (TTree*)file->Get("treeName");

gROOT->cd();

TTree* tree_memory = tree_file->CopyTree("");

delete tree_file;
file->Close();
delete file;

tree_memory->Print();   // this just to check it still exists

which works just fine for small ones. The problem is that some of my trees are larger than 1GB, so the command CopyTree("") gives

Error in <TBufferFile::WriteByteCount>: bytecount too large (more than 1073741822)

Is there any way to increase that limit or a best-practice workaround I should use instead?

Thanks in advance!
Juan

Wile_E_Coyote · September 29, 2021, 4:12pm

@pcanal I guess you need a: TMemFile

ROOT Forum → Search → TMemFile

pcanal · September 29, 2021, 4:34pm

In order to find the “right” solution to your situation, could you clarify “why” you want or need to load the whole TTree in memory? I.e. what is the end goal?

tafoya · September 29, 2021, 4:53pm

Thanks for the quick replies.

In a nutshell, I am working on a calibration technique that is applied on an event-by-event basis, according to some parameters p_0,p_1,p_2,… The best parameters p_i are found with MINUIT, which means that at each call/iteration of the minimization I must re-loop over all the events of the TTree.

Right now, at each iteration I have to open the corresponding .root file, loop over all the events, and close the file. This (reading from disk) is the most time-consuming part of my study, and in fact the minimization can take several days. My hope is that by keeping the TTree in memory the process can be sped up.

I hope this is clear enough.

Juan

pcanal · September 29, 2021, 5:28pm

It sounds like you would only be using a few of the branches during this minimization and thus do not really need the whole TTree in memory.

Also, why do you need to open the corresponding .root file at each iteration, instead of re-using the same TFile object (i.e. closing it only at the end) and the same TTree object?

I see 3 possible implementations.

1. Instead of copying the whole TTree, load the data of the few branches into std::vector objects and use those. For example:

    double p_0, p_1, ....;
    TBranch *b_p_0 = nullptr;
    tree->SetBranchAddress("p_0", &p_0, &b_p_0);
    ...
    for (Long64_t e = 0; e < tree->GetEntriesFast(); ++e) {
        // LoadTree returns the entry number local to the current tree (matters for a TChain)
        auto bentry = tree->LoadTree(e);
        b_p_0->GetEntry(bentry);
        vector_p_0->push_back(p_0);
    }

or probably better yet, use RDataFrame:

ROOT::RDataFrame d(treeName, fileName, {"p_0", "p_1"});
auto p0vec = d.Take<double>("p_0");
auto p1vec = d.Take<double>("p_1");

2. (the most memory efficient) Make a partial copy of the TTree into a (compressed) TMemFile:

TMemFile paramsfile(filename,"RECREATE");
tree_file->SetBranchStatus("*", false);
tree_file->SetBranchStatus("p_0", true);
tree_memory = tree_file->CloneTree(-1, "fast");

and then use the TMemFile as you have been using the file on disk.

3. Load the baskets into memory. This requires keeping the TFile and TTree around and, at the start, doing:

auto b = tree_file->GetBranch("p_0");
b->LoadBaskets();

This will load from the disk only once, and will decompress the basket only once.
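To illustrate why caching just the needed columns helps, here is a minimal sketch in plain C++ (no ROOT calls) of what the per-iteration work could look like once the values sit in std::vector objects. Everything named here is hypothetical and only for illustration: the `EventColumns` struct, the column names `raw` and `target`, the linear `calibrate` function, and the two parameters `p0` and `p1`. The actual calibration in the question is not specified.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-event columns, filled ONCE before the minimization starts
// (for example from the vectors obtained with Take<double>() or from the
// branch loop shown earlier). The names are assumptions for illustration.
struct EventColumns {
    std::vector<double> raw;     // uncalibrated value per event
    std::vector<double> target;  // reference value per event
};

// Hypothetical calibration: a linear correction with parameters p0 and p1.
inline double calibrate(double raw, double p0, double p1) {
    return p0 + p1 * raw;
}

// Objective function of the kind a minimizer evaluates at every iteration.
// It only reads from memory: no file is opened and no basket is decompressed.
double sumSquaredResiduals(const EventColumns& ev, double p0, double p1) {
    double sum = 0.0;
    for (std::size_t i = 0; i < ev.raw.size(); ++i) {
        const double residual = calibrate(ev.raw[i], p0, p1) - ev.target[i];
        sum += residual * residual;
    }
    return sum;
}
```

With this layout the minimizer's function would simply forward its current parameter values to `sumSquaredResiduals`, so the cost per iteration is one pass over contiguous arrays rather than a pass over the file.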
