Test/train split for a ROOT file

To whom it may concern, we have a ROOT file containing a tree with 10,000 entries. I want to randomly split it into a test file (3,000 entries) and a train file (7,000 entries).

Could you please give us some code to accomplish this? Thanks.

You can use TTree::CloneTree(0) to create empty test and train trees with the same structure as your input tree. Then iterate over the entries of your input tree and, in every iteration, evaluate gRandom->Rndm(). If the return value is smaller than 0.3, fill the test tree with the current entry; otherwise, fill the train tree. Note that this gives roughly 3,000 test entries, not exactly 3,000, because each entry is assigned independently.
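The per-entry decision in that loop can be sketched without ROOT as follows. This is a minimal standard-C++ illustration, not ROOT code: std::mt19937 with std::uniform_real_distribution stands in for gRandom->Rndm(), and the names Split and bernoulliSplit are hypothetical. It only records which entry numbers would go to each tree; in the real loop those are the places where testTree->Fill() or trainTree->Fill() would be called.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical result type: entry numbers assigned to each output tree.
struct Split {
    std::vector<std::int64_t> test;
    std::vector<std::int64_t> train;
};

// Draws one uniform number per entry and sends the entry to the test set when
// it is below testFraction, otherwise to the train set (same rule as the
// gRandom->Rndm() < 0.3 check in the CloneTree loop).
Split bernoulliSplit(std::int64_t nEntries, double testFraction, std::uint32_t seed) {
    assert(testFraction >= 0.0 && testFraction <= 1.0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);  // stand-in for gRandom->Rndm()

    Split split;
    for (std::int64_t entry = 0; entry < nEntries; ++entry) {
        if (uniform(rng) < testFraction) {
            split.test.push_back(entry);   // ROOT loop: testTree->Fill()
        } else {
            split.train.push_back(entry);  // ROOT loop: trainTree->Fill()
        }
    }
    return split;
}
```

With 10,000 entries and testFraction = 0.3, the test set ends up with about 3,000 entries, but the exact number varies with the seed. If exactly 3,000 is required, the entry numbers have to be chosen up front instead, for example by shuffling 0..9999 and taking the first 3,000.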

Thanks so much!

On second thought, I think it is actually easier with TEntryList and TTree::CopyTree. You'd create two randomized entry lists holding the entry numbers for the test and the train tree, and then copy the input tree into two new trees, with the selection done by the entry lists, like this:

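// "input_file.root" and "InputTree" stand for your own file and tree names
auto *inputFile = TFile::Open("input_file.root", "READ");
auto *inputTree = inputFile->Get<TTree>("InputTree");

auto *trainList = new TEntryList();
auto *testList = new TEntryList();
// 7000 random entry numbers added with trainList->Enter(entry),
// the remaining 3000 added with testList->Enter(entry)

// Because the list is set, only entry numbers in trainList are copied.
// CopyTree creates the new tree in the current directory, so open the output file first.
inputTree->SetEntryList(trainList);
auto *trainFile = TFile::Open("train.root", "RECREATE");
auto *trainTree = inputTree->CopyTree("");
trainFile->Write();
trainFile->Close();

inputTree->SetEntryList(testList);
auto *testFile = TFile::Open("test.root", "RECREATE");
auto *testTree = inputTree->CopyTree("");
testFile->Write();
testFile->Close();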
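The step that fills the two entry lists is left as a comment above. One way to get exactly 7,000 train and 3,000 test entry numbers is to shuffle all entry numbers and cut the shuffled sequence; the sketch below does this in plain C++ with std::shuffle. The names EntrySplit and exactSplit are hypothetical, and the seed handling is an assumption (any fixed seed makes the split reproducible).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical result type: entry numbers for the train and test trees.
struct EntrySplit {
    std::vector<std::int64_t> train;
    std::vector<std::int64_t> test;
};

// Picks exactly nTrain distinct random entry numbers from [0, nEntries) for the
// train set; all remaining entry numbers form the test set.
EntrySplit exactSplit(std::int64_t nEntries, std::int64_t nTrain, std::uint64_t seed) {
    assert(nTrain >= 0 && nTrain <= nEntries);

    // 0, 1, ..., nEntries - 1
    std::vector<std::int64_t> entries(static_cast<std::size_t>(nEntries));
    std::iota(entries.begin(), entries.end(), std::int64_t{0});

    // Uniform random permutation, then cut it into two parts.
    std::mt19937_64 rng(seed);
    std::shuffle(entries.begin(), entries.end(), rng);

    EntrySplit split;
    split.train.assign(entries.begin(), entries.begin() + nTrain);
    split.test.assign(entries.begin() + nTrain, entries.end());

    // Ascending order makes the lists easier to inspect.
    std::sort(split.train.begin(), split.train.end());
    std::sort(split.test.begin(), split.test.end());
    return split;
}
```

In the ROOT macro, the result would then be transferred with loops such as `for (auto entry : split.train) trainList->Enter(entry);` and the same for testList, before calling SetEntryList.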
