Hi,
I’m trying to write a TTree by reading the data from another TTree. We have our own algorithms to decide which branches and which entries should be written to the new output tree.
As suggested, we are using TTreeReader to read the data:
#include <TFile.h>
#include <TTree.h>
#include <TTreeReader.h>
#include <TTreeReaderValue.h>

auto input_file = TFile{ "input.root", "read" };
auto output_file = TFile{ "output.root", "recreate" };
auto* tree_data = input_file.Get<TTree>("TreeName");
auto tree_reader = TTreeReader{};
tree_reader.SetTree(tree_data);
// TTree needs both a name and a title; the name is reused as the title here.
auto output_tree = TTree{ "OutputTreeName", "OutputTreeName" };
auto my_branch = TTreeReaderValue<MyBranch>{ tree_reader, "MyBranchName" };
auto check_branch = TTreeReaderValue<double>{ tree_reader, "CheckBranchName" };
// Note: this call already loads the first entry, so the loop below starts at the second one.
tree_reader.Next();
// Must be after calling Next(). Otherwise it doesn't work.
output_tree.Branch(my_branch.GetBranchName(), my_branch.Get());
while (tree_reader.Next())
{
    if (not algorithm_check(check_branch)) continue;
    *my_branch; // Even though the my_branch value is not needed, it must be dereferenced.
    output_tree.Fill();
}
output_file.WriteObject<TTree>(&output_tree, output_tree.GetName());
Here are some caveats in the code above:
- my_branch must be dereferenced every time before calling Fill on the output tree. Otherwise, the same value will be filled every time. What’s even worse is that the compiler could optimize this line away if it sees the value is dereferenced but never used.
- It’s very slow. The branch with “MyBranchName” has to be deserialized, copied and serialized again, while its deserialized values are never used.
- Registering the branch with the output tree must happen after calling Next() on the tree reader. Otherwise, the address obtained from the reader is just nullptr. I know lazy operations can sometimes make the code faster, but they shouldn’t make the code illogical.
- No multithreading.
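To make the first and third caveats concrete, here is a toy model in plain C++. It is not ROOT code and makes no claim about how TTreeReaderValue is implemented; LazyValue and copy_entries are hypothetical names, and the only assumption is the behaviour observed above: the value is read when it is dereferenced, and no address exists before the first read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lazily read branch value (not ROOT's implementation).
class LazyValue {
public:
    explicit LazyValue(const std::vector<double>& storage) : storage_(storage) {}

    // Moving to another entry only records the index; nothing is read yet.
    void SetEntry(std::size_t entry) { entry_ = entry; }

    // No address is available until the first read has happened.
    double* Address() { return has_buffer_ ? &buffer_ : nullptr; }

    // Dereferencing is what actually loads the current entry into the buffer.
    double& operator*() {
        buffer_ = storage_[entry_];
        has_buffer_ = true;
        return buffer_;
    }

private:
    const std::vector<double>& storage_;
    std::size_t entry_ = 0;
    double buffer_ = 0.0;
    bool has_buffer_ = false;
};

// Copies all entries through a registered address, like a branch of an output tree.
std::vector<double> copy_entries(const std::vector<double>& input, bool dereference) {
    LazyValue value{input};
    value.SetEntry(0);
    (void)*value;                                // first read creates the buffer
    const double* registered = value.Address();  // only valid after that read
    std::vector<double> output;
    for (std::size_t entry = 0; entry < input.size(); ++entry) {
        value.SetEntry(entry);
        if (dereference) { (void)*value; }       // refresh the buffer for this entry
        output.push_back(*registered);           // what a Fill-like step would see
    }
    return output;
}
```

With the input {1.0, 2.0, 3.0}, copy_entries(input, true) gives back the three values, while copy_entries(input, false) returns the first value three times, which matches the stale-value behaviour described in the first caveat.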
Other alternatives seem even worse:
- Using the native APIs of TTree means dealing with T** for any user-defined class.
- RDataFrame seems to have very nice APIs. But, in practice, it’s super hard to integrate into a large event-driven C++ code base.
Reading data from input ROOT files and writing some of the entries to another ROOT file must be a very common task. I would really appreciate it if the ROOT devs could suggest a better way in terms of performance and simplicity.
Thanks for your attention.
ROOT Version: 6.28
Platform: Debian buster
Compiler: gcc 13