Hi,
I used the attached macro to Redefine() a series of columns on my RDataFrame. However, saving to file with Snapshot writes the old columns as well as the redefined ones, leading to duplicated data. Can this be avoided?
When saving the redefined columns Snapshot changes the “.” characters in the column names to underscores. I suppose this has to do with the validity of strings containing dots as C++ variable names. Is there a way to get around this to keep my original column names? The dots come from the original TTree and I would like to maintain the format if possible.
ROOT Version: 6.28/00 Platform: Not Provided Compiler: Not Provided
ETA: I think both problems are actually closely related, as the columns that were not renamed by Snapshot (because they didn’t contain “.”) do not show up twice.
I don’t think there is a way to Redefine a branch with a . in its name and save it to a new file with Snapshot while keeping the . in the name: it’s not supported.
About Snapshot saving both the original branch with the . in its name and the new one with the . replaced by _, that’s surprising and sounds like a bug. Feel free to open an issue, or maybe @vpadulan or @mczurylo could take a look.
Hi Enrico,
Thank you for the reply. Unfortunately I figured this would be the case.
The renaming and column duplicates are easy enough to work around. What has been giving me trouble is that the Redefined columns are saved as RVec, and not as the original type of the branches (Int_t, Double_t, etc.). The leaves also don’t get saved within the branches they previously belonged to. I know something “breaks” during the renaming. Saving an unmodified tree doesn’t give any issues with the tree structure.
My main issue is that I need to use the merged tree I generate as the input to another macro, and with this troubled structure, reading a tree entry doesn’t seem to properly give me the data I need (it seems to return the entire RVec). This is what I get from TTree->Show() for the tree produced using Snapshot.
Assuming there is no way around this from Snapshot, I’ve been using RVec[entry][item]. It seems to work. Is this the correct way to read the value I need?
Ah, yes, we should add a note in the documentation about what Snapshot does with C-style arrays: when you have branches in the input tree containing C-style arrays (int*, Double_t*, as I understand is your case), Snapshot is able to write them out again as C-style arrays if the values come from the original branches.
Otherwise, RDataFrame has to go through the intermediate RVec representation (as is the case with your Redefine-d columns): RDataFrame operations (other than Snapshot in the particular case described above) do not handle C-style arrays, so they are converted to RVecs. Snapshot will then write these columns out as RVecs, because that’s their type now.
I don’t know, I’m missing a lot of context. Feel free to provide a sample input file and a self-contained, stripped down reproducer that I can take a look at.
Thank you, I have already figured out how to adapt the other macro to read the RVecs. I am working through some details that I still need to fix, but I think at this point they’re unrelated to the TTree type/structure.
Just to confirm, I can’t do an operation of this form:
auto append_func_call_int = [](ROOT::VecOps::RVec<int> inputArray1, ROOT::VecOps::RVec<int> inputArray2) {
    const auto size = inputArray2.size();
    for (size_t i = 0; i < size; i++)
        inputArray1.emplace_back(inputArray2[i]);
    return inputArray1;
};
on tree branches without relying on RDataFrame, right? If so, working around the details that come up when Snapshotting is basically the only way to rewrite my tree the way I need to, so I can just stick with that even if it’s not “pretty”.
Best regards
As I understand the snippet, you want to take a TTree with branches A and B which contain collections of integers, and produce a new tree in which A, for every event, is the concatenation of the original A and B.
You can do that by reading and writing the TTrees directly, with tree->SetBranchAddress and tree->Branch calls, but it’s more convoluted (it’s what RDF does under the hood).
P.S.
note that you can just write the snippet as return Concatenate(inputArray1, inputArray2). RVecs have a lot of useful helper functions.