I have been using RDataFrame for some time and was wondering if there is an efficient way to sum the elements of a vector branch over all events.
For example, a TTree could have a branch holding a vector of floats, x = <a, b, c>. Is it possible to sum all the 0th-index values (i.e. “a”) across the events in the TTree? I do know that it is possible to flatten the vector and then sum over the individual floats.
Thanks for the post and welcome to the ROOT community!
Yes, there is, and it’s quite easy. If I have understood your question correctly, it should be
auto first_element_sum = mydf.Define("first_element", "myVectorBranch[0]").Sum("first_element");
std::cout << *first_element_sum << std::endl;
You can be more efficient by specifying the type of the data stored in the vector. For example, supposing that the type stored in the vector is a float:
auto first_element_sum = mydf.Define("first_element", "myVectorBranch[0]").Sum<float>("first_element");
std::cout << *first_element_sum << std::endl;
Your code snippet definitely works, since it flattens the array. However, I was wondering if there is any feature/function in RDF that could exactly emulate the effect of “np.sum” in this snippet:
import numpy as np
M = np.array([[1,2,3], [4,5,6]])
np.sum(M, axis = 0)
In this situation, the output is a vector of column-wise sums, here [5, 7, 9]. I am not sure how hard this would be to implement, but such a feature seems quite natural to have in RDF.
I am not sure I understand the question.
Do you have a branch with a vector<int> and want to obtain, per event, the sum of the elements in that vector?
auto df = ROOT::RDataFrame(8);
auto dfVectorColumn = df.Define("v", [](){return std::vector<double>({1., 2.2, .3, 2.2});});
auto sum = dfVectorColumn.Sum("v");
std::cout << *sum << std::endl;
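If what is wanted is instead the per-index sum across events (the axis = 0 behaviour of the NumPy snippet above), one possible approach is an element-wise accumulator. The sketch below is plain C++ without ROOT: each inner vector stands in for one event's branch value, and the names AddElementwise and SumOverEvents are hypothetical, introduced only for illustration. Zero-padding of shorter vectors is an assumption made here so that an empty vector can act as the starting value. A function of this shape should be adaptable to a reduction-style action such as RDataFrame's Reduce, but that is not verified here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns acc + v, index by index.
// Assumption: if the lengths differ, the shorter vector is treated as
// zero-padded, so an empty vector behaves as the identity element.
std::vector<double> AddElementwise(std::vector<double> acc, const std::vector<double> &v)
{
   if (acc.size() < v.size())
      acc.resize(v.size(), 0.0);
   for (std::size_t i = 0; i < v.size(); ++i)
      acc[i] += v[i];
   return acc;
}

// Hypothetical stand-in for the event loop: one inner vector per event.
std::vector<double> SumOverEvents(const std::vector<std::vector<double>> &events)
{
   std::vector<double> total; // starts empty, grows on the first event
   for (const auto &v : events)
      total = AddElementwise(std::move(total), v);
   return total;
}
```

Calling SumOverEvents on the two rows of the NumPy example, {1, 2, 3} and {4, 5, 6}, yields {5, 7, 9}, the same result as np.sum(M, axis = 0).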