Summing over vectors with RDF

Hi,

I have been using RDataFrame for some time and was wondering if there is an efficient way to sum vector-valued objects over all events.

For example, one could have a vector of floats in a TTree x = <a, b, c>. I am wondering if it is possible to sum over all the 0th index values (i.e. “a”) across the events in a TTree? I do know that it is possible to flatten the vector and then sum over the individual float.

Best Regards,
Sahibjeet

Dear Sahibjeet,

Thanks for the post and welcome to the ROOT community!

Yes, there is, and it’s quite easy. If I understand your question correctly, it should be

auto first_element_sum = mydf.Define("first_element", "myVectorBranch[0]").Sum("first_element");
std::cout << *first_element_sum << std::endl;

You can be more efficient by specifying the type of the data stored in the vector. For example, supposing that the type stored in the vector is a float:

auto first_element_sum = mydf.Define("first_element", "myVectorBranch[0]").Sum<float>("first_element");
std::cout << *first_element_sum << std::endl;

I hope this helps!

Cheers,
D

Dear Danilo,

Thanks a lot for your reply!

Your code snippet definitely works, since it flattens the array. However, I was wondering if there is any feature/function in RDF that could exactly emulate the effect of “np.sum” in this snippet:

import numpy as np
M = np.array([[1,2,3], [4,5,6]])
np.sum(M, axis = 0)

In this situation, one gets the sums back as a vector, one entry per column. I am not sure how hard this would be to implement, but such a feature seems quite natural to have in RDF.

Best Regards,
Sahibjeet

Hi,

I don't think I understand the question well.
Do you have a branch with a vector<int>, and do you want to obtain, per event, the sum of the elements in that vector?

D

Dear Danilo,

Apologies for the confusion. For each index in the vector, I would like to get the sum of that element over all events in the tree.

Best Regards,
Sahibjeet Singh

Hi,

It’s basically the same syntax:

auto df = ROOT::RDataFrame(8);
auto dfVectorColumn = df.Define("v", [](){return std::vector<double>({1., 2.2, 32.2});});

auto sum = dfVectorColumn.Sum("v");
std::cout << *sum << std::endl;
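
If what is needed is one sum per vector index (the equivalent of np.sum(M, axis = 0)) rather than a single number, the accumulation can be sketched in plain C++ as below. This is only an illustration under stated assumptions: AddElementwise and SumPerIndex are hypothetical helper names, not part of ROOT, and each "event" is modelled as a std::vector<double>. RDataFrame does provide an Aggregate action that takes a user-defined accumulator and merger, so a function like AddElementwise might be adapted for that purpose, but that wiring is an assumption and has not been tested here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: add one event's vector to the running per-index sums.
// The accumulator grows if an event has more elements than seen so far.
std::vector<double> AddElementwise(std::vector<double> acc, const std::vector<double> &v)
{
   if (v.size() > acc.size())
      acc.resize(v.size(), 0.0);
   for (std::size_t i = 0; i < v.size(); ++i)
      acc[i] += v[i];
   return acc;
}

// Hypothetical helper: column-wise sum over all events,
// analogous to np.sum(M, axis = 0) for a rectangular input.
std::vector<double> SumPerIndex(const std::vector<std::vector<double>> &events)
{
   std::vector<double> acc;
   for (const auto &v : events)
      acc = AddElementwise(std::move(acc), v);
   return acc;
}
```

With the two rows from the numpy example, SumPerIndex({{1., 2., 3.}, {4., 5., 6.}}) yields a vector holding 5, 7 and 9.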

Cheers,
D
