Speed up TVector operations? std::vector?

Hello, I have a compiled analysis program which does operations on values stored in TVectorDs. Each TVectorD has 20002 elements, and the operations include boxcar smoothing (over hundreds of elements), searches for threshold crossings, and summing the elements of the TVectorD.

My code runs extremely slowly (~30 minutes for 30000 entries in a TTree, each entry with 3 TVectorDs). I’ve checked with valgrind, and it seems that the culprit is indeed the TVectorD operations. I’ve included the annotated callgrind output in case someone wants to check it. It is my first time using valgrind.

I’m trying to figure out how to make the code run faster, so I’m hoping that there are more efficient ways to do what I am doing, namely the smoothing, threshold-crossing searches, and sums. Are std::vectors better than TVectorDs for this kind of thing? I’m using some TVectorD-specific methods (like .Sum() and .Norm2Sqr()), but it wouldn’t be too hard to write my own functions that do this for a std::vector. I’ve attached the actual analysis code in case people want to look at it.
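To show what I mean, std::vector replacements for the two methods I use could be as simple as this (just a sketch, not code I have benchmarked; the names vec_sum and vec_norm2sqr are made up):

#include <vector>
#include <numeric> // std::accumulate, std::inner_product

// Equivalent of TVectorD::Sum(): sum of all elements.
double vec_sum(const std::vector<double> &v) {
  return std::accumulate(v.begin(), v.end(), 0.0);
}

// Equivalent of TVectorD::Norm2Sqr(): sum of the squares of all elements.
double vec_norm2sqr(const std::vector<double> &v) {
  return std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
}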

Thanks to any experts with advice in this matter.
Jean-François Caron
standard_analysis.C (20.9 KB)
valgrind_summary.txt (8.54 KB)

Hi,

The callgrind summary suggests a lot of calls to TVectorT::operator(), which is ‘slow’. Do you know where they are coming from? (For example, you could send me the complete callgrind output.)

Thanks,
Philippe.

Hi Philippe, I have attached the full callgrind output, but I had to gzip it since it was greater than 2MB.

The TVectorT::operator() that is accessed a lot comes from the threshold-crossing algorithms that are applied to various waveforms and from the smoothing of the waveforms. For example, my smoothing function looks like this:

TVectorD smoothed_tv(TVectorD ampl_tv, Int_t n_frames) {
  // Form the N-frame rolling average.
  Int_t n_samples = ampl_tv.GetNoElements();
  TVectorD smooth(n_samples);
  smooth.Zero(); // Initialize a new TVectorD to contain the N-frame average.
  for (Int_t i = 0; i <= n_samples - n_frames; i++) {
    // Add up the n_frames, then divide by n.
    for (Int_t j = 0; j < n_frames; j++) {
      smooth[i] += ampl_tv[i+j];
    }
    smooth[i] /= n_frames*1.0;
  }
  // For the remaining n_frames samples, we just use the last smoothed value.
  for (Int_t i = n_samples - n_frames + 1; i < n_samples; i++) {
    smooth[i] = smooth[n_samples - n_frames];
  }
  return smooth;
}
This accesses every single element of the ampl_tv and smooth TVectors. I could make it a little more efficient by cutting off the end of the smoothed vector, for example, but I doubt that’s the source of the slowness.
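(As an aside, I realize the averaging itself could be done with a running sum so that each sample is touched only a couple of times instead of n_frames times; a rough sketch of what I mean, not the code in the attachment:)

TVectorD smoothed_tv_runningsum(const TVectorD &ampl_tv, Int_t n_frames) {
  // Form the N-frame rolling average with a sliding window sum.
  Int_t n_samples = ampl_tv.GetNoElements();
  TVectorD smooth(n_samples);
  smooth.Zero();
  Double_t window_sum = 0.0;
  for (Int_t j = 0; j < n_frames; j++) window_sum += ampl_tv[j]; // first window
  smooth[0] = window_sum / n_frames;
  for (Int_t i = 1; i <= n_samples - n_frames; i++) {
    window_sum += ampl_tv[i + n_frames - 1] - ampl_tv[i - 1]; // slide the window by one
    smooth[i] = window_sum / n_frames;
  }
  // As before, pad the tail with the last smoothed value.
  for (Int_t i = n_samples - n_frames + 1; i < n_samples; i++) {
    smooth[i] = smooth[n_samples - n_frames];
  }
  return smooth;
}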
callgrind.out.51710.gz (327 KB)

Hi,

To speed up this code, you can skip the protection provided by operator() (and the optional shift in index value) by accessing the underlying array directly:

smooth.GetMatrixArray()[i] += ampl_tv.GetMatrixArray()[i+j];

Also, to improve performance, rather than returning the TVectorD (which can incur a significant copying cost), pass an already constructed one by reference, and pass the input by (const) reference as well:

void smoothed_tv(TVectorD &smooth, const TVectorD &ampl_tv, Int_t n_frames) {
  // Form the N-frame rolling average.
  Int_t n_samples = ampl_tv.GetNoElements();
  smooth.Zero(); // Initialize the TVectorD that will contain the N-frame average.
  ...
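For illustration, combining both suggestions on your posted function would look roughly like this (a sketch; it assumes smooth was already constructed with n_samples elements):

void smoothed_tv(TVectorD &smooth, const TVectorD &ampl_tv, Int_t n_frames) {
  // Form the N-frame rolling average.
  Int_t n_samples = ampl_tv.GetNoElements();
  smooth.Zero();
  Double_t *out = smooth.GetMatrixArray();       // raw write access, no bounds checks
  const Double_t *in = ampl_tv.GetMatrixArray(); // raw read access
  for (Int_t i = 0; i <= n_samples - n_frames; i++) {
    for (Int_t j = 0; j < n_frames; j++) {
      out[i] += in[i + j];
    }
    out[i] /= n_frames * 1.0;
  }
  // For the remaining samples, just reuse the last smoothed value.
  for (Int_t i = n_samples - n_frames + 1; i < n_samples; i++) {
    out[i] = out[n_samples - n_frames];
  }
}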

Cheers,
Philippe.

Thanks, the GetMatrixArray() tip reduced the time it takes to run a job by a factor of 2. Passing by reference did not seem to make a noticeable difference in speed; perhaps there is another bottleneck before this one matters. The factor-of-2 speedup makes the program much more reasonable to use now, so I won’t bother trying to optimize further.

Jean-François

I’m reviving this old-ish post to clarify a related TVector issue. If I access the TVector many times in my function, is it ok to simply use tv.GetMatrixArray() every time? Or should I be doing

Double_t * tv_ptr = tv.GetMatrixArray();

and using the pointer everywhere? Is there a performance difference?

Also, can you elaborate on the “protection” provided by operator() that you referred to? What kinds of risks am I taking by using this speedup?

Thanks,
Jean-François

Hi,

It is safe if and only if the TVector does not need to reallocate the underlying memory (i.e. as long as it does not grow). The extra protection, if I remember correctly, avoids out-of-bounds access.
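In other words, for your fixed-size vectors it is fine to fetch the pointer once and reuse it, e.g. (sketch):

const Double_t *data = tv.GetMatrixArray(); // stays valid as long as tv is not resized
Double_t sum = 0.0;
for (Int_t i = 0; i < tv.GetNoElements(); i++) {
  sum += data[i]; // no index check here, unlike tv(i) or tv[i]
}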

Philippe.

Thanks, my vectors are fixed-length, so I will use GetMatrixArray() a lot more. I guess you might ask why I use a TVector at all, then, but it’s because it has many convenient methods that C-style arrays and std::vectors do not have.

Another question:
I have been reading about “return value optimization” and “move semantics”, which should essentially remove the need to pass a non-const pre-allocated TVector by reference just to fill it. Are TVectors (and ACLiC) able to take advantage of this kind of more modern C++ behaviour? I’ve changed the input TVector argument to be a const reference, but I do not want to have to pass a non-const vector by reference just for filling (for aesthetic reasons, if nothing else).
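In other words, I would like to be able to write something in this style and not pay for a deep copy of the result (illustrative only; smoothed_copy is not a real function of mine):

TVectorD smoothed_copy(const TVectorD &ampl_tv, Int_t n_frames) {
  TVectorD smooth(ampl_tv.GetNoElements());
  // ... fill smooth as in my smoothing function ...
  return smooth; // hoping the copy is elided (RVO) or turned into a move
}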

Jean-François

Reference about move semantics which led to my question: thbecker.net/articles/rvalue_ref … on_01.html

Hi,

CINT does not support any C++11 concepts, so ROOT 5 will never support move semantics. On the other hand, cling does support them, and thus in the not-too-distant future we might start supporting this with ROOT 6.

Cheers,
Philippe.