Google-perftools

Can google-perftools be useful for ROOT?

Fast, multi-threaded malloc() and nifty performance analysis tools

These tools are for use by developers so that they can create more robust applications. They are especially useful to those developing multi-threaded applications in C++ with templates. The package includes TCMalloc, heap-checker, heap-profiler and cpu-profiler.

Hi,
we use valgrind and callgrind extensively; that’s why ROOT is as good as it is :slight_smile: Would you mind providing benchmark.C results for standard ROOT versus ROOT built with Google’s malloc?
Axel.

OK, I’ll do that, but I don’t know when. I’ll keep you posted.

I compiled ROOT for benchmarking following this.
One copy was linked with tcmalloc 0.94.1. This was done by adding “-L/path/to/libs -ltcmalloc” to SYSLIBS and CILIBS in config/Makefile.linux and to ROOTLDFLAGS in test/Makefile.arch, and by adding the library path to the LD_LIBRARY_PATH environment variable.
ldd shows that stress was indeed linked against tcmalloc.
The other ROOT copy was left unmodified.

[code]******************************************************************
*  Starting  R O O T - S T R E S S test suite with 3000 events
******************************************************************
Test 1 : Functions, Random Numbers, Histogram Fits… OK
Test 2 : Check size & compression factor of a Root file… OK
Test 3 : Purge, Reuse of gaps in TFile… OK
Test 4 : Test of 2-d histograms, functions, 2-d fits… OK
Test 5 : Test graphics & Postscript… OK
Test 6 : Test subdirectories in a Root file… OK
Test 7 : TNtuple, selections, TCut, TCutG, TEventList… OK
Test 8 : Trees split and compression modes… OK
Test 9 : Analyze Event.root file of stress 8… OK
Test 10 : Create 10 files starting from Event.root… OK
Test 11 : Test chains of Trees using the 10 files… OK
Test 12 : Compare histograms of test 9 and 11… OK
Test 13 : Test merging files of a chain… OK
Test 14 : Check correct rebuilt of Event.root in test 13… OK
Test 15 : Divert Tree branches to separate files… OK
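Test 16 : CINT test (3 nested loops) with LHCb trigger… OK
******************************************************************
*  Linux localhost 2.6.18-5-k7 #1 SMP Wed Oct 3 00:47:27 UTC 2007
******************************************************************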
[/code]

For Google tcmalloc:

[code]stress    : Total I/O = 1765.8 Mbytes, I = 1305.3, O = 460.5
stress    : Compr I/O = 1299.0 Mbytes, I =  956.6, O = 342.4
stress    : Real Time = 101.39 seconds Cpu Time =  97.84 seconds
******************************************************************
*  ROOTMARKS =1000.9   *  Root5.16/00   20070627/715
******************************************************************
[/code]

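For the original malloc:

[code]stress : Total I/O = 1765.8 Mbytes, I = 1305.3, O = 460.5
stress : Compr I/O = 1299.0 Mbytes, I = 956.6, O = 342.4
stress : Real Time = 108.33 seconds Cpu Time = 102.56 seconds
******************************************************************
*  ROOTMARKS = 954.9   *  Root5.16/00   20070627/715
******************************************************************
[/code]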
That is a boost of about 5% (ROOTMARKS 1000.9 vs. 954.9, CPU time 97.84 s vs. 102.56 s) :slight_smile: However, tcmalloc is more effective for multithreaded applications that allocate large amounts of memory.
I’m sorry for this short answer; I don’t have enough time for a thorough study right now, nor the experience (which matters more). But I can run some simple tests if you need them.