I am using the Cling interpreter as an embedded network interface, so that RPCs are much simpler to develop. It works exactly as intended.
In order to use it, the interpreter needs to parse the include files of the program. Unfortunately, the program depends on huge header-only libraries like Boost and CGAL. Parsing the program’s headers takes about 1-2 minutes, which is simply too long for the intended use cases, such as executing tests on the remote system.
Does anyone know if it is possible to speed things up? Can I use “RuntimeUniverse.h” for this purpose? Any other ideas?
Hi, my original comment got lost, and attaching the callgrind log doesn’t seem to have worked.
Anyway, I was writing the following:
I’m moving Belle2’s reconstruction software to ROOT 6 (see my other thread for my experiences) and I also noticed a significant slowdown in startup times, from a noticeable but subsecond delay to a few seconds (2-3, not measured). According to callgrind, virtually all the time (80%) is spent inside clang::Parser::ParseNamespace. Most of that is spent in clang::Preprocessor::Lex, below which the computing time spreads out more evenly between different routines.
Thanks for the nice callgraph! What you see is currently expected for the beta: we parse headers at startup as a work-around until I have fixed the modules (PCMs).
The original post was about cling and a large header file, and for that case we don’t plan to provide the same feature for now.
Ok, I didn’t notice this subtle difference. I will note, though, that reading a .C file generated for a 2D histogram with lots of bins also takes very long, so parsing speed is an issue outside of startup times as well.
BTW here’s a callgrind profile for evaluating this file. It’s a fairly pathological case, so I’m not surprised that it spends virtually all its time in some utility function of clang; the number of calls to clang::SUnit::setHeightDirty() (484e6!) is a bit scary, though.
PS: I quickly tested a hypothesis: removing roughly half the lines from the file leads to 121e6 calls, and removing half of that leads to 39e6 calls. In other words, each halving of the input cuts the call count by a factor of three to four, so this hits a roughly quadratic bottleneck.
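As a rough check of the numbers above, the scaling exponent can be estimated from each pair of measurements. This is only a sketch: it assumes the call count behaves like n**k in the number of lines n, and it treats "roughly half" as an exact factor of 2, which the post does not guarantee. The helper name `scaling_exponent` is made up for illustration.

```python
import math


def scaling_exponent(calls_large, calls_small, size_ratio=2.0):
    """Estimate k in calls ~ n**k from two runs whose input sizes differ by size_ratio."""
    return math.log(calls_large / calls_small) / math.log(size_ratio)


# Call counts quoted in the post: full file, about half, about a quarter.
# Assumption: each step halves the input exactly.
calls = [484e6, 121e6, 39e6]

exponents = [scaling_exponent(a, b) for a, b in zip(calls, calls[1:])]
for k in exponents:
    print(f"estimated exponent: {k:.2f}")
```

The first halving gives an exponent of 2, the second one somewhat below 2, which is consistent with a dominant quadratic term plus lower-order costs that matter more on the smaller input.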