If you don’t use ROOT (where the precompiled header causes trouble if options are not applied consistently across the board), you should be in much better shape, assuming the sources are recent. I just checked that the inlining fix definitely exists in the latest of those standalone sources (see CIFactory.cpp) and has probably been there for a while: Axel’s fix is from July 25. (If whatever you run locally is older, you will want to update.)
But actually, looking at it again now, Axel’s fix may not work at all. This is the code (CIFactory.cpp) from the latest standalone cling sources:
CGOpts.OptimizationLevel = 0;
CGOpts.setInlining((CGOpts.OptimizationLevel == 0)
So OptimizationLevel is always 0 when taking that route, regardless of argv/argc, and inlining is subsequently always off. (I have my own cling patches, and I hacked the code to always run at opt level 2, as well as to always use normal inlining, regardless of opt level.)
You can change opt level at run-time (#pragma cling optimize 2), but mixing of headers seen under different optimization levels is still possible, so this needs to be as early as possible. And for the defaults, you’re basically too late.
Cling sets up some more defaults and its own passes in BackendPasses::CreatePasses(), which lives in lib/Interpreter/BackendPasses.cpp. It does not follow clang to the letter b/c when running interactively, there is no point in letting the user wait for minutes to run an expensive pass that only shaves off a few microseconds here and there. OTOH, having no optimization at all tends to create larger code with more symbols, which slows you down as well. So, some judicious choices have been made. But then if you have an esoteric case (as we had and you may, too), you may actually lose out. Hence when in doubt, compare to Clang.
In that function, look specifically at how the selections for vectorization differ, and at how the opt level is taken in some cases from the function argument (which originates from the transaction) and in other cases from the default opt level stored by the interpreter. That mismatch is what I worry about in your specific case.
Cling also likes to add safety checks (e.g. a pass that verifies pointers) so that the interpreter does not segfault if, for example, the user dereferences a null pointer. Whether that’s active depends on a) the default and b) which function you call. The declare() you use above is fine (no ptr checking), but others, e.g. process(), do force this pass to insert checks, and others such as parse() pick up the default. The point is that these checks reduce the effectiveness of the optimization passes, so you almost certainly want to avoid them.
In short, Cling is not a compiler like Clang: it’s making trade-offs for its use as an interactive environment.
So what we did is declare the important templates extern (this may be harder for you, but may work for those subexpressions you talk about), compile the instantiations separately, and load the result as a library. There, we ran into the problem that Clang would parse templates anyway, even if declared extern. So we hacked that, see: https://bitbucket.org/wlav/cppyy-backend/src/master/cling/patches/explicit_template.diff .
That’s the background.
Back to your case: besides supplying -O2 in argv, at a minimum also do something like:
interpreter->getCI()->getCodeGenOpts().OptimizationLevel = 2;