Cling #include performance

Hi folks, this is related to topic Speed up #include (I can’t link, since I’m new) [EDIT Axel: added link], but I didn’t know the necro rules and I have more to ask so I started a new topic.

Overview
I’m using Cling to build jank, a Clojure dialect for C++ with JIT compilation. I have jank->C++ codegen and Cling JIT compilation working well, but I’ve run into a troublesome performance issue on startup. My startup code for Cling looks like this:

auto const jank_location(jank::util::process_location().unwrap().parent_path());
auto const args(jank::util::make_array("clang++", "-std=c++17"));
interpreter = std::make_unique<cling::Interpreter>(args.size(), args.data(), LLVMDIR);

interpreter->AddIncludePath(jank_location.string() + "/../include");
interpreter->AddIncludePath(jank_location.string() + "/../include/cpp");

/* This takes 1.4 seconds on my very nice desktop. */
interpreter->loadHeader("jank/prelude.hpp");

The issue, as the code shows, is that starting any jank program with the JIT enabled puts a floor of 1.4 seconds on the boot time, which is much higher than my use cases can tolerate.

Solutions
The topic I linked above included a few suggestions to address this, namely:

  1. Modules (not yet supported, based on the link above)
  2. PCHs (not yet supported, based on the link above)
  3. Autoloading maps (still inefficient: the work is deferred, but it is going to be done regardless)
  4. Implied, but spend time optimizing the headers!

I’ve done #4 and the only definitions in my headers are templates which need to be templates. Everything else is only declared. I could use more pimpl designs, but then I’d be changing the code just to optimize these includes, and I’ll only do that if I have to.

I’m curious about the status of the first two suggestions, but I’d also like to discuss another option.

Cling’s interpreter has a method for storing and diffing states, but it doesn’t have a method (that I can find) for loading states. How reasonable would it be to load my header in Cling, serialize Cling’s state for it, and then load it back up at a later time? I’m looking for the fastest possible solution here, so if this can’t work, are there any other ideas?

My goal is to keep jank’s startup time under 100ms, rather than the baseline of 1.4 seconds right now.

I appreciate your time.

Hi @jeaye !

New topic is perfect, thanks :)

How long does it take clang-9 to “compile your header”? I.e. create a source file that includes your header and time the compilation, please. That gives us a lower bound.

You can re-try with clang-13: our experience tells us that it’ll be faster, and we’re working on upgrading cling to clang version 13.

ROOT (the main use case of cling here) is using modules and has used a PCH. It works well. (The other post is from 8 years ago; things change :) ) I would recommend the use of a PCH if you don’t have a complex set of headers and packages to include.

Let us know what you find out and when you run into a problem!

Cheers, Axel

Thanks for the response, @Axel!

Clang include performance

❯ cat include-perf.cpp 
#include <jank/prelude.hpp>

int main()
{ }
❯ time clang++ -Iinclude/cpp -Ibuild/vcpkg_installed/x64-linux/include -std=c++17 -w include-perf.cpp
  • clang-5: 1.3 seconds (median over 5 runs)
  • clang-9: 1.7 seconds (median over 5 runs)
  • clang-13: 1.45 seconds (median over 5 runs)

Pre-compiled header test

In a local test with cling-master/clang-9, using a pre-compiled header drops the compilation time down to 265ms, which is much better.
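
For anyone wanting to try a PCH setup like this, here is a minimal sketch of how the startup arguments might look. It is an assumption, not necessarily what was done in the test above. It assumes the PCH was generated ahead of time by a clang matching the one embedded in Cling (clang PCH files are tied to the compiler version and flags), and that the interpreter accepts clang's `-include-pch <file>` option among its arguments. `make_interpreter_args`, `as_argv`, and the `prelude.pch` path are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the clang-style argument list for the
// interpreter, adding the PCH only when a path is given.
std::vector<std::string> make_interpreter_args(std::string const &pch_path)
{
  std::vector<std::string> args{ "clang++", "-std=c++17" };
  if(!pch_path.empty())
  {
    // Assumption: the embedded clang honors -include-pch like clang++ does.
    args.push_back("-include-pch");
    args.push_back(pch_path);
  }
  return args;
}

// Hypothetical helper: converts the strings to the `char const*` array shape
// that an argc/argv style constructor expects. The pointers are only valid
// while `args` is alive and unmodified.
std::vector<char const*> as_argv(std::vector<std::string> const &args)
{
  // argv[0] is expected to be the program name, so the list can't be empty.
  assert(!args.empty());
  std::vector<char const*> argv;
  argv.reserve(args.size());
  for(auto const &arg : args)
  { argv.push_back(arg.c_str()); }
  return argv;
}
```

The resulting `argv.size()` and `argv.data()` would take the place of `args.size()` and `args.data()` in the startup code from the first post. The PCH has to be rebuilt whenever the headers or the compiler flags change.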

Next steps

I’m still curious how to push it further, to get it under the 100ms mark. In your experience with ROOT, did modules help much more than a PCH?

I don’t think modules are necessarily faster - they are a more modular solution to the issue, useful for large software systems where only parts are ever used.

You can use one of the many clang-based compilation performance tools out there; they might give an indication of what you can do to improve the compilation time.

In general, instantiations are expensive, so you’re better off moving functions that don’t depend on template parameters into a non-templated base.
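
As an illustration of that last point, here is a minimal sketch of hoisting type-independent code into a non-templated base. The `registry_base` and `registry` names are hypothetical and not taken from jank. The name-to-slot bookkeeping doesn't depend on `T`, so it sits in a plain class whose member bodies could move to a .cpp file, leaving only the thin typed storage to be instantiated per `T`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Non-templated base: the name lookup is the same for every element type,
// so it is compiled once rather than once per instantiation.
class registry_base
{
public:
  std::size_t size() const
  { return indices_.size(); }

  bool contains(std::string const &name) const
  { return indices_.count(name) != 0; }

protected:
  // Returns the slot for `name`, handing out the next free slot if it's new.
  std::size_t slot_for(std::string const &name)
  {
    auto const result(indices_.emplace(name, indices_.size()));
    return result.first->second;
  }

  // Returns the slot for a name which must already be registered.
  std::size_t existing_slot(std::string const &name) const
  {
    auto const found(indices_.find(name));
    assert(found != indices_.end());
    return found->second;
  }

private:
  std::map<std::string, std::size_t> indices_;
};

// Templated layer: only the parts that actually touch T remain here.
template <typename T>
class registry : public registry_base
{
public:
  void set(std::string const &name, T value)
  {
    auto const slot(slot_for(name));
    if(slot == values_.size())
    { values_.push_back(std::move(value)); }
    else
    { values_[slot] = std::move(value); }
  }

  T const& get(std::string const &name) const
  { return values_[existing_slot(name)]; }

private:
  std::vector<T> values_;
};
```

With this split, `registry<int>` and `registry<std::string>` share one copy of the lookup code. How much this helps in practice depends on how many distinct instantiations the headers trigger, so it's worth measuring before restructuring.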
