Interpreter->declare crashing on long strings

alandefreitas · August 24, 2018, 11:02pm

I’m trying to declare a function with interpreter->declare(function_str); but it always crashes when the string is long enough. Has this happened to anyone else?

I can’t see which file the error is in because I’m using the binaries. I tried to compile from source, but the build fails at libLTO because I don’t have enough memory.

I tried to replicate the error on a Mac but it works there. Even with the binaries.

Danilo · August 25, 2018, 6:50am

Hi,

What platform is this?
Can you post an algorithm which produces enough code in a string that when it’s jitted with Declare produces a crash?

Cheers,
D

alandefreitas · August 25, 2018, 3:16pm

Hi,

Of course. Here it is:

    // declaring interpreter
    cling::Interpreter interp(argc_local, argv_local, LLVMDIR);

    // creating the code
    int dim = 50000; // change this number to replicate the error or the expected behaviour
    std::string str_code = "#include <vector>\n#include <cmath>\nextern \"C\" {\n"
            "double my_objective_function(std::vector<double>& _double_decision_variables){\n"
            "    const std::vector<double> &x = _double_decision_variables;\n"
            "    return ";
    for (int i = 0; i < dim; ++i){
        str_code += "std::pow(x[" + std::to_string(i) + "], 2)" + ((i!=dim-1) ? "+" : ";\n");
    }
    str_code += "}\n}";
    std::cout << str_code << std::endl;

    // declaring the function
    interp.declare(str_code);
    using objective_func_type = double(std::vector<double>&);
    objective_func_type *func_pointer;
    void *addr = interp.getAddressOfGlobal("my_objective_function");
    func_pointer = cling::utils::VoidToFunctionPtr<objective_func_type *>(addr);

    // using the function
    if (func_pointer) {
        std::vector<double> decision_variables(dim,1.0);
        double fx = func_pointer(decision_variables);
        std::cout << "fx = " << fx << std::endl;
    } else {
        std::cout << "Could not get a function pointer" << std::endl;
    }

This code only works here when dim is small enough (i.e. when the generated function is shorter). On the machines I have, it starts to fail at about 50000 on a Mac and 10000 on Ubuntu 17.

The error is:

Exception: EXC_BAD_ACCESS (code=2, address=0x7ffee3ef6fa8)

That’s what the stack looks like:

[...]
(anonymous namespace)::AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation)
clang::Sema::CheckCompletedExpr(clang::Expr*, clang::SourceLocation, bool)
clang::Sema::ActOnFinishFullExpr(clang::Expr*, clang::SourceLocation, bool, bool, bool)
clang::Sema::BuildReturnStmt(clang::SourceLocation, clang::Expr*)
clang::Sema::ActOnReturnStmt(clang::SourceLocation, clang::Expr*, clang::Scope*)
clang::Parser::ParseReturnStatement()
clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::AllowedConstructsKind, clang::SourceLocation*, clang::Parser::ParsedAttributesWithRange&)
clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::AllowedConstructsKind, clang::SourceLocation*)
clang::Parser::ParseCompoundStatementBody(bool)
clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&)
clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*)
clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, unsigned int, clang::SourceLocation*, clang::Parser::ForRangeInit*)
clang::Parser::ParseDeclOrFunctionDefInternal(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec&, clang::AccessSpecifier)
clang::Parser::ParseDeclarationOrFunctionDefinition(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*, clang::AccessSpecifier)
clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*)
clang::Parser::ParseLinkage(clang::ParsingDeclSpec&, unsigned int)
clang::Parser::ParseDeclOrFunctionDefInternal(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec&, clang::AccessSpecifier)
clang::Parser::ParseDeclarationOrFunctionDefinition(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*, clang::AccessSpecifier)
clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*)
clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&)
cling::IncrementalParser::ParseInternal(llvm::StringRef)
cling::IncrementalParser::Compile(llvm::StringRef, cling::CompilationOptions const&)
cling::Interpreter::declare(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, cling::Transaction**)
main
start

What I denoted as [...] is just (anonymous namespace)::AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation) like a thousand times until it crashes.
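
The repeated frames are consistent with one level of recursion per operand: binary + is left-associative, so a + b + c + d parses as ((a + b) + c) + d, and a recursive walk over that tree goes as deep as the chain is long. Below is a toy model of that shape. It is only a sketch: Node, make_left_chain and depth are made-up names for illustration, not Clang's data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Toy expression node: a leaf (both children null) or a binary '+'.
// Hypothetical type for illustration, unrelated to clang::Expr.
struct Node {
    std::unique_ptr<Node> lhs, rhs;
};

// Build the tree a left-associative parser would produce for
// x0 + x1 + ... + x(n-1): each '+' takes the previous sum as its left child.
std::unique_ptr<Node> make_left_chain(std::size_t n) {
    assert(n > 0);
    auto root = std::make_unique<Node>();      // the leaf x0
    for (std::size_t i = 1; i < n; ++i) {
        auto plus = std::make_unique<Node>();
        plus->lhs = std::move(root);           // everything so far
        plus->rhs = std::make_unique<Node>();  // the leaf xi
        root = std::move(plus);
    }
    return root;
}

// Recursive walk: one stack frame per nesting level.
// (Destroying the tree is recursive too, so keep n modest in this toy.)
std::size_t depth(const Node* node) {
    if (!node) return 0;
    return 1 + std::max(depth(node->lhs.get()), depth(node->rhs.get()));
}
```

With n operands the tree is n levels deep, so any per-node recursion needs stack space proportional to the length of the expression.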

Axel · August 27, 2018, 8:11pm

What’s the approximate length of the string that starts to cause problems?

Axel · August 27, 2018, 8:27pm

OK never mind, I can reproduce with stand-alone clang:

$ clang++ -c longstr.cxx
clang: error: unable to execute command: Illegal instruction: 4
clang: error: clang frontend command failed due to signal (use -v to see invocation)
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
clang: note: diagnostic msg: PLEASE submit a bug report to http://developer.apple.com/bugreporter/ and include the crash backtrace, preprocessed source, and associated run script.
clang: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /var/folders/b1/xp46rm512wv7mbrx82vpf3m00000gn/T/longstr-a4aaa5.cpp
clang: note: diagnostic msg: /var/folders/b1/xp46rm512wv7mbrx82vpf3m00000gn/T/longstr-a4aaa5.sh
clang: note: diagnostic msg: Crash backtrace is located in
clang: note: diagnostic msg: /Users/axel/Library/Logs/DiagnosticReports/clang_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang: note: diagnostic msg: (choose the .crash file that corresponds to your crash)
clang: note: diagnostic msg:

********************

It’s not the long string, it’s the extremely long expression. Solution: please don’t do that.

E.g. buffer the intermediate results:

    double res = std::pow(x[0],2);
    res += std::pow(x[1],2);
    ...

That works nicely - as nicely as it gets with code like that… (i.e. it still takes a hell of a time to compile - I didn’t have the patience to see it finish.)
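
For the generator in the original post, the buffering could look roughly like the sketch below. The function name make_buffered_source and the terms_per_stmt parameter are assumptions made up for this example; the emitted function keeps the name and signature used earlier in the thread.

```cpp
#include <cassert>
#include <string>

// Emit the objective function as many short statements instead of one
// huge expression. `terms_per_stmt` (a hypothetical knob) caps how many
// terms end up in a single sum, and so how deeply each expression nests.
std::string make_buffered_source(int dim, int terms_per_stmt = 100) {
    assert(dim > 0 && terms_per_stmt > 0);
    std::string src =
        "#include <vector>\n#include <cmath>\nextern \"C\" {\n"
        "double my_objective_function(std::vector<double>& _double_decision_variables){\n"
        "    const std::vector<double> &x = _double_decision_variables;\n"
        "    double res = 0.0;\n";
    for (int i = 0; i < dim; i += terms_per_stmt) {
        const int end = (i + terms_per_stmt < dim) ? i + terms_per_stmt : dim;
        src += "    res += ";
        for (int j = i; j < end; ++j) {
            src += "std::pow(x[" + std::to_string(j) + "], 2)";
            src += (j != end - 1) ? " + " : ";\n";
        }
    }
    src += "    return res;\n}\n}";
    return src;
}
```

The returned string is passed to interp.declare() exactly like str_code above. With terms_per_stmt = 1 this emits one term per statement, as in the snippet above; larger chunks change the floating-point summation order slightly, which may or may not matter for your application.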

Cheers, Axel.

alandefreitas · August 27, 2018, 8:51pm

Hi Axel,

Thanks for the diagnostics. I had no idea what it was because I’m just using the binary, but I did suspect that the problem wouldn’t be the string size per se.

Do you know if there is a way to increase the maximum size of these expressions? I guess that’s a problem with clang and not cling. But there might be a solution because I’m able to compile the unrolled expression directly with clang. Clang might buffer that internally. The limit on the expression size also depends on the machine so it might be a memory problem I have somewhere.

I know this apparent resistance to buffer the expressions might sound stupid because the example I gave was trivial. But I still ask because I’m using cling for an application where the expressions are automatically generated by a symbolic library we have developed for large-scale problems and the expressions are usually not as easy to decompose as the one I showed.

Thanks,

wlav · August 27, 2018, 9:29pm

The last time I made a C++ compiler crash b/c of too long expressions with symbolic code was back in 1998, so not sure whether the same tricks still help, but making sure that optimizations, and in particular inlining, are on, may be a way out.

(That inlining thing may seem counterintuitive, but it depends on what you are doing: mathematical sub-expressions that end up, after inlining, multiplying with zero for example, can be culled early by the compiler.)

Cling has some optimizations off by default and inlining has only recently been fixed, so that may explain the different behavior that you see between Cling and Clang. Either way, it’s trivial to test whether -O2 on the CLI for Clang makes a difference in maximum length with your actual code.

If the expressions are generated from templates, then specializations might be a way out to simplify specific sub-expressions.

(Aside: when I first saw your posting, I also thought it’d be memory, but by my estimate even the longer string won’t get much bigger than ~200MB after parsing, so I don’t think that’s it. I do know that debugging info with Clang is much, much smaller on Mac than on Linux, but even that would only add a factor of 3 at most. Not small, but nowhere near out-of-memory territory, I assume.)

alandefreitas · August 27, 2018, 10:45pm

Hi @wlav!

Thank you so much for your feedback. That’s exactly my problem. Do you still work with symbolic code?

We already simplify sub-expressions significantly. Of course, it’s always a work in progress.

I tried to use -O2 on the CLI but had no luck in the first example. I might be doing something wrong though:

    cling::Interpreter interp(argc, argv, LLVMDIR);

where argv is:

The LLVM directory
"-I" LLVMDIR "/include"
"-Wno-return-type-c-linkage"
"-O2"

I don’t know if it’s that trivial to turn on O2 on cling.

The only solution I have now is buffering the subexpressions but that’s odd because I thought the underlying compiler should be able to do that. This is a problem because I would have to potentially change the symbolic library for every new problem I have.

I guess there’s no way around it. I’ll get back to the symbolic library then.

wlav · August 28, 2018, 1:32am

I don’t do symbolic work myself anymore. I do work with people who do, but mostly in Java and Python. I did have to deal with Cling optimizations recently and it is not trivial. I wish it was a simple CLI option. Try it out with Clang first (where it is a simple CLI option). If it improves there, I can go through the details (I’m assuming you’re working with plain Cling, not ROOT, which makes things a bit different). The pow()s don’t benefit from -O2, obviously, so I can’t try that out.

But while trying your posted example I thought of another system-dependent variable, and I now know what the problem is: you simply run out of stack space. You can increase the limit with ulimit -s, which takes a value in units of 1 KB, e.g. ulimit -s 65532 for about 64MB of stack (the default hard limit on my Mac).

I’m on a battery, so I’m not going to wait for it to finish, but the 50K example is spinning happily. Memory is pretty stable at ~310MB. Default stack of 8MB doesn’t do much more than 11.5K pow()s. Don’t think it’s a good idea, though, given how long it takes (this is w/o optimizations).
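
ulimit -s changes the limit for everything started from that shell. An alternative that is not from this thread, and that has not been tried with Cling here, is to run the stack-hungry call on a worker thread created with a larger stack through the POSIX call pthread_attr_setstacksize. The sketch below shows only the mechanism; run_with_stack and deep_sum are made-up names, and deep_sum merely stands in for the compiler's recursion. Whether interp.declare() is happy to run on a non-main thread is an assumption you would have to check.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <functional>

namespace {
// Thread entry point: unpack and invoke the std::function passed as arg.
void* run_task(void* arg) {
    (*static_cast<std::function<void()>*>(arg))();
    return nullptr;
}
}  // namespace

// Run `task` on a new thread whose stack is `stack_bytes` large and wait
// for it to finish. Returns false if the thread could not be set up,
// started or joined. The task must not throw.
bool run_with_stack(std::size_t stack_bytes, std::function<void()> task) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return false;
    bool ok = pthread_attr_setstacksize(&attr, stack_bytes) == 0;
    pthread_t tid;
    ok = ok && pthread_create(&tid, &attr, run_task, &task) == 0;
    pthread_attr_destroy(&attr);
    return ok && pthread_join(tid, nullptr) == 0;
}

// Deliberately stack-hungry recursion (hypothetical stand-in for the
// recursive walk over a long expression): returns 0 + 1 + ... + n.
long long deep_sum(long long n) {
    volatile char pad[256] = {0};  // pads every frame
    if (n == 0) return pad[0];
    return n + deep_sum(n - 1);
}
```

A call such as run_with_stack(64u * 1024 * 1024, [&] { interp.declare(str_code); }) would then give the parser 64MB to recurse in, under the assumptions above. It does nothing about the compile time.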

Now, AFAICT, the only point of AnalyzeImplicitConversions is to diagnose warning-worthy stuff. Unfortunately switching off all warnings only mutes them downstream, it does not pre-empt the checking itself. But you could patch Clang in Cling to have AnalyzeImplicitConversions return immediately if -Wno-everything is set.

alandefreitas · August 28, 2018, 4:32am

Thanks. I tried it with clang and it just worked for 50K!

I think I’ll still need to work on the buffer strategy, change the symbolic library, and optimize the code as a more general solution.

Have you used optimizations on cling before?

wlav · August 28, 2018, 4:36pm

Yes, I’ve been working a lot on optimizing Cling usage from python when templates are involved. The context is vspline (https://bitbucket.org/kfj/vspline) and its new python friend (https://bitbucket.org/kfj/python-vspline). From where we started we got a speed-up of 50000x in run-time and 40x in compile time.

Based on that experience, I say that Cling (and Clang for that matter) out-of-the-box is sub-optimal.

Not saying of course that the same gains are there to be had for your code …

How do you get your cling? Through ROOT or standalone? And which version? It matters b/c Axel has been making fixes upstream, so some of the things I’d otherwise mention are no longer relevant.

alandefreitas · August 28, 2018, 5:59pm

I see. I’m downloading the most recent standalone version of cling and using it directly in C++ as a library. I don’t use ROOT very often. I guess I’ll need some optimization options if I change to the buffering solution. I thought I could just send “-O2” as an option to cling but it seems like that’s not the case. Are there any other possibilities for optimization then?

wlav · August 28, 2018, 11:54pm

If you don’t use ROOT (where the precompiled header causes trouble if options are not applied consistently across the board), you should be in much better shape, assuming the sources are recent. I just checked that the inlining fix definitely exists in the latest of those standalone sources (see CIFactory.cpp) and probably has for a while: Axel’s fix is from July 25. (If whatever you run locally is older, you will want to update.)

But actually, looking at it again now, Axel’s fix may not work at all. This is the code (CIFactory.cpp) from the latest standalone cling sources:

    CGOpts.OptimizationLevel = 0;
    // ...
    CGOpts.setInlining((CGOpts.OptimizationLevel == 0)
                       ? CodeGenOptions::OnlyAlwaysInlining
                       : CodeGenOptions::NormalInlining);

So OptimizationLevel is always 0 when taking that route, regardless of argv/argc, and inlining is subsequently always off. (I have my own cling patches, and I hacked the code to always run at opt level 2, as well as to always use normal inlining, regardless of opt level.)

You can change opt level at run-time (#pragma cling optimize 2), but mixing of headers seen under different optimization levels is still possible, so this needs to be as early as possible. And for the defaults, you’re basically too late.

Cling sets up some more defaults and its own passes in BackendPasses::CreatePasses(), which lives in lib/Interpreter/BackendPasses.cpp. It does not follow clang to the letter b/c when running interactively, there is no point in letting the user wait for minutes to run an expensive pass that only shaves off a few microseconds here and there. OTOH, having no optimization at all tends to create larger code with more symbols, which slows you down as well. So, some judicious choices have been made. But then if you have an esoteric case (as we had and you may, too), you may actually lose out. Hence when in doubt, compare to Clang.

In that function, look specifically at how the selections for vectorization differ, and at how the optlevel is taken in some cases from the function argument (which originates from the transaction) and in other cases from the default optlevel stored by the interpreter. That is what worries me in your specific case.

Cling also likes to add safety checks (e.g. a pass that verifies pointers) so that the interpreter does not segfault if, for example, the user dereferences a null pointer. Whether that’s active depends on a) the default and b) which function you call. The declare() you use above is fine (no pointer checking), but others, e.g. loadHeader() or process(), force this pass to insert checks, and others such as parse() pick up the default. The point is that these checks affect the effectiveness of the optimization passes, so you almost certainly want to avoid them.

In short, Cling is not a compiler like Clang: it’s making trade-offs for its use as an interactive environment.

So what we did is define the important templates extern (this may be harder for you, but may work for those subexpressions you talk about), compile those separately, and load the result as a library. There, we ran into the problem that Clang would parse templates anyway, even if declared extern. So we hacked that, see: https://bitbucket.org/wlav/cppyy-backend/src/master/cling/patches/explicit_template.diff .

That’s the background.

Back to your case: besides supplying -O2 in argv, at a minimum also do something like:

   interpreter->getCI()->getCodeGenOpts().OptimizationLevel = 2;
   interpreter->getCI()->getCodeGenOpts().setInlining(CodeGenOptions::NormalInlining);
   interpreter->setDefaultOptLevel(2);

Axel · August 29, 2018, 6:12am

I’m happy to receive a patch / pull request that forwards a -O2 argument to the CGOpts etc. cling has the optimization level fixed at 0 because we saw issues with higher optimization levels; I’m working on that…

alandefreitas · August 30, 2018, 12:03am

Thanks for the answers. I’ve adjusted my code to take those points into account.

Best,
