
This is decent advice in general, but it pays to express your logic in a machine-friendly way. That mostly means thinking carefully about how you organize the data you work with. Optimizers generally don't change data structures or memory layout, yet layout can make orders-of-magnitude differences in the performance of your program. It is also often difficult to refactor later.


I find the same. In my experience, gcc and clang can inline functions, but they can't decide to break apart a struct used only among those inlined functions, turn every struct member into a local variable, and then keep one or more of those locals in a register for the full lifetime of the function rather than spilling them to the stack.

So if you use the messier approach, keeping what should be a struct as a pile of local variables within a single function, and using macros on those locals instead of inlineable functions operating on a struct, you can get massively better performance.

e.g.

    /* slower */
    struct foo { uint32_t a,b,c,d,e,f,g,h; };
    uint32_t do_thing(struct foo *foo) {
        return foo->a ^ foo->b ^ foo->c ^ foo->d;
    }
    void blah() {
        struct foo x;
        for (...) {
            x.e = do_thing(&x) ^ x.f;
            ...
        }
    }

    /* faster */
    #define DO_THING (a^b^c^d)
    void blah() {
        uint32_t a,b,c,d,e,f,g,h;
        for (...) {
            e = DO_THING ^ f;
            ...
        }
    }


The nice thing about godbolt is that it can show you that clang not only can do this in theory but also does it in practice:

https://aoco.compiler-explorer.com/#g:!((g:!((g:!((h:codeEdi...

The ability to turn stack-allocated variables into locals (which can then be put in registers) is one of the most important passes in modern compilers.

Compilers use SSA, where locals are immutable, while lots of languages, like C, have mutable variables. So some compiler frontends simply put locals on the stack and let the compiler figure out what can be promoted into SSA locals and how.
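A minimal sketch of what that promotion means at the source level (the function and the SSA names in the comments are mine, not from any compiler's output):

```c
#include <assert.h>
#include <stdint.h>

/* In C, `sum` is a mutable local. A naive frontend gives it a stack
 * slot and emits a load/store per iteration. After promotion to SSA
 * (e.g. clang's mem2reg), it becomes a chain of immutable values
 * (sum0, sum1, ...) merged by a phi node at the loop header, and can
 * live entirely in a register. */
uint32_t sum_squares(uint32_t n) {
    uint32_t sum = 0;               /* sum0 = 0          */
    for (uint32_t i = 1; i <= n; i++)
        sum += i * i;               /* sum2 = sum1 + i*i */
    return sum;                     /* phi(sum0, sum2)   */
}
```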


That's really good; clearly I haven't looked at more recent versions. The magic seems to happen in your link at SROAPass, "Scalar Replacement Of Aggregates". Very cool!

According to https://docs.hdoc.io/hdoc/llvm-project/r2E8025E445BE9CEE.htm...

> This pass takes allocations which can be completely analyzed (that is, they don't escape) and tries to turn them into scalar SSA values.

That's actually a useful hint to me. When I was trying to replace locals and macros with a struct and functions, I also embedded the struct directly in another struct (which was the wider source of persistence across functions), so perhaps this pass decided the struct _did_ escape. I should revisit my code and see whether I can tweak it to get this optimisation applied.
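A sketch of the kind of escape that can defeat SROA (all names here are hypothetical; `sink` stands in for code the optimizer can't see through):

```c
#include <assert.h>
#include <stdint.h>

struct acc { uint32_t a, b; };

/* Stand-in for code the optimizer can't analyze. */
__attribute__((noinline))
static void sink(struct acc *p) { p->a += p->b; }

/* `x` never escapes: SROA can split it into two scalar SSA values
 * and keep both in registers for the whole loop. */
uint32_t local_only(uint32_t n) {
    struct acc x = { 0, 1 };
    for (uint32_t i = 0; i < n; i++) {
        x.a += i;
        x.b ^= x.a;
    }
    return x.a ^ x.b;
}

/* The address of `y` is passed to opaque code, so `y` is treated
 * as escaped and must keep its in-memory layout. */
uint32_t escapes(uint32_t n) {
    struct acc y = { 0, 1 };
    sink(&y);
    return y.a + n;
}
```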


I guess the chances of the compiler doing something smart increase with link-time optimization and with keeping as much as possible inside the same "compilation unit" (in practice, the same source file).


To give a more specific example: if you malloc()/free() within a loop, the compiler is unlikely to fix that for you. However, moving those calls outside the loop (plus maybe a realloc() inside, only when needed) is probably going to perform better.
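A sketch of that manual transformation (function names are mine; both versions return the total bytes copied so they can be compared):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: one malloc/free per iteration. */
size_t per_iter(const char **items, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(items[i]) + 1;
        char *buf = malloc(len);      /* error checking elided */
        memcpy(buf, items[i], len);
        /* ... use buf ... */
        total += len;
        free(buf);
    }
    return total;
}

/* After: one buffer hoisted out of the loop, grown with
 * realloc() only when the current capacity is too small. */
size_t hoisted(const char **items, size_t n) {
    char *buf = NULL;
    size_t cap = 0, total = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(items[i]) + 1;
        if (len > cap) {
            buf = realloc(buf, len);  /* error checking elided */
            cap = len;
        }
        memcpy(buf, items[i], len);
        /* ... use buf ... */
        total += len;
    }
    free(buf);
    return total;
}
```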


That is something that can easily be found, and usually fixed, with trivial profiling. I'm talking more about data locality versus pointer chasing: once you have built your application around a pointer-chasing data structure, changing it means rewriting most of the application.
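A sketch of the layout difference (types are mine): traversing a linked list makes every step a dependent load from a potentially cold cache line, while summing a plain array walks contiguous memory the hardware prefetcher can stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer-chasing layout: each element can live anywhere in memory,
 * and the next address isn't known until the current load completes. */
struct node { uint32_t v; struct node *next; };

uint32_t sum_list(const struct node *n) {
    uint32_t s = 0;
    for (; n; n = n->next) s += n->v;
    return s;
}

/* Cache-friendly layout: values are contiguous, the loads are
 * independent, and the prefetcher can run ahead of the loop. */
uint32_t sum_array(const uint32_t *v, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}
```

Both compute the same sum; the difference only shows up in memory-access behaviour, which is exactly what makes it hard to retrofit.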



