The only places where I've seen LTO not being used are ones with bad, unreliable build systems that systematically introduce undefined behaviour by violating the ODR.
The only organization I've worked in that had comprehensive LTO for C++ code was Google. I've worked at other orgs even with 1000s of engineers where LTO, PGO, BOLT, and other things you might consider standard techniques were considered voodoo and too much trouble to bother with, despite the obvious efficiency improvements being left on the table.
I helped with PGO work at Microsoft over 15 years ago, back when it was a Microsoft Research project.
The issue with early PGO implementations was getting a really good profile: you had to have automation capable of fully exercising the code paths that you knew would be hot in actual usage, and you needed good instrumentation to know which code paths those were!
The same problem exists nowadays, but programs are instrumented to hell and back to collect usage data.
I am willing to assume that organizations dedicated to shipping software to customers like Microsoft or Autodesk or somebody like that are almost certainly all in on optimization techniques. The organizations where I worked are ones that are operating first party or third party software in the cloud where they're responsible for building their own artifacts.
PGO is pretty difficult. In my experience compilers don't seem to know the difference between "this thing never runs" and "we don't have any information about whether this thing runs". Similarly, it might be more useful to know "is this branch predictable" than just "what % of the time is it taken".
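To make the "never runs" vs. "no information" distinction concrete, here is a minimal sketch of how a profile consumer could keep the two apart. Everything in it (the `Hotness` enum, `classify`, the threshold parameter) is a hypothetical illustration, not how any particular compiler represents profile data.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical classification of a basic block based on profile data.
enum class Hotness { Unknown, Cold, Warm, Hot };

// Assumption for this sketch: an empty optional means the profile never
// covered this block at all, while a value of 0 means the block was
// instrumented and observed to never execute.
Hotness classify(std::optional<std::uint64_t> count, std::uint64_t hot_threshold) {
    if (!count) {
        return Hotness::Unknown;  // no information: should not be treated as cold
    }
    if (*count == 0) {
        return Hotness::Cold;     // measured, and it really never ran
    }
    return *count >= hot_threshold ? Hotness::Hot : Hotness::Warm;
}
```

The point of the sketch is only that collapsing "missing" into "zero" loses information: an optimizer that sees `Unknown` could fall back to its static heuristics instead of pessimizing the block as dead code.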
CPUs are so dynamic anyway that there often isn't a way to pass down the information you'd get from the profile. E.g., I don't think Intel actually recommends any way of hinting branch directions.
Generally yes. This is not for "simple" cores; this is the state-of-the-art static branch prediction algorithm as described by Intel in their optimization manual.
"Branches that do not have a history in the BTB ... are predicted using a static prediction algorithm: Predict forward conditional branches to be NOT taken. Predict backward conditional branches to be taken."
It then goes on to recommend exactly what every optimizing compiler and post-link optimizers like BOLT do:
"Arrange code to be consistent with the static branch prediction algorithm: make the fall-through code following a conditional branch be the likely target for a branch with a forward target, and make the fall-through code following a conditional branch be the unlikely target for a branch with a backward target."
This is why a reduction in taken forward branches is one of the key statistics that BOLT reports.
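For anyone who wants to see what "arrange code to be consistent with static prediction" looks like from the source side, here is a small sketch using the C++20 `[[unlikely]]` attribute. The function `sum_valid` and its parameters are made up for illustration. The attribute is only a hint: whether the compiler actually moves the rare path out of the fall-through depends on the compiler and optimization level, and a real profile would normally override or replace such hand-written hints.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: sum the non-negative entries and count the rejects.
std::uint64_t sum_valid(const std::int32_t* data, std::size_t n, std::size_t& rejected) {
    std::uint64_t total = 0;
    rejected = 0;
    // The loop back-edge is a backward branch, which static prediction
    // assumes is taken.
    for (std::size_t i = 0; i < n; ++i) {
        // Marking the rare case lets the compiler keep the common path as
        // the fall-through and place this block behind a forward branch
        // that is expected to be not taken.
        if (data[i] < 0) [[unlikely]] {
            ++rejected;
            continue;
        }
        total += static_cast<std::uint64_t>(data[i]);
    }
    return total;
}
```

With PGO or BOLT the same layout decision is made from measured counts rather than from an annotation, which is where the "taken forward branches" statistic comes from.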
"Vastly" eh? I seem to recall that LLVM ThinLTO has slight regressions compared to GCC LTO on SPEC CPU, but on Google's own applications the superior whole-program devirtualization offered only with ThinLTO is a net win.
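As a rough illustration of what devirtualization buys, consider the sketch below. The types `Shape` and `Triangle` and both functions are hypothetical. Within a single translation unit the compiler can only turn the virtual call into a direct one when it can prove the dynamic type, for example via `final`. Whole-program devirtualization (in Clang, as far as I know, enabled with `-fwhole-program-vtables` on top of `-flto`) tries to make the same proof across the whole link when it can see every class derived from `Shape`.

```cpp
// Hypothetical class hierarchy with a single concrete implementation.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

struct Triangle final : Shape {
    int sides() const override { return 3; }
};

// Through the base reference this is an indirect call, unless the
// optimizer can prove that Triangle is the only possible target.
int count_sides(const Shape& s) { return s.sides(); }

// Because Triangle is final, the target is known statically, so the call
// can be made direct (and inlined) without whole-program information.
int count_sides_known(const Triangle& t) { return t.sides(); }
```

Both functions return the same result; the difference is only in whether the generated code has to go through the vtable.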
As a user, building with ThinLTO vs. full LTO generally produces pretty similar performance, in no small part because a huge amount of effort has gone into making the summaries as effective as possible for key performance needs.
As a compiler developer, especially when developing static analysis warnings rather than optimization passes, the number of cases where I've run into "this would be viable if we had full LTO" has been pretty high.
Yeah they just never hired me. They also invented BOLT.
I think there is a valley in terms of organization size where you have tons of engineers but not enough to accomplish peak optimization of C++ projects. These are the orgs that are spending millions to operate, for example, the VERY not-optimized packages of postgresql from Ubuntu, in AWS.
Violating the ODR doesn't introduce UB; it's IFNDR (Ill-formed, No Diagnostic Required), which is much worse in principle and, in such cases, probably also in practice.
UB is a runtime phenomenon: it happens or it doesn't, and we may be able to ensure the case where it happens doesn't occur with ordinary human controls.
But IFNDR is a property of the compiled program. If you have IFNDR (by some estimates that's most C++ programs), your program has no defined behaviour and never did, so there is no possible countermeasure. Too bad, game over.
I am curious where you have seen LTO used. Linux distributions and open source projects in general rarely use LTO. Their build systems are usually very good.