Rigid ABIs aren't necessary for statically linked programs. Ideally, the compiler would look at the usage of the function in context and figure out an ABI specifically for that function that minimizes unnecessary copies and register churn.
IMHO this is the next logical step in LTO; today we leave a lot of code size and performance on the floor in order to meet some arbitrary ABI.
I would argue that is largely true because we got the ABIs and the hardware to support them to be highly optimized. Thing slow down very quickly if one gets off that hard-won autobahn of ABI efficiency.
Partly it's due to lack of better ideas for effective inter-procedural analysis and specialization, but it could also be a symptom of working around the cost of ABIs.
The point of interfaces is to decouple caller implementation details from callee implementation details, which almost by definition prevents optimization opportunities that rely on the respective details. There is no free lunch, so to speak. Whole-program optimization affords more optimizations, but also reduces tractability of the generated code and its relation to the source code, including the modularization present in the source code.
In the current software landscape, I don’t see these additional optimizations as a priority.
When looking at the rv32imc emitted by the Rust compiler, it's clear that there would be a lot less code if the compiler could choose different registers than those defined in the ABI for the arguments of leaf functions.
Not to mention issues like the op mentions making it impossible to properly take advantage of RVO with stuff like Result<T> and the default ABI.
IMHO this is the next logical step in LTO; today we leave a lot of code size and performance on the floor in order to meet some arbitrary ABI.