What should be relevant is not programmer "intent", but rather whether the behavior would likely match that of an implementation which gives the described behavior of actions priority over the parts of the Standard that would characterize them as "Undefined Behavior".
Has there ever been any consensus as to what that "...or is copied as an array of character type..." text is supposed to mean, or what sort of hoops must be jumped through for a strictly conforming program to generate an object whose bit pattern matches another without copying the effective type thereof?
The ability to use pointers to structures with a Common Initial Sequence goes back at least to 1974--before unions were invented. When C89 was written, it would have been plausible that an implementation could uphold the Common Initial Sequence guarantees for pointers without upholding them for unions, but rather less plausible that implementations could do the reverse. Thus, the Standard explicitly specified that the guarantee is usable for unions, but saw no need to redundantly specify that it also worked for pointers.
If compilers would recognize that an operation involving a pointer/lvalue that is freshly and visibly based on another is an action that at least potentially involves the latter, that would be sufficient to make code that relies upon the CIS work. Unfortunately, some compilers are willfully blind to such things.
How about allowing `return` to be used as a qualifier on a function's or prototype's argument which, if present, would adjust the qualifiers of the function's return value to match those of the argument, e.g. adding "const" or removing "volatile" as appropriate?
Then one version of the function could support both const and non-const usage.
BTW, I'd also like to see "register" and "return register" be usable as qualifiers for pointer-type function parameters which would promise that the passed-in pointer wouldn't "escape", or else that it could only escape through the return value (so a compiler that could see everything done with the return value wouldn't have to worry about the argument escaping).
Functions like malloc are only required for hosted implementations. Many operating systems are built using freestanding implementations.
Further, on many platforms, one should avoid using malloc() unless portability is more important than performance or safety. Some operating systems support useful features like the ability to allocate objects with different expected lifetimes in different heaps, so as to help avoid fragmentation, or the ability to arrange for allocations that a program could survive without to fail while there is still enough memory to handle critical allocations. Any library that insists upon using "malloc()" will be less than ideal for use on any such operating system.
See post above. There is no good way for compilers to handle that case, but gcc gets "creative" even in cases where the authors of C89 made their intentions clear.
I can't really blame gcc for that one, since the most straightforward way of using signed integer arithmetic would yield a negative value if the result is bigger than INT_MAX, but it would be very weird for programs to expect and rely upon that behavior.
On the other hand, even the function "unsigned mul_mod_65536(unsigned short x, unsigned short y) { return (x * y) & 0xFFFF; }", which the authors of the Standard would have expected commonplace implementations to process in consistent fashion for all possible values of "x" and "y" [the Rationale describes their expectations], will sometimes cause gcc to jump the rails if the arithmetical value of the product exceeds INT_MAX, despite the fact that the sign bit of the computation is ignored. If, for example, the product would exceed INT_MAX on the second iteration of a loop that should run a variable number of iterations, gcc will replace the loop with code that just handles the first iteration.
It's too bad Unicode wasn't designed around the concept of easily-recognizable grapheme clusters and "write-only" [non-round-trip] forms that are normalized in various ways. A text layout engine shouldn't have to have detailed knowledge of rules that are constantly subject to change, but if there were a standard representation for a Unicode string where all grapheme clusters are marked and everything is listed in left-to-right order, and an OS function were available to convert a Unicode string into such a form, a text-layout engine using that OS routine would be able to accommodate future additions to the character set and glyph-joining rules without having to know anything about them.
You can't do that without committing to not supporting pathological text; otherwise you're stuck adding new special cases to the layout engine every update anyway.
I do have some ideas for a better encoding (like, I assume, anyone competent with sufficient free time and interest in text encoding), but there's a lot of reluctance to put effort into something that's already completely eclipsed by a technically inferior but not completely unusable alternative, so I've had it mostly shelved.
A conforming implementation could extend the language with an 8-bit type __nonaliasingbyte which has no special aliasing privileges, and define uint8_t as being synonymous with that type.
On the other hand, the Standard should never have given character types special aliasing rules to begin with. Such rules would have been unnecessary if the Standard had noted that an access to an lvalue which is freshly visibly derived from another is an access to the lvalue from which it is derived. The question of whether a compiler recognizes a particular lvalue as "freshly visibly derived" from another is a Quality of Implementation issue outside the Standard's jurisdiction.
If the Standard were to make recursion an optional feature, many programs' stack usage could be statically verified. Indeed, there are some not-quite-conforming compilers which can statically verify stack usage--a feature which for many purposes would be far more useful than support for recursion.