Hacker News | msebor's comments

I'd expect a proposal for (1) to be well received. The only proposal I recall that deals with (2) is http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2067.pdf; I think it's still being discussed. (3) is highly unlikely if it involved ABI changes. Even if it could be done without such changes, unless there is a precedent for it in an existing compiler (and preferably more than one), it would likely be a tough sell.


Is the linked proposal really dealing with unnamed struct types? I skimmed it and it seems like it is dealing with named constants. Also, is there a proposal for (1) currently, or is someone planning on writing one? Regarding (3), yes, this one was mostly wishful thinking.


WG14 in general looks favorably at proposals to align C more closely with C++ (within the overall spirit of the language) and I'd expect (1) would be viewed in that light.

I'd also say there is consensus that (2) would be beneficial. There are some good ideas in http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2067.pdf although I don't think repurposing the register keyword for it was very popular. That's not just because it wouldn't be compatible with C++, which deprecated register some time ago, but also because it's novel, with no implementation or user experience behind it. My impression is that this is waiting for a new proposal.


Several of us discussed typeof and I'd expect a proposal for a feature along these lines to be well received. (I recall someone even saying they're working on one but that shouldn't stop anyone from submitting one of their own.)


I'm glad to hear that.

What about statement expressions? They're quite useful, and supported by multiple independent compilers.
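For reference, a minimal sketch of what statement expressions buy you (GNU C extension, supported by gcc and clang; the MAX macro and the noisy helper here are purely illustrative):

```c
#include <stdio.h>

/* GNU C statement expression: the ({ ... }) block evaluates to the value
   of its last expression, so the macro can introduce temporaries and
   evaluate each argument exactly once. */
#define MAX(a, b) ({            \
    __typeof__(a) _a = (a);     \
    __typeof__(b) _b = (b);     \
    _a > _b ? _a : _b;          \
})

/* Helper with a visible side effect, to show single evaluation. */
int noisy(int *calls) { return ++*calls; }

int demo(void) {
    int calls = 0;
    int m = MAX(noisy(&calls), 41);  /* noisy() runs once, not twice */
    (void)m;
    return calls;                    /* number of times noisy() ran */
}
```

A plain `#define MAX(a,b) ((a) > (b) ? (a) : (b))` would evaluate `noisy()` twice; the statement-expression version doesn't.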


I'm not aware of recent proposals for those but we have discussed ideas along those lines (closures: N2030, C++ lambdas, Apple Blocks: N1451, and I think there was one from Cilk). I think there was interest but not enough support for the details and likely also concerns from implementers.


There are many improved versions of string APIs out there, too many in fact to choose from, and most suffer from one flaw or another, depending on one's point of view. Most of my recent proposals to incorporate those that do solve some of the most glaring problems, that have been widely available for a decade or more, and that are even part of other standards (POSIX), have been rejected by the committee. I think only memccpy, strdup, and strndup were added for C2X. (See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2349.htm for an overview.)


> Most of my recent proposals [...] have been rejected by the committee.

Does anyone have insight on why?


memccpy is a very welcome addition on the front of copying strings; what else were you thinking of proposing?


C17 doesn't look much different than C89. If you are used to K&R C there may be some adjustment but I would expect it to be manageable.

What might perhaps be more challenging is adjusting to the changes in compilers. They tend to optimize code more aggressively and so writing code that closely follows the rules of the language (rather than making assumptions about the underlying hardware, even valid ones) is more important today than it was back in the 80's.


Given the above, it is worth pointing out that compilers are also much, much better at verification and useful warnings/errors. Back in the (very old) days, there was a motivation to cut down PCC (the Portable C Compiler) and give birth to Lint as a separate application (because cutting compilation time was a greater priority). The current trend is completely the opposite: compilers are gaining increasingly powerful built-in static analyzers and sanitizers by default.

I think the lack of powerful tools in the 1990s-2000s contributed to the perception by some that C is 'difficult' in terms of safety. However, things have moved on.


As additional info,

> Although the first edition of K&R described most of the rules that brought C's type structure to its present form, many programs written in the older, more relaxed style persisted, and so did compilers that tolerated it. To encourage people to pay more attention to the official language rules, to detect legal but suspicious constructions, and to help find interface mismatches undetectable with simple mechanisms for separate compilation, Steve Johnson adapted his pcc compiler to produce lint [Johnson 79b], which scanned a set of files and remarked on dubious constructions.

-- https://www.bell-labs.com/usr/dmr/www/chist.html


First, there needs to be a proposal for adding a feature (I'm not aware of one having been submitted recently). Second, any non-trivial proposed feature needs to have some existing user experience behind it. For libraries that typically means implementations shipping with operating systems or compilers (but successful third party libraries might also be considered). Finally, it also needs to appeal to people on the committee; that can be quite challenging as well. Many proposals that meet the first two criteria die because they simply don't get enough support within the committee.


Sounds mostly like the issue is nobody has bothered to submit a proposal for it then? (There is so much in-the-wild experience and code dealing with this issue, I cannot imagine the second point being problematic.)

On the third point, I have trouble thinking of any technical objections to such a proposal.


This is a good example. Let me flesh it out a bit more to illustrate a specific instance of this problem:

  int a[2][2];

  int f (int i, int j)
  {
      int t = a[1][j];
      a[0][i] = 0;          // cannot change a[1]
      return a[1][j] - t;   // can be folded to zero
  }
The language says that elements of the matrix a must only be accessed by indices that are valid for each bound, so compilers can and some do optimize code based on that requirement (see https://godbolt.org/z/spSF8e).

But when a program breaks that requirement (say, by calling f(2, 0)) the function will likely return an unexpected value.


But I don't know what you want to happen in this case? If you actually call f(2,0) then the program makes no sense. How can you have an expected value for a function call that violates its preconditions?


Based on the memory layout of arrays, which AFAIK is defined rather strictly by the standard, a[0][2] will be the same as a[1][0].
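The layout part of that claim is easy to check without actually performing the out-of-bounds access (the same_address helper is just illustrative; it only compares addresses, which is well defined, while dereferencing a[0][2] is the point under debate):

```c
int a[2][2];

/* The elements of a are laid out contiguously, so the pointer one past
   the end of a[0] coincides with the address of a[1][0]. Forming and
   comparing these pointers is well defined; accessing a[0][2] is not
   necessarily so. */
int same_address(void) {
    return (char *)(a[0] + 2) == (char *)&a[1][0];
}
```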


This is a common misconception (or poor way of phrasing it, sorry). Compiler implementers don't go looking for instances of undefined behavior in a program with the goal of optimizing it in some way. There is little value in optimizing invalid code. The opposite is the case.

But we must write code that relies on the same rules and requirements that programs are held to (and vice versa). When either party breaks those rules, either accidentally or deliberately, bad things happen.

What sometimes happens is that code written years or decades ago that relies on the absence of an explicit guarantee in the language suddenly stops working, because a compiler change depends on the assumption that code doesn't rely on that missing guarantee. That can happen as a result of improving optimizations, which is often, but not necessarily always, motivated by improving the efficiency of programs. Better analysis can also help find bugs in code or avoid issuing warnings for safe code.


The fact that the Standard does not impose requirements upon how a piece of code behaves implies that the code is not strictly conforming, but the notion that it is "invalid" runs directly contrary to the intentions of the C89 and C99 Standards Committees, as documented in the published C99 Rationale. That document recognizes Undefined Behavior as, among other things, "identifying avenues of conforming language extension". Code that relies upon such extensions may be non-portable, but the authors of the Standard have expressly said that they did not wish to demean useful programs that happen to be non-portable.


There are rules and requirements documented in the spec, and there are de-facto rules and requirements that programs expect. Not only that, but when they do exploit these rules, often the code generated is obviously incorrect, and could have been flagged at compile time.

Right now, it seems like compiler vendors are playing a game of chicken with their users.


I think the issue is that many of these "obviously incorrect" things are not obvious at the level that the optimizations are taking place. Perhaps it would be worth considering adding higher-level passes in the compiler that can detect these kinds of surprising changes and warn about them.


Well, no, the issue is that the compiler writers refuse to acknowledge that these obviously incorrect things are incorrect in the first place, and tend to blame users for tripping over compiler bugs. If it were just that they didn't know how to fix said bugs, that would be a qualitatively different and much less severe problem.


> not obvious at the level that the optimizations are taking place

Hmm...then it's up to the optimisers to up their game.

Optimisation is supposed to be behaviour-preserving. Arguing that almost all real-world programs invoke UB and therefore don't have well-defined behaviour (by the standard as currently interpreted) is a bit of a cop-out.


> This is a common misconception (or poor way of phrasing it, sorry). Compiler implementers don't go looking for instances of undefined behavior in a program with the goal of optimizing it in some way. There is little value in optimizing invalid code. The opposite is the case.

Compilers do deliberately look to optimize loops with signed counters by exploiting UB to assume that they will never wrap.
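A minimal sketch of the kind of loop in question (illustrative only; the interesting part is the compiler's reasoning, not the function itself):

```c
/* With a signed counter, a compiler may assume i + 1 never wraps
   (signed overflow is UB), so it can prove the loop runs exactly n
   iterations and, e.g., vectorize or strength-reduce it. With an
   unsigned counter, wraparound is defined, so the same assumption
   would be invalid and extra care is needed. */
long sum_first(int n) {
    long s = 0;
    for (int i = 0; i < n; i++)   /* i++ assumed not to overflow */
        s += i;
    return s;
}
```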


I'd say both statements are correct.

Compiler implementers are happy when they don't have to care about some edge case, because then the code is simpler. Thus, only for unsigned counters is there extra logic to compile the wraparound correctly.

That is my interpretation of "The opposite is the case". Writing a compiler is easier with lots of undefined behavior.


But that's backwards: the compiler writers are writing special cases to erase checks in the signed case. Doing the 'dumb' thing and mindlessly going through the written check is simpler, which is why that's what compilers did for decades as the de facto standard on x86.


The dumb thing is a non-optimizing compiler. GCC and LLVM contain many optimization phases. It is probably some normal optimization which is only "wrong" in the context of loop conditions.


Well yes, they assume they never wrap because that is not allowed by the language, by definition. UB are the results of broken preconditions at the language level.


Terminology can go either way, but is what gcc actually does such a good idea?


Yes, it's undefined. It involves a read of an uninitialized local variable. Except for the special case of unsigned char, any uninitialized read is undefined.


>Except for the special case of unsigned char, any uninitialized read is undefined.

Could you expand on this?


An object of any type, initialized or not, can be read by an lvalue of unsigned char (or any character type). That lets functions like memcpy (either the standard one or a hand-rolled loop) copy arbitrary chunks of memory.

There's some debate about the effects of reading an uninitialized local variable of unsigned char (like whether the same value must be read each time, or whether it's okay for each read to yield a different value).

This special exemption doesn't extend to any other types, regardless of whether or not they have padding bits or trap representations that could cause the read to trap. Few types do, yet the behavior of uninitialized reads in existing implementations is demonstrably undefined (inconsistent or contradictory to invariants expressed in the code of a test case), so any subtleties one might derive from the text of the standard must be viewed in that light.
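For illustration, a hand-rolled byte copy of the kind described (the byte_copy name is made up; what matters is that the reads and writes go through lvalues of type unsigned char, which is what makes the loop valid for arbitrary initialized objects):

```c
#include <stddef.h>

/* Any object's representation may be read through an lvalue of type
   unsigned char, so this loop can copy arbitrary (initialized) objects
   byte by byte, just as the standard memcpy does. */
void byte_copy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}
```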


Thanks for your answers. A related question: this article [0] appears to single out memcpy and memmove as being special regarding effective type. Is it accurate? It seems to be at odds with your suggestion that there's nothing stopping me writing my own memcpy provided I'm careful to use the right types.

[0] https://en.cppreference.com/w/c/language/object#Effective_ty...


I think that may be inaccurate -- IIRC, in C, you can do type punning via a union but not memcpy, and in C++ you can do type punning via memcpy but not a union and this incompatibility drives me nuts because it makes inline functions in a header file shared between C and C++ really messy. (Moral of the story: don't pun types.)
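For what it's worth, a sketch of the union punning that C permits (float_bits is just an illustrative name; the expected bit pattern assumes IEEE 754 single-precision float):

```c
#include <stdint.h>

/* Type punning through a union: C allows reading a member other than
   the one last stored; the bytes are reinterpreted in the new type. */
uint32_t float_bits(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;  /* the representation of f, viewed as a uint32_t */
}
```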


The C standard also allows using memcpy to do type punning:

    If a value is copied into an object having no declared type using memcpy or memmove,
    or is copied as an array of character type, then the effective type of the modified
    object for that access and for subsequent accesses that do not modify the value is
    the effective type of the object from which the value is copied, if it has one
Simply memcpy into a variable (as opposed to dynamically allocated memory).

https://port70.net/~nsz/c/c11/n1570.html#6.5p6
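A minimal sketch of that technique (bits_of is an illustrative name; it assumes float and uint32_t have the same size and that float is IEEE 754 single precision):

```c
#include <stdint.h>
#include <string.h>

/* Type punning via memcpy into an ordinary declared variable: the
   bytes of f are copied into u, and u is then read through its own
   declared type, so no aliasing rule is violated. */
uint32_t bits_of(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Compilers typically recognize this pattern and compile it down to a single register move, so there's no copy at run time.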


I must be remembering incorrectly then, thank you!


memcpy and memmove aren't special. The part that discusses the copying of allocated objects is 6.5, p6, quoted below:

The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.


I see, so in short the article fails to reflect this excerpt: "or is copied as an array of character type". Thanks again.


Has there ever been any consensus as to what that "...or is copied as an array of character type..." text is supposed to mean, or what sort of hoops must be jumped through for a strictly conforming program to generate an object whose bit pattern matches another without copying the effective type thereof?



I'm guessing you were asking about this part rather than UB in general:

> Except for the special case of unsigned char,

The SO article makes the bizarre claim that because

(1) an unsigned char, per the standard, cannot have any padding bits, it therefore cannot have a trap representation. And

(2) if it cannot have a trap representation, the use of an uninitialized value isn't undefined.

I'm willing to buy (1) but I don't remember (2) being required for UB. I think (2) is the step that is harder to follow intuitively. Admittedly, I have not read that part of the standard closely in some time.


Most of us on the committee would like to see more participation from other experts. The committee's mailing list should be open even to non-members. Attendance by non-members at meetings might require an informal invitation (I imagine a heads up to the convener should do it).


I think that's right. These days, much of the discussion occurs through study subgroups (like the floating-point guys) and the committee e-mailing list.

