Hacker News | msebor's comments

I'd expect a proposal for (1) to be well received. The only proposal I recall that deals with (2) is http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2067.pdf; I think it's still being discussed. (3) is highly unlikely if it involved ABI changes. Even if it could be done without such changes, unless there is a precedent for it in an existing compiler (and preferably more than one), it would likely be a tough sell.


Is the linked proposal really dealing with unnamed struct types? I skimmed it and it seems like it is dealing with named constants. Also, is there a proposal for (1) currently, or is someone planning on writing one? Regarding (3), yes, this one was mostly wishful thinking.


WG14 in general looks favorably at proposals to align C more closely with C++ (within the overall spirit of the language) and I'd expect (1) would be viewed in that light.

I'd also say there is consensus that (2) would be beneficial. There are some good ideas in http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2067.pdf although I don't think repurposing the register keyword for it was very popular. That's not just because it wouldn't be compatible with C++, which deprecated register some time ago, but also because it's novel, with no implementation or user experience behind it. My impression is that this is waiting for a new proposal.


Several of us discussed typeof and I'd expect a proposal for a feature along these lines to be well received. (I recall someone even saying they're working on one but that shouldn't stop anyone from submitting one of their own.)


I'm glad to hear that.

What about statement expressions? They're quite useful, and supported by multiple independent compilers.
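For reference, a minimal sketch of what statement expressions buy you (GNU C extension, supported by gcc and clang; the MAX macro and the noisy helper here are purely illustrative):

```c
#include <stdio.h>

/* GNU C statement expression: the ({ ... }) block evaluates to the value
   of its last expression, so the macro can introduce temporaries and
   evaluate each argument exactly once. */
#define MAX(a, b) ({            \
    __typeof__(a) _a = (a);     \
    __typeof__(b) _b = (b);     \
    _a > _b ? _a : _b;          \
})

/* Helper with a visible side effect, to show single evaluation. */
int noisy(int *calls) { return ++*calls; }

int demo(void) {
    int calls = 0;
    int m = MAX(noisy(&calls), 41);  /* noisy() runs once, not twice */
    (void)m;
    return calls;                    /* number of times noisy() ran */
}
```

A plain `#define MAX(a,b) ((a) > (b) ? (a) : (b))` would evaluate `noisy()` twice; the statement-expression version doesn't.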


I'm not aware of recent proposals for those but we have discussed ideas along those lines (closures: N2030, C++ lambdas, Apple Blocks: N1451, and I think there was one from Cilk). I think there was interest but not enough support for the details and likely also concerns from implementers.


There are many improved versions of string APIs out there, too many in fact to choose from, and most suffer from one flaw or another, depending on one's point of view. Most of my recent proposals to incorporate those that do solve some of the most glaring problems, that have been widely available for a decade or more, and that are even part of other standards (POSIX), have been rejected by the committee. I think only memccpy, strdup, and strndup were added for C2X. (See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2349.htm for an overview.)


> Most of my recent proposals [...] have been rejected by the committee.

Does anyone have insight on why?


memccpy is a very welcome addition on the front of copying strings; what else were you thinking of proposing?


C17 doesn't look much different than C89. If you are used to K&R C there may be some adjustment but I would expect it to be manageable.

What might perhaps be more challenging is adjusting to the changes in compilers. They tend to optimize code more aggressively and so writing code that closely follows the rules of the language (rather than making assumptions about the underlying hardware, even valid ones) is more important today than it was back in the 80's.


Given the above, it is worth pointing out that compilers are also much, much better at verification and useful warnings/errors. Back in the (very old) days, there was a motivation to cut down PCC (the Portable C Compiler) and give birth to Lint as a separate application (because cutting compilation time was a greater priority). The current trend is completely the opposite: compilers are gaining increasingly powerful built-in static analyzers and sanitizers by default.

I think the lack of powerful tools in the 1990s-2000s contributed to the perception by some that C is 'difficult' in terms of safety. However, things have moved on.


As additional info,

> Although the first edition of K&R described most of the rules that brought C's type structure to its present form, many programs written in the older, more relaxed style persisted, and so did compilers that tolerated it. To encourage people to pay more attention to the official language rules, to detect legal but suspicious constructions, and to help find interface mismatches undetectable with simple mechanisms for separate compilation, Steve Johnson adapted his pcc compiler to produce lint [Johnson 79b], which scanned a set of files and remarked on dubious constructions.

-- https://www.bell-labs.com/usr/dmr/www/chist.html


First, there needs to be a proposal for adding a feature (I'm not aware of one having been submitted recently). Second, any non-trivial proposed feature needs to have some existing user experience behind it. For libraries that typically means implementations shipping with operating systems or compilers (but successful third party libraries might also be considered). Finally, it also needs to appeal to people on the committee; that can be quite challenging as well. Many proposals that meet the first two criteria die because they simply don't get enough support within the committee.


Sounds mostly like the issue is nobody has bothered to submit a proposal for it then? (There is so much in-the-wild experience and code dealing with this issue, I cannot imagine the second point being problematic.)

On the third point, I have trouble thinking of any technical objections to such a proposal.


This is a good example. Let me flesh it out a bit more to illustrate a specific instance of this problem:

  int a[2][2];

  int f (int i, int j)
  {
      int t = a[1][j];
      a[0][i] = 0;          // cannot change a[1]
      return a[1][j] - t;   // can be folded to zero
  }
The language says that elements of the matrix a must only be accessed by indices that are valid for each bound, so compilers can and some do optimize code based on that requirement (see https://godbolt.org/z/spSF8e).

But when a program breaks that requirement (say, by calling f(2, 0)) the function will likely return an unexpected value.


But I don't know what you want to happen in this case? If you actually call f(2,0) then the program makes no sense. How can you have an expected value for a function call that violates its preconditions?


Based on the memory layout of arrays, which AFAIK is defined rather strictly by the standard, a[0][2] will be the same as a[1][0].
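The layout part of that claim is easy to check without actually performing the out-of-bounds access (the same_address helper is just illustrative; it only compares addresses, which is well defined, while dereferencing a[0][2] is the point under debate):

```c
int a[2][2];

/* The elements of a are laid out contiguously, so the pointer one past
   the end of a[0] coincides with the address of a[1][0]. Forming and
   comparing these pointers is well defined; accessing a[0][2] is not
   necessarily so. */
int same_address(void) {
    return (char *)(a[0] + 2) == (char *)&a[1][0];
}
```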


This is a common misconception (or poor way of phrasing it, sorry). Compiler implementers don't go looking for instances of undefined behavior in a program with the goal of optimizing it in some way. There is little value in optimizing invalid code. The opposite is the case.

But we must write code that relies on the same rules and requirements that programs are held to (and vice versa). When either party breaks those rules, either accidentally or deliberately, bad things happen.

What sometimes happens is that code written years or decades ago that relies on the absence of an explicit guarantee in the language suddenly stops working, because a compiler change depends on the assumption that code doesn't rely on that missing guarantee. That can happen as a result of improving optimizations, which is often, but not necessarily always, motivated by improving the efficiency of programs. Better analysis can also help find bugs in code or avoid issuing warnings for safe code.


The fact that the Standard does not impose requirements upon how a piece of code behaves implies that the code is not strictly conforming, but the notion that it is "invalid" runs directly contrary to the intentions of the C89 and C99 Standards Committees, as documented in the published C99 Rationale. That document recognizes Undefined Behavior as, among other things, "identifying avenues of conforming language extension". Code that relies upon such extensions may be non-portable, but the authors of the Standard have expressly said that they did not wish to demean useful programs that happen to be non-portable.


There are rules and requirements documented in the spec, and there are de-facto rules and requirements that programs expect. Not only that, but when they do exploit these rules, often the code generated is obviously incorrect, and could have been flagged at compile time.

Right now, it seems like compiler vendors are playing a game of chicken with their users.


I think the issue is that many of these "obviously incorrect" things are not obvious at the level that the optimizations are taking place. Perhaps it would be worth considering adding higher-level passes in the compiler that can detect these kinds of surprising changes and warn about them.


Well, no, the issue is that the compiler writers refuse to acknowledge that these obviously incorrect things are incorrect in the first place, and tend to blame users for tripping over compiler bugs. If it were just that they didn't know how to fix said bugs, that would be a qualitatively different and much less severe problem.


> not obvious at the level that the optimizations are taking place

Hmm...then it's up to the optimisers to up their game.

Optimisation is supposed to be behaviour-preserving. Arguing that almost all real-world programs invoke UB and therefore don't have well-defined behaviour (by the standard as currently interpreted) is a bit of a cop-out.


> This is a common misconception (or poor way of phrasing it, sorry). Compiler implementers don't go looking for instances of undefined behavior in a program with the goal of optimizing it in some way. There is little value in optimizing invalid code. The opposite is the case.

Compilers do deliberately look to optimize loops with signed counters by exploiting UB to assume that they will never wrap.
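A minimal sketch of the kind of loop in question (illustrative only; the interesting part is the compiler's reasoning, not the function itself):

```c
/* With a signed counter, a compiler may assume i + 1 never wraps
   (signed overflow is UB), so it can prove the loop runs exactly n
   iterations and, e.g., vectorize or strength-reduce it. With an
   unsigned counter, wraparound is defined, so the same assumption
   would be invalid and extra care is needed. */
long sum_first(int n) {
    long s = 0;
    for (int i = 0; i < n; i++)   /* i++ assumed not to overflow */
        s += i;
    return s;
}
```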


I'd say both statements are correct.

Compiler implementers are happy when they don't have to care about some edge case, because then the code is simpler. Thus, only for unsigned counters is there extra logic to compile the wraparound correctly.

That is my interpretation of "The opposite is the case". Writing a compiler is easier with lots of undefined behavior.


But that's backwards: the compiler writers are writing special cases to erase checks in the signed case. Doing the 'dumb' thing and mindlessly going through the written check is simpler, which is why that's what compilers did for decades as the de facto standard on x86.


The dumb thing is a non-optimizing compiler. GCC and LLVM contain many optimization phases. It is probably some normal optimization which is only "wrong" in the context of loop conditions.


Well yes, they assume they never wrap because that is not allowed by the language, by definition. UB are the results of broken preconditions at the language level.


Terminology can go either way, but is what gcc actually does such a good idea?


Yes, it's undefined. It involves a read of an uninitialized local variable. Except for the special case of unsigned char, any uninitialized read is undefined.


>Except for the special case of unsigned char, any uninitialized read is undefined.

Could you expand on this?


An object of any type, initialized or not, can be read by an lvalue of unsigned char (or any character type). That lets functions like memcpy (either the standard one or a hand-rolled loop) copy arbitrary chunks of memory.

There's some debate about the effects of reading an uninitialized local variable of unsigned char (like whether the same value must be read each time, or whether it's okay for each read to yield a different value).

This special exemption doesn't extend to any other types, regardless of whether or not they have padding bits or trap representations that could cause the read to trap. Few types do, yet the behavior of uninitialized reads in existing implementations is demonstrably undefined (inconsistent or contradictory to invariants expressed in the code of a test case), so any subtleties one might derive from the text of the standard must be viewed in that light.
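For illustration, a hand-rolled byte copy of the kind described (the byte_copy name is made up; what matters is that the reads and writes go through lvalues of type unsigned char, which is what makes the loop valid for arbitrary initialized objects):

```c
#include <stddef.h>

/* Any object's representation may be read through an lvalue of type
   unsigned char, so this loop can copy arbitrary (initialized) objects
   byte by byte, just as the standard memcpy does. */
void byte_copy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}
```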


Thanks for your answers. A related question: this article [0] appears to single out memcpy and memmove as being special regarding effective type. Is it accurate? It seems to be at odds with your suggestion that there's nothing stopping me writing my own memcpy provided I'm careful to use the right types.

[0] https://en.cppreference.com/w/c/language/object#Effective_ty...


I think that may be inaccurate -- IIRC, in C, you can do type punning via a union but not memcpy, and in C++ you can do type punning via memcpy but not a union and this incompatibility drives me nuts because it makes inline functions in a header file shared between C and C++ really messy. (Moral of the story: don't pun types.)
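For what it's worth, a sketch of the union punning that C permits (float_bits is just an illustrative name; the expected bit pattern assumes IEEE 754 single-precision float):

```c
#include <stdint.h>

/* Type punning through a union: C allows reading a member other than
   the one last stored; the bytes are reinterpreted in the new type. */
uint32_t float_bits(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;  /* the representation of f, viewed as a uint32_t */
}
```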


The C standard also allows using memcpy to do type punning:

    If a value is copied into an object having no declared type using memcpy or memmove,
    or is copied as an array of character type, then the effective type of the modified
    object for that access and for subsequent accesses that do not modify the value is
    the effective type of the object from which the value is copied, if it has one
Simply memcpy into a variable (as opposed to dynamically allocated memory).

https://port70.net/~nsz/c/c11/n1570.html#6.5p6
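A minimal sketch of that technique (bits_of is an illustrative name; it assumes float and uint32_t have the same size and that float is IEEE 754 single precision):

```c
#include <stdint.h>
#include <string.h>

/* Type punning via memcpy into an ordinary declared variable: the
   bytes of f are copied into u, and u is then read through its own
   declared type, so no aliasing rule is violated. */
uint32_t bits_of(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Compilers typically recognize this pattern and compile it down to a single register move, so there's no copy at run time.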


I must be remembering incorrectly then, thank you!


memcpy and memmove aren't special. The part that discusses the copying of allocated objects is 6.5, p6, quoted below:

The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.


I see, so in short the article fails to reflect this excerpt: "or is copied as an array of character type". Thanks again.


Has there ever been any consensus as to what that "...or is copied as an array of character type..." text is supposed to mean, or what sort of hoops must be jumped through for a strictly conforming program to generate an object whose bit pattern matches another without copying the effective type thereof?



I'm guessing you were asking about this part rather than UB in general:

> Except for the special case of unsigned char,

The SO article makes the bizarre claim that because

(1) an unsigned char, per the standard, cannot have any padding bits, it therefore cannot have a trap representation. And

(2) if it cannot have a trap representation, the use of an uninitialized value isn't undefined.

I'm willing to buy (1) but I don't remember (2) being required for UB. I think (2) is the step that is harder to follow intuitively. Admittedly, I have not read that part of the standard closely in some time.


Most of us on the committee would like to see more participation from other experts. The committee's mailing list should be open even to non-members. Attendance by non-members at meetings might require an informal invitation (I imagine a heads up to the convener should do it).


I think that's right. These days, much of the discussion occurs through study subgroups (like the floating-point guys) and the committee e-mailing list.

