Thanks, I finally understood what you are saying. Indeed, the code uses iswspace...

microtonal · on Nov 21, 2019

Thanks, I finally understood what you are saying.

I am sorry for the unclear comments. I'll stop commenting on a phone ;).

Indeed, the code uses iswspace to test all characters, wide or normal. Strange design choice.

I agree, it's really strange. This seems to be inherited by the FreeBSD version, which still does that as well:

https://github.com/freebsd/freebsd/blob/8f9d69492c3da3a8c1ea...

It has the worst of both worlds: it incorrectly counts the number of words when there is non-ASCII whitespace (since mbrtowc is not used), but it pays the penalty of using iswspace. It's also not in correspondence with POSIX, which states:

The wc utility shall consider a word to be a non-zero-length string of characters delimited by white space.

[...]

C_CTYPE

Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multi-byte characters in arguments and input files) and which characters are defined as white space characters.