wc seems like a bad example because it’s basically IO bound. The title should re...

microtonal · on Nov 20, 2019

No, it's not faster. They are comparing a Go version that does not do character decoding (which is necessary for correctly counting the number of words in the presence of non-ASCII punctuation) to a C version that does decode characters (and matches them against a larger character set with iswspace).

This is easy to show by counting words and lines separately with wc. Word counting decodes characters (to find non-ASCII whitespace that may separate words), whereas line counting just looks for ASCII line separators:

    $ time wc -w wiki-large.txt
    17794000 wiki-large.txt
    wc -w wiki-large.txt  0.48s user 0.02s system 99% cpu 0.496 total
    $ time wc -l wiki-large.txt
    854100 wiki-large.txt
    wc -l wiki-large.txt  0.02s user 0.01s system 99% cpu 0.034 total

So, without character decoding, looking at every byte is ~15 times faster. If you compiled wc without multibyte character support (which would be a fair comparison), it would probably beat Go without any parallelization.
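
To make the difference concrete, here is a minimal Go sketch. It is not the article's code and the function names are made up for illustration. It contrasts a counter that only looks at raw bytes and knows ASCII whitespace with one that decodes UTF-8 and asks unicode.IsSpace, which is roughly the role iswspace plays in the C version:

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isASCIISpace reports whether b is one of the six ASCII whitespace
// bytes: tab, newline, vertical tab, form feed, carriage return, space.
func isASCIISpace(b byte) bool {
	return b == ' ' || (b >= '\t' && b <= '\r')
}

// countWordsBytes counts words by inspecting raw bytes only.
// Non-ASCII whitespace is treated as part of a word.
func countWordsBytes(data []byte) int {
	words, inWord := 0, false
	for _, b := range data {
		if isASCIISpace(b) {
			inWord = false
		} else if !inWord {
			inWord = true
			words++
		}
	}
	return words
}

// countWordsDecoded decodes UTF-8 one rune at a time and classifies
// each rune with unicode.IsSpace, so non-ASCII whitespace splits words.
func countWordsDecoded(data []byte) int {
	words, inWord := 0, false
	for len(data) > 0 {
		r, size := utf8.DecodeRune(data)
		data = data[size:]
		if unicode.IsSpace(r) {
			inWord = false
		} else if !inWord {
			inWord = true
			words++
		}
	}
	return words
}

func main() {
	// U+2003 EM SPACE sits between "two" and "three".
	text := []byte("one two\u2003three\n")
	fmt.Println(countWordsBytes(text), countWordsDecoded(text)) // prints "2 3"
}
```

The two functions disagree as soon as the input contains whitespace outside ASCII, and the decoding version does strictly more work per byte, so timing one against the other is not an apples-to-apples comparison.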

ajeetdsouza · on Nov 20, 2019

This is not true, see my comment here: https://news.ycombinator.com/edit?id=21587907

microtonal · on Nov 21, 2019

Take the Darwin version linked from your site. Run perf record wc thefile.txt. Then run perf report and you will see iswspace in the call graph.

As I show in [1], removing this call and replacing it with a character match gives a speedup of almost 2x.

[1] https://news.ycombinator.com/item?id=21592089

ajeetdsouza · on Nov 20, 2019

Author here. I have addressed this in the article. The bufio-based implementation was the first one, and it was actually slower.

In the second section, I was able to surpass the performance of the C implementation - by using a read call with a buffer. As I mentioned in the article, the C implementation does the same, and in the interest of a fair benchmark, I set equal buffer sizes for both.

esmi · on Nov 20, 2019

I don't know Go, so I am not sure what file.Read does exactly, but wc on my system, which uses the same wc.c as Penner's article, is doing blocking reads in a naive way, which I argue makes it somewhat of a paper target.

Maybe the Linux wc version you have is better. I don't know. They do exist and Penner's article gave a link to one. I am not sure as you didn't link to the source of your wc. (Edit: I just noticed you did use the OSX wc, you're just running it on Fedora. Sorry about that.)

But in any case, using just wall clock time can be deceptive. Here is what I mean. On my machine, a somewhat old Mac Mini with a spinning disk, user CPU is only ~1/5 of the real time. The rest is spent waiting for the OS or the disk.

    % for ((i=0;i<250;i++)); do cat /usr/share/dict/words >> a; done
    % /usr/bin/time wc -l a
    58971500 a
            1.24 real         0.28 user         0.64 sys