Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is a really neat article. One thing: the author falls victim to a common, unfortunate mistake in calculating the percentage gains: ...120 cycles. That’s a 30-percent speed-up. and then ...98 cycles. Compared to the original code, that’s 60 percent faster.

The right way to calculate this figure is (t1 - t0)/t0, rather than the author's formula which seems to be (t1 - t0)/t1. For instance: (157 - 98)/98 = 60%, but the actual amount is (157 - 98)/157 = a 38% speed up. A heuristic: 60% of 157 will be much more than 60 (since 60% of 100 = 60), which means a 60% speed up would reduce the speed to below 97 cycles.

It gets even more misleading the more efficient it gets: Adding up the cycles, the total was just 1689. I had shaved almost 1200 cycles off of my friend’s code. That’s a 70 percent speed-up! The author has 1200/1689 = 71%, but the correct numbers yield 1200/(1689+1200) = 42%.

Not that I don't think these are significant gains, but it's just misleading to label them like this. If you've removed less than half the cycles, there's no way you've seen a 70% speed up.



Thanks for your feedback. (Author here.)

By my calculations (which I'll machine-check now with Maxima), the 30% and 70% speed-up claims are sound. First, let's fire up Maxima:

    $ maxima
    Maxima 5.30.0 http://maxima.sourceforge.net
    using Lisp SBCL 1.1.8-2.fc19
    Distributed under the GNU Public License. See the file COPYING.
    Dedicated to the memory of William Schelter.
    The function bug_report() provides bug reporting information.
Now let's calculate how much faster we made the row-copy code by unrolling its loop:

    (%i1) orig_loop_speed:     (1*row) / (157*cycle)$
    (%i2) unrolled_loop_speed: (1*row) / (120*cycle)$
    (%i3) unrolling_speedup:   unrolled_loop_speed / orig_loop_speed,  numer;
    (%o3)                          1.308333333333333
In other words, the unrolled-loop speed is 1.3 times the original-loop speed. That's 30% faster, right?

Likewise, how much faster did shaving those 1200 cycles make the tile-copy code?

    (%i4) copy2_speed:        (1*tile) / (2893*cycle)$
    (%i5) copy3_speed:        (1*tile) / (1689*cycle)$
    (%i6) copy3_vs_2_speedup: copy3_speed / copy2_speed,  numer;
    (%o6)                          1.712847838957963
I'm willing to believe that I've screwed up here, but I can't see where. Can you show me?

Thanks for your help!


Your only "mistake" was to switch to a different measure when giving the percentages. Your calculations are actually fine, because you explicitly label them "speedup".

However, this can confuse a non-thorough reader because your previous measures are not about "speed"; they are about "cost" (cycles). The parent just did not realize the switch, and thought that you were claiming to have reduced "cost" (cycles) by 30%/70%, which wouldn't be true.


Thanks for this insight! I never would have seen it had you not told me. (I'm an engineering guy and switching between time-per-unit and units-per-time is so automatic for me that I didn't realize it wasn't automatic for everybody.)

I've updated the story to explain the calculation the first time it happens.

Thanks again!


As a fellow Maxima user, thanks for showing me "expr, numer". I've always used "float(expr)" for that, but I like yours better.


I'm having trouble with your post. You said the right way to calculate it is (t1 - t0)/t0, but said later that the actual amount is (157 - 98)/157, which is (t1 - t0)/t1.

Also I believe the author is referring to speed improvements, whereas you are referring to time improvements. Let's say I take 200 operations to perform a task, then I optimize code to perform the same task in 100 operations. The program runs at 20 operations per second regardless of how many operations there are.

The task which took 10 seconds before will now take 5 seconds, or a 50% time reduction, which is how you calculate it. However, the program is now running twice as fast, because in 20 seconds, the program can run twice, which is how the author calculated it.

Both of you are correct, just depending on what your context is.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: