Ah, so it's not so much about adjacency (which would benefit even an implementation that insisted on loading bytes individually) but about the number of operations required to bucket-brigade those zeroes and ones into registerland once the cache issue is solved (which, like other replies, I'd very much have expected to be the case in the baseline of the comparison).
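To make the "number of operations" point concrete, here is a hypothetical sketch. It is not the code from the comparison being discussed, and the function names are made up. Both functions count the set bits in the same contiguous, already-cached buffer. The only difference is how the data reaches registers: one load per byte, or one 8-byte load per word (via `memcpy`, which sidesteps alignment and aliasing issues). Adjacency is identical in both cases. Only the load count differs.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Portable SWAR popcount for a 64-bit word (no compiler builtins assumed). */
static unsigned popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Hypothetical: one load (and one popcount) per byte. */
unsigned count_ones_bytewise(const uint8_t *buf, size_t n) {
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += popcount64(buf[i]);
    return total;
}

/* Hypothetical: one 8-byte load per word, then a bytewise tail. */
unsigned count_ones_wordwise(const uint8_t *buf, size_t n) {
    unsigned total = 0;
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w); /* single wide load into a register */
        total += popcount64(w);
    }
    for (; i < n; i++) /* leftover bytes that don't fill a word */
        total += popcount64(buf[i]);
    return total;
}
```

For an n-byte buffer, the wordwise version issues roughly n/8 loads instead of n, while the cache behaviour is the same. In practice an optimizing compiler may auto-vectorize either loop, so the actual instruction counts depend on the compiler and flags.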
I was close to dismissing your reply as merely a nomenclature nitpick, but I think I have learned something interesting, thanks!