It's just that the hardware threads are slow, I think; compiling stuff on the T2K also took forever. It also seems to me that there's little value in having 4 threads per core: it forces you to parallelize programs that ran fine on normal cores just to match the performance you'd get without hardware threads...