At first glance, the compiler version seems better at hiding the latency of some of the div instructions. It might be hiding memory access latency, too, but confirming that would take a more involved analysis.
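To make "hiding the latency" of a division concrete, here is a minimal sketch in C. It is not the code under discussion: the function names `div_chain` and `div_independent` are hypothetical, and the divisors are assumed to be non-zero. The first loop forms a dependency chain, so each division has to wait for the previous quotient. The second performs divisions that do not depend on one another, which leaves the compiler's scheduler and an out-of-order core free to overlap them with each other and with the loads.

```c
#include <stddef.h>
#include <stdint.h>

/* Dependent chain: each division consumes the previous quotient,
   so the divider's full latency is paid on every iteration.
   Assumes every d[i] is non-zero. */
uint64_t div_chain(uint64_t x, const uint64_t *d, size_t n) {
    for (size_t i = 0; i < n; i++)
        x /= d[i];
    return x;
}

/* Independent divisions: no quotient feeds another division, so
   several of them may be in flight at once. Only the cheap
   additions are serialized. Assumes every d[i] is non-zero. */
uint64_t div_independent(const uint64_t *a, const uint64_t *d, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] / d[i];
    return sum;
}
```

Timing both functions over the same number of divisions is one rough way to see how much of the div latency a given core can hide. The actual gap depends on the CPU and on the code the compiler emits.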