Hacker News

The problem with x86 in particular is that there is tons of cruft. You can get lost for days reading about obsolete functionality.

Here's my general workflow for optimizing functions in HFT:

Write a function in C++ and compile it. Look at the annotated disassembly and try to improve on it using intrinsics, particularly vector intrinsics, measuring with rdtsc.

Then compile with "-ftree-vectorize -march=native" and compare the compiler's output to yours. Look up the instructions it used, and check both versions for redundancies, bad ordering, and register misuse/underuse.

Then see if you can improve that.

But all that being said, note that in general this kind of cycle-counting micro-optimization is often overshadowed by instruction- and data-cache behavior. It's rare that you have a few kilobytes of data that you constantly iterate over with the same function. Most learning resources and optimizing compilers seem to ignore this fact.



I’ve wondered why there aren’t more tools for predicting how a program interacts with cache lines and data caching effects. For given CPU parameters it seems a reasonable task to estimate cache-line usage from a sample dataset. Am I just missing what tools are used out there?


The best tool for this in my experience is callgrind with assembly annotation. You can configure it to more or less mimic the cache layout of whatever particular chip you're running and then execute your code on it.

You can use the start and stop macros in callgrind.h to show cache behaviour of a specific chain of function calls, like when a network event happens. Then, in the view menu of kcachegrind, select IL Fetch Misses and show the hierarchical function view.

It doesn't mimic the exact branch prediction or whatever of your architecture but when you compare it to actual timings it's damn close.


Wow, that's cool!


Why not just write the function in ASM in the first place?


1) Because the compiler gives you a clear reference implementation to test against for correctness and performance.

2) Because after you do this enough times, you will learn when to write your own, when not to, and when to spot inefficiencies in the compiler output. The point is to learn, both about how the instructions work and how the compiler works.

3) The C/C++ implementation serves as documentation of intent and is portable across architectures (including future x86-64 architectures). It's fucking atrocious when devs write pure assembly without a C/C++ reference that can replace it. To me, finding random assembly in a project without an equivalent C/C++ implementation is the ultimate indictment of a hot-rod programmer not thinking about the future or future maintainers.


Can you talk about your day job?



