> Hard-to-implement NIST curves suck, whereas GCM and Poly1305 are recommended.
I've always wondered this about DJB - he preaches the gospel of ease-of-implementation with Curve25519 and Salsa20/ChaCha20, but then for a MAC he has... Poly1305. I guess speed trumps everything?
Sure. I'm not saying Poly1305 is a problem for DJB, just for anyone else trying to implement it, which is a concern DJB has with his other crypto, but not here.
GCM is harder for everyone else to implement than Poly1305. It's harder in the "literally trickier to implement" sense, and in the "needs hardware support to be performant and secure at the same time" sense.
> "needs hardware support to be performant and secure at the same time"
So does Poly1305; it just so happens that most popular processors have strong hardware support. Here's an exercise: implement both GHASH and Poly1305 for MSP430.
I think you're calling fast multipliers "hardware support", which is fair, but the hardware support needed by GHASH is idiosyncratic to things like GHASH. CLMUL is only a few years old and GCM is its primary use case.
Implementing a truly constant-time GCM in software without CLMUL is sufficiently hard that no one has managed to create a remotely competitive implementation. They're all either an order of magnitude slower or vulnerable to cache-timing attacks.
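For context, here's a minimal sketch of the GF(2^128) multiplication at the core of GHASH, done bit by bit in the style of the NIST SP 800-38D description. The name `gf128_mul` is made up for illustration, and blocks are treated as big-endian integers. Python branches on the operand bits, so this is not constant-time; it only shows the operation software has to emulate when there's no carry-less multiply instruction.

```python
# Reduction constant for GCM's field: the bit string 11100001 followed by 120 zeros.
R = 0xE1 << 120


def gf128_mul(x: int, y: int) -> int:
    """Multiply two GHASH field elements (128-bit blocks as big-endian ints).

    Illustrative only: the branches below depend on operand bits, which is
    exactly what a constant-time implementation has to avoid.
    """
    z = 0
    v = y
    for i in range(128):
        # GCM numbers bits from the most significant end of the block.
        if (x >> (127 - i)) & 1:
            z ^= v
        # Multiply v by the field's "x", reducing if a bit falls off the end.
        if v & 1:
            v = (v >> 1) ^ R
        else:
            v >>= 1
    return z
```

That's 128 iterations of shifts and XORs per 16-byte block. The usual way to speed it up in software is precomputed tables indexed by secret data, and that's where the cache-timing exposure comes from.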
Poly1305 isn't a walk in the park, but it doesn't need special hardware support for a fast constant-time implementation. Though I will agree something like HMAC is much simpler.
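To make that concrete, here's a rough Python sketch of Poly1305 following the RFC 8439 description. The name `poly1305_mac` is my own, and Python's big integers are not constant-time, so treat it as an illustration of the arithmetic rather than something to deploy.

```python
def poly1305_mac(msg: bytes, key: bytes) -> bytes:
    """One-time Poly1305 tag over msg with a 32-byte key (r || s).

    Illustrative sketch: relies on Python big ints, which are not constant-time.
    """
    if len(key) != 32:
        raise ValueError("Poly1305 key must be 32 bytes")
    # Clamp r as the spec requires; s is added at the very end.
    r = int.from_bytes(key[:16], "little") & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF
    s = int.from_bytes(key[16:], "little")
    p = (1 << 130) - 5

    acc = 0
    for i in range(0, len(msg), 16):
        # Each block gets a 0x01 byte appended before being read as an integer.
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = (acc + n) * r % p

    return ((acc + s) % (1 << 128)).to_bytes(16, "little")
```

The whole thing is add, multiply, reduce mod 2^130 - 5. A real implementation splits the accumulator into limbs and does the same with ordinary integer multiplies, which is the only "hardware support" involved.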