I’ve enjoyed starting to work through the exercises on Cryptopals[1]. It teaches you the underlying concepts for different ways of encrypting information and asks you to write code to break them.
Cryptopals was by far the most fun and the most educational set of puzzles I've worked on. It was also a great way to spend my unemployed Covid summer, right after my startup went bust.
I always wonder how people come up with the designs for various ciphers. When I took crypto class in uni, it all seems like different math/bit/word operations get randomly mashed up and repeated for several rounds. Is there a good resource for learning how to design ciphers?
Im not an expert, but i think it comes down to knowing how to break ciphers, and not doing any of those things.
Most symmetric block ciphers involve something along the lines of mixing in the (sub) key, some linear permutations, some non-linear permutations, then repeat all 3 in many "rounds"
I've often thought that alien civilizations would also probably invent asymmetric cryptography pretty much the same as us: Lattices, Diffie-Hellman, RSA, etc. Those are all based on very pure mathematical problems. Whereas symmetric cryptography, while it all does need some common themes like diffusion, there's no way an alien civilization would design something that looks close to AES or SHA2. They'd have symmetric ciphers and hashes, but they'd look quite different I think. I'd love to be convinced otherwise on this.
The set of operations you have access to as a designer affects immensely what your cipher will look like. If your alien CPUs had a very different instruction set than ours, their ciphers would look very different.
Back in the old days memory lookups were as fast as computation, and S-box based designs were very popular. That is no longer the case, to an extent, both for security (cache-timing side-channels) and performance reasons. S-box designs are still common, but the S-boxes are usually <= 4-bit wide, mostly there to facilitate analysis (counting active S-boxes), and usually implemented as boolean logic instead.
Without S-boxes, the other main approach is to mix operations from incompatible algebraic domains. Like add and xor. When composed many times together, hopefully this results in a very complicated nonlinear expression of very high degree on any of these domains. One of the first popular ciphers to do this was IDEA, which mixed addition, xor, and modular multiplication to pretty good effect. The challenge then is to figure out a set of these operations that is both efficient at eliminating input-output structure (linear, differential, etc characteristics) and efficient at being computed in the widest possible range of machines. This restricts your options to a common set of operations, like add, xor, shift, and so on. Multiplications can be useful, but they don't do very well at the low end, and tend to complicate analysis.
This is only at the very lowest level of the design phase, where you're picking your mixing/diffusion building blocks. You still have to decide on a higher-level structure such as the various Feistel variants, Lai-Massey, SPN, etc, which comes with its own set of tradeoffs.
Yeah, I could believe aliens would have ciphers that mix add, modulo, xor, etc. in some way, but the choices we use in designs do seem to be somewhat random among a large set of possible decisions (though I could be wrong, and it would be really cool to know if there was a "natural" cipher that ought to be universally "discovered", based on how physics, mathematics and computation generally works in this universe). Or at least a class of ciphers where some decisions like constants can be arbitrary without affecting security or efficiency.
Take the polynomial in GCM mode. I'm no expert on this, but I believe it's chosen to be somewhat "awkward", but I'm not sure if it was really the only choice out there. SHA1 I think, and other constructions, sometimes have "nothing up my sleeve" numbers in places like the digits the square roots of small integers and whatnot, but they're just arbitrarily chosen so as not to be suspicious to others. But those are just constants. What about designs? E.g. AES rounds: if you still did the operations or rounds but in a slightly different order, or added/removed some other operation somewhere, I imagine there are many combinations where this will be just fine. I really have no idea if it's chosen completely optimally. It certainly seems different to RSA and Diffie-Hellman which, aside from the padding perhaps, I think are naturally destined to be discovered due to their close relation to group theory and primes.
GCM is a different type of construction, but the polynomial used there is explainable---it's the irreducible polynomial of the type x^128 + f(x) for which f(x) has the lowest degree (i.e., it's the lexicographically smallest irreducible GF(2) polynomial of degree 128). Any irreducible polynomial picked here would work as well, from a security point of view.
AES is a good case study, because you _cannot_ remove any operation without ending up with a broken cipher:
- If you remove AddRoundKey, there's no key getting mixed and you don't really have a block cipher in the first place;
- If you remove SubBytes, the entire thing becomes linear and elementary to break;
- If you remove ShiftRows, each column of the 4x4 state is independent of each other, and you could trivially distinguish the cipher from random by observing that only 32 bits change for any single byte change;
- If you remove MixColumns, you're removing most of the diffusion of the cipher, and once again it becomes much easier to break because there's no longer any mixing across bytes.
The order itself doesn't matter all that much. You can reorder AddRoundKey, ShiftRows, and Mixcolumns any way you like and you end up with the exact same cipher, since those are all linear operations. Ordering them differently with respect to SubBytes would have no meaningful security impact, but it serves no purpose to start with anything other than adding the key and applying SubBytes, since all the other linear stuff can be trivially canceled out by the attacker until the first nonlinear operation.
The general principle in AES-like ciphers is to iterate rounds of the shape `block = nonlinear_operation(block, key[i])` followed by `block = linear_operation(block)`. You will find many (many!) designs like this if you go looking, trying various choices of nonlinear and linear operations that guarantee various useful properties. In the case of AES, SubBytes guarantees that the differential probability going through any byte is upper bounded by 1/64; MixColumns guarantees that for any two distinct 4-byte inputs, between the input and output there are at the very least 5 changed bytes. When you mix these two together, with the help of ShiftRows you can show that useful differential trails are pretty much gone after 4 rounds.
Constructions like SHA-1 (or more precisely, the block cipher underlying SHA-1) do not generally have such clean rationales. You can view it as a Feistel-like cipher with a linear key schedule, so it's hardly a random mashing of operations, but how the particular sequence of operations of the round function is chosen is a bit of a trial and error process. The same applies to Salsa20, Chacha20, and most other ARX constructions.
Arguably the entire reason that Rijndael won the AES competition is because it isn't "randomly mashed up". Each of the individual steps in AES is (relatively) simple and has an explainable design motivation.
Unfortunately I don't have any good advice on where to read up on designing ciphers, as I learned this in a uni course on symmetric cryptography.
It is also very educative to read the AES selection methodology for the best symmetric encryption algorithm. Multiple criteria were used to find a good compromise between security margin and implementation convenience. What is also a curious thing is that Rijndael was not the algorithm that provided the largest security margin, Serpent was yet it didn't get selected.
I think one conclusion the cryptographic community took from previous issues is that the best way to get a solid cipher is a group process.
Even highly skilled cryptographers can fail. Ron Rivest designed RC4, and I don't think anyone would claim that Ron Rivest is not a good cryptographer (he's the R of RSA). But RC4 was not good.
What has been done a number of times now and is currently happening with pqcrypto: You ask everyone in the crypto community to come up with proposals. Then you let them try to find weaknesses in each other's proposal. Then you gradually remove the ones that are considered problematic for any reason.
While you can argue whether this process is perfect (I think some people would argue either serpent or twofish should've won the AES competition), it has not produced any major failures.
This is sort of more interesting because modern ciphers are so much better than older ones. Between Caesar, Enigma, and AES, the growth in robustness seems exponential.
> it all seems like different math/bit/word operations get randomly mashed up and repeated for several rounds
Honestly think you have it down. /s
Need to add some S-Boxes, P-Boxes and SP-Boxes though. The post-quantum stuff people are proposing now gets a bit hairy mathematically but lattices and their hardness properties are fairly easy to understand.
Gimli is a cool place to start for anyone wanting to dip their toes in the water.
It's a permutation in 30 lines or so. Building a cryptographic hash out of it is another 30 sloc.
Somewhere on Reddit there's a great Mike Hamburg (I think?) comment about his approach to quickly analyzing new cipher designs --- is it linear? Can a SAT solver break it? There's more to it, but not much more, and it's one of the denser crypto comments I've read.
Shameless plug: I built a tool that's designed to automatically crack simple ciphers, encodings, hashes (soon™), esolangs and more :)
https://github.com/ciphey/ciphey
Coincidentally I was reminded today that some students are using something I made a while back to learn about breaking AES: https://davidwong.fr/blockbreakers/
[1] https://cryptopals.com/