This always seemed magical to me too, but Gemini recently gave me an explanation...

This always seemed magical to me too, but Gemini recently gave me an explanation that clicked.

The seemingly magical part is that it seems like you "lose" information during the transformation. After all if I give you the sorted string, it's not recoverable. But the key is that the the first and last columns of the BWT matrix end up being the lookup table, from any character to the preceding character. And of course given the BWT encoded string, you can get back the sorted string which is the first column. And with this info of which character precedes which character, it shouldn't be too magical that you can work backwards to reconstruct the original string.

Still, the really clever part is that the BWT encoded string embeds the decoding key directly into the character and positional information. And it just so happens that the way it does this captures character pair dependencies in a way that makes it a good preprocessor for "symbol at a time" encoders.

https://news.ycombinator.com/item?id=35738839 has a nice pithy way of phrasing it

>It sorts letters in a text by their [surrounding] context, so that letters with similar context appear close to each other. In a strange vaguely DFT-like way, it switches long-distance and short-distance patterns around. The result is, in a typical text, long runs of the same letter, which can be easily compressed.