I wouldn't say "nobody" wants to see the XOR-trick solution.
It's just that those who DO care about it are asking about the general solution: Reed-Solomon codes, or maybe a more modern (harder to understand) variant like LDPC or Tornado codes.
Anyone who needs to recover *ONE* symbol from a data-stream with noise actually needs to recover two, three... four... symbols in the general case.
One symbol of erasure recovery is the weakest-of-the-weak of error correction codes. It's even weaker than the single-error-correction / double-error-detection Hamming code (discovered back in the 1950s).
The XOR trick is your "basic parity bit", and is the stuff taught to undergrads to whet your appetite for error correction codes (well... erasure correction in this case, since XOR isn't strong enough to actually correct an error, only an erasure).
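If you want to see that "basic parity bit" view in code, here's a tiny sketch of my own (not from the article): treat the XOR of all values as the parity symbol, and "erase" one value at a known position.

    # XOR of everything acts as a single parity symbol, so exactly one
    # erased value (at a known position) can be recovered.
    nums = [3, 1, 4, 1, 5, 9, 2, 6]
    parity = 0
    for v in nums:
        parity ^= v

    # lose the value at index 4 -- an erasure, because we know *where* it went missing
    survivors = nums[:4] + nums[5:]

    recovered = parity
    for v in survivors:
        recovered ^= v
    assert recovered == nums[4]   # the erased value (5) comes back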
I don't see how the problem corresponds to erasure coding.
Typically in erasure coding you know the locations of the erasures.
Here, you don't know the location.
Maybe you can elaborate on how it relates to erasure coding?
As long as a^1, a^2, a^3... are distinct, this matrix is invertible (aka: all rows / columns are linearly independent). In the field GF(2^8), there are 2^8 - 1 distinct non-zero values (0x01 through 0xFF), making a 255x255 matrix. The polynomial "x" (aka: 0x02) is often chosen as the value of "a", but any primitive element that cycles through all 255 non-zero values will work.
This Vandermonde matrix has a very simple construction, but it is non-systematic (the data is "mangled" in encoded form, so we need to invert the matrix to decode). However, this non-systematic form makes some patterns easier to see. Now let's take the 1st column:
1
1
1
1
1
...
Hmmm... look familiar? What happens when we multiply the data vector with this column?? It becomes a bit more obvious:
                    1
                    1
                    1
[d0 d1 d2 d3...] *  1
                    1
                    1
                    1
                    1
                    1
Remember, in Reed-Solomon, you perform operations over GF(2^blah). So all multiplications are GF-multiplications. But these are all multiplications by 1, so we can just ignore the complications of GF-multiplication entirely. (Even in a Galois field, a multiplication by 1 is just the identity.)
To finish the matrix-multiply, we add everything together. But we do GF-addition (not regular addition). A GF-addition is also known as XOR. So the ultimate answer is:
d0 XOR d1 XOR d2 ...
Which just so happens to be the first parity symbol of the non-systematic Reed-Solomon code. As such, we've shown the relationship between the "XOR trick" and Reed-Solomon (the Vandermonde construction).
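To make that concrete, here's a small sketch of my own (the 0x11D reduction polynomial and a = 0x02 are just the usual choices; nothing above pins them down): build the non-systematic encode out[j] = d0*a^(0*j) + d1*a^(1*j) + ... over GF(2^8), and check that output symbol 0 is exactly d0 XOR d1 XOR d2 ...

    def gf_mul(x, y, poly=0x11D):
        # carry-less multiply in GF(2^8), reduced by the chosen polynomial
        r = 0
        while y:
            if y & 1:
                r ^= x
            y >>= 1
            x <<= 1
            if x & 0x100:
                x ^= poly
        return r

    def gf_pow(x, n):
        r = 1
        for _ in range(n):
            r = gf_mul(r, x)
        return r

    def encode(data, n_out, a=0x02):
        # non-systematic Vandermonde encode: out[j] = XOR_i data[i] * a^(i*j)
        out = []
        for j in range(n_out):
            s = 0
            for i, d in enumerate(data):
                s ^= gf_mul(d, gf_pow(a, i * j))
            out.append(s)
        return out

    data = [0x12, 0x34, 0x56, 0x78]
    enc = encode(data, 6)
    # column 0 of the matrix is all ones, so symbol 0 is just the XOR of the data
    assert enc[0] == data[0] ^ data[1] ^ data[2] ^ data[3]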
------------
The minimum Hamming distance between codewords is 2, so we can correct floor((2-1)/2) errors (aka 0 errors) and 2-1 = 1 erasure. As such, this "all 1s matrix" is a 1-erasure punctured Reed-Solomon code.
I'm familiar with some subset of coding and number theory, so you can assume at least some more knowledge (Galois Fields, or basics of RS codes for example).
Small nitpick: the Vandermonde matrix actually isn't square. It's a (k × n) matrix (where typically k ≠ n). Therefore, it can't be invertible.
I see that many codes contain a parity bit (for example, the extended Golay code); however, I don't see how the operation of recovering "the missing number" could be implemented in terms of a code and its encoding/decoding algorithms.
It describes a way to recover the missing number by constructing a Vandermonde matrix using the given numbers. This would correspond to constructing a generator matrix of a specific RS code [1].
However, after this I'm not so sure about the relation between encoding/decoding and recovering the missing number.
In the end they also factor a polynomial (that could be something analogous to the error locator polynomial?).
[1]: although I'm not sure if it's strictly an RS code
Note: Backblaze here uses "vertical data" (G * data) instead of the "horizontal data" form I used earlier (data * G). But otherwise, still a good blog post.
------------
So if you have 5 data + 1 parity, you now have a 5x6 generator matrix, giving you one extra column for erasures.
If one column is erased, you replace the erased column with the parity column, creating a 5x5 matrix.
As long as any 5 of the 6 columns are linearly independent, the resulting 5x5 matrix remains invertible.
The specific matrix multiplications / inversions / operations are well documented in that blogpost.
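For anyone who wants to see the mechanics without reading the blog post, here's a rough standalone sketch of my own (not Backblaze's code), in the "horizontal" data*G convention: 5 data symbols, a plain all-ones (XOR) parity column, one data column erased, and recovery by inverting the surviving 5x5 submatrix over GF(2^8). The 0x11D polynomial is just my choice of field.

    def gf_mul(x, y, poly=0x11D):
        # carry-less multiply in GF(2^8), reduced by the chosen polynomial
        r = 0
        while y:
            if y & 1:
                r ^= x
            y >>= 1
            x <<= 1
            if x & 0x100:
                x ^= poly
        return r

    def gf_inv(x):
        # x^254 == x^-1 for any nonzero x in GF(2^8)
        r = 1
        for _ in range(254):
            r = gf_mul(r, x)
        return r

    def mat_inv(m):
        # Gauss-Jordan inversion over GF(2^8); addition and subtraction are both XOR
        n = len(m)
        aug = [row[:] + [1 if i == j else 0 for j in range(n)]
               for i, row in enumerate(m)]
        for col in range(n):
            piv = next(r for r in range(col, n) if aug[r][col] != 0)
            aug[col], aug[piv] = aug[piv], aug[col]
            inv_p = gf_inv(aug[col][col])
            aug[col] = [gf_mul(inv_p, v) for v in aug[col]]
            for r in range(n):
                if r != col and aug[r][col] != 0:
                    f = aug[r][col]
                    aug[r] = [a ^ gf_mul(f, b) for a, b in zip(aug[r], aug[col])]
        return [row[n:] for row in aug]

    k = 5
    # systematic 5x6 generator: identity columns plus one all-ones parity column
    G = [[1 if i == j else 0 for j in range(k)] + [1] for i in range(k)]

    data = [10, 20, 30, 40, 50]
    codeword = [0] * (k + 1)           # codeword = data * G (horizontal convention)
    for j in range(k + 1):
        for i in range(k):
            codeword[j] ^= gf_mul(data[i], G[i][j])

    erased = 2                         # lose data symbol 2
    survivors = [j for j in range(k + 1) if j != erased]
    G_sub = [[G[i][j] for j in survivors] for i in range(k)]   # 5x5, still invertible
    G_inv = mat_inv(G_sub)

    c_sub = [codeword[j] for j in survivors]
    recovered = [0] * k                # data = surviving symbols * G_sub^-1
    for i in range(k):
        for r in range(k):
            recovered[i] ^= gf_mul(c_sub[r], G_inv[r][i])
    assert recovered == data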
-----------
EDIT: Erasure decoding is much easier than error decoding. Error decoding requires figuring out the "locator", and then applying the calculated errors at those locations. Since we already know the locations in an "erasure" situation, we can just manipulate the matrix in an obvious manner.
--------
EDIT2: Ah right, the "systematic form" of the Vandermonde-Reed-Solomon construction is obtained by performing Gaussian elimination on the non-systematic matrix (with the goal of forming an identity sub-matrix). After Gaussian elimination, the "column of ones" (i.e. the simple XOR) disappears. (Erm... row of ones in the Backblaze pictures.)
So maybe the Golay code you're thinking of is a better example of a matrix with an explicit parity bit.
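Going back to the Gaussian-elimination step in EDIT2: here's a rough sketch of my own (same 0x11D / a = 0x02 field choices as above) that builds a small non-systematic Vandermonde generator, row-reduces its first k columns to the identity, and confirms the all-ones column is gone afterwards.

    def gf_mul(x, y, poly=0x11D):
        # carry-less multiply in GF(2^8), reduced by the chosen polynomial
        r = 0
        while y:
            if y & 1:
                r ^= x
            y >>= 1
            x <<= 1
            if x & 0x100:
                x ^= poly
        return r

    def gf_pow(x, n):
        r = 1
        for _ in range(n):
            r = gf_mul(r, x)
        return r

    def gf_inv(x):
        return gf_pow(x, 254)          # x^254 == x^-1 for nonzero x

    k, n, a = 3, 5, 0x02
    G = [[gf_pow(a, i * j) for j in range(n)] for i in range(k)]   # non-systematic
    assert all(G[i][0] == 1 for i in range(k))    # the column of ones, pre-reduction

    # Gauss-Jordan on the first k columns; row operations don't change the code itself
    for col in range(k):
        piv = next(r for r in range(col, k) if G[r][col] != 0)
        G[col], G[piv] = G[piv], G[col]
        inv_p = gf_inv(G[col][col])
        G[col] = [gf_mul(inv_p, v) for v in G[col]]
        for r in range(k):
            if r != col and G[r][col] != 0:
                f = G[r][col]
                G[r] = [u ^ gf_mul(f, w) for u, w in zip(G[r], G[col])]

    # Now G = [I | P]: the data passes through untouched, and the simple
    # XOR column has been absorbed into the identity part.
    for i in range(k):
        assert G[i][:k] == [1 if j == i else 0 for j in range(k)]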
I would be interested in a way based on coding theory that solves the problem in the blog-post.
Something with a form similar to this would be a solution to me:
    from typing import List

    def find_missing_number(nums: List[int]):
        # 1. Define some code C (possibly using nums)
        # 2. Encode a message using C (or interpret nums as a word or codeword)
        # 3. Do something with the word
        # 4. Find the locations of the errors in the word
        # 5. The locations of the errors tell us the missing number(s)
        ...
The answer on Stack Overflow seems to use methods very closely related to RS decoding; however, I can't quite squint at it hard enough to see whether the problem could actually be solved using the exact same methods as RS decoding.