Most things that you do on the computer, you want to be O(n) or less; maybe O(n log n) or even O(n log^x n), but no slower than that. That means if you get a bigger problem, you can generally just get a proportionally bigger computer and be all set.
Now, sure, there are plenty of O(n^2) problems where the n stays small enough, or you don't mind waiting or spending a ton of money on it. But just because something is in P doesn't mean that you want to be solving it with a polynomial time algorithm on a regular basis.
Our CS prof also had another interesting point: P = NP could be true without changing many things in reality.
This could occur if the reduction of an NP-complete problem to a problem in P resulted in a runtime of such monstrous polynomial degree that the exponential algorithms are simply faster for every tractable problem size (see the quick numeric check below). Something like this already exists in some graph algorithms: asymptotically faster algorithms are known, but in practice they are a lot slower than the asymptotically slower algorithm until you reach completely silly graph sizes.
This could turn even more frustrating if the proof was nonconstructive.
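To make that crossover concrete, here is a tiny numeric sketch using n^100 as a made-up stand-in for a "monstrous polynomial" (the degree is purely illustrative, not anything from the thread):

```python
# A tiny numeric check of the point above, using n**100 as a made-up stand-in
# for a "monstrous polynomial": the exponential 2**n actually stays *smaller*
# until n is around a thousand, and by then both operation counts are
# astronomically out of reach anyway.

n = 2
while 2 ** n < n ** 100:
    n += 1
print(n)                  # roughly 1000: only here does the polynomial "win"
print(len(str(2 ** n)))   # ~300 digits of work either way -- hopeless
```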
Another interesting take: P != NP could be true, while changing MANY things in reality.
Basically all modern asymmetric cryptography in common use relies on the difficulty of either integer factoring or discrete logarithms (including discrete logarithms on elliptic curves). The problem is, none of those problems is proven to be NP-complete! Even if we proved P != NP, there could still be polynomial-time algorithms for integer factoring and/or discrete logarithms.
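As a toy illustration of that asymmetry (nothing here is real cryptography; the prime, base, and function names are made up for the example), modular exponentiation is cheap in the forward direction while the naive inverse, the discrete log, is a brute-force search:

```python
# Toy illustration (not real cryptography; the prime and base are made-up small
# parameters). The forward direction -- modular exponentiation -- is cheap even
# for huge exponents, while the naive inverse (the discrete log) is a search
# over every possible exponent.

p = 2_147_483_647           # a small Mersenne prime, far too small for real use
g = 7                       # arbitrary base for the example

def forward(x: int) -> int:
    """g**x mod p via square-and-multiply -- fast."""
    return pow(g, x, p)

def brute_force_dlog(y: int) -> int:
    """Find x with g**x == y (mod p) by trying every exponent -- slow."""
    acc = 1
    for x in range(p):
        if acc == y:
            return x
        acc = (acc * g) % p
    raise ValueError("no discrete log found")

secret = 123_456_789
public = forward(secret)            # effectively instant
# brute_force_dlog(public) would take on the order of p steps -- already painful
# here, and utterly hopeless at real key sizes (moduli hundreds of digits long).
```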
Staying in the realm of polynomial complexity, matrix multiplication comes to mind: the best known exponent keeps creeping closer to O(n^2), while O(n^3) is the naive but far more common implementation.
In terms of practical algorithms, the Strassen algorithm (O(n^2.8)) is the only one that offers runtime advantages at matrix sizes that aren't enormous, and even then it isn't always used, because it comes with two non-trivial costs: reduced numerical stability and extra memory for intermediate results.
It actually doesn't require more memory for intermediate results (see the Strassen reloaded paper). It's more just that it's a ton of work to implement well (even compared to a regular gemm which is already hard), and the benefits only start showing up at pretty large (~4000x4000) matrices.
> This could occur if the reduction of an NP-complete problem to a problem in P resulted in a runtime of such monstrous polynomial degree that the exponential algorithms are simply faster for every tractable problem size.
You don't even need to go that far.
n^8 is smaller than 2^n once you get past n = 44, which is still well within the tractable range, but encrypting something against an n^8 brute force is a pretty reasonable task.
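A quick sanity check of that crossover, plus a look at what n^8 costs at a larger (arbitrarily chosen) parameter size:

```python
# Quick sanity check of the crossover: find the first n where 2**n overtakes
# n**8, then see how big n**8 still is at a crypto-ish parameter size.

n = 2
while 2 ** n <= n ** 8:
    n += 1
print(n)            # 44: past this point the exponential dominates

# Even so, a hypothetical n**8 attack is far from free at realistic sizes:
print(4096 ** 8)    # about 7.9e28 "steps" for n = 4096 -- still a lot of work
```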
This is similar to Knuth's reasoning for why he suspects P = NP: essentially, there might be algorithms for these problems that are simply so complex that we may never find them.
The "accidentally quadratic" blog collected such things. Not sure if it's still being updated since Tumblr's mass exodus (the posts don't seem to show timestamps) https://accidentallyquadratic.tumblr.com
Is there some good intuition for why P-complete problems are difficult to parallelize? This is the first I've heard of it (but then again, I'm usually interested in more obscure complexity classes).
This is the first I have heard of it as well (and I have properly studied NP-completeness in graph theory, so I don't know why this escaped my attention). It seems that P-complete problems are difficult to parallelize by definition, as they are the tractable problems (as opposed to NP) which don't parallelize well.
"The class P, typically taken to consist of all the "tractable" problems for a sequential computer, contains the class NC, which consists of those problems which can be efficiently solved on a parallel computer. This is because parallel computers can be simulated on a sequential machine. It is not known whether NC = P. In other words, it is not known whether there are any tractable problems that are inherently sequential. " From: https://en.m.wikipedia.org/wiki/P-complete
It's not by definition, there's a bit of math behind it! :)
The intuition is that the hardest problems in P have linear dependency chains.
If you had a good parallel algorithm for a P-complete problem, you could take a problem with a dependency chain and solve parts of it in parallel without waiting for the results those parts depend on.
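A rough sketch of what such a dependency chain looks like in code (the step function is an arbitrary stand-in, not tied to any particular P-complete problem):

```python
# Illustrative sketch (not a P-completeness proof): a linear dependency chain.
# Each step needs the previous step's output, so there is nothing obvious to
# hand to a second core -- contrast this with summing a list, where partial
# sums can be computed independently and combined at the end.

def step(x: int) -> int:
    """Arbitrary stand-in for 'one unit of work that feeds the next'."""
    return (x * 6364136223846793005 + 1442695040888963407) % (1 << 64)

def iterate(x0: int, n: int) -> int:
    x = x0
    for _ in range(n):
        x = step(x)        # step i cannot start until step i-1 has finished
    return x

print(iterate(42, 1_000_000))
```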
It's more like: you win a Turing award by finding a strategy to parallelize this problem, as you'd be able to use that approach to parallelize all problems in P, proving NC = P.
This applies both ways. You'll also win a Turing award if you prove NC ≠ P, which is kind of what you said; at least, that's the best way I see of reading your first and a few following messages.
Not in the sense you mean. I think the other comment is talking about multiplication without carry, which is a simple form of hashing, but I'm not aware of anyone using that for numerical computations. However, floating-point multiplication is inherently an approximation: the precision of the input is limited by the bit depth of the sensor channel no matter what, and then typically reduced further by normalizing everything to fit between -1 and 1, both to avoid overweighting input features with naturally larger values and to fit into the f16 registers that a typical GPU has tens of thousands of. Plus, while I don't know what they're doing these days with vector embeddings in LLMs, in older-school NLP the probabilities you're dealing with are so small that the only way to reliably get joint distributions is to take logs and add instead of multiplying; otherwise you'd very quickly round to 0 at any floating-point width.
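A small illustration of that log trick, with made-up probabilities:

```python
# Small illustration of the log-probability trick mentioned above: multiplying
# many small probabilities underflows to 0.0, while summing their logs does not.
# The probabilities are made-up values purely for demonstration.

import math

probs = [1e-5] * 100                      # 100 independent events, each p = 1e-5

direct = 1.0
for p in probs:
    direct *= p                           # underflows long before the end
print(direct)                             # 0.0

log_joint = sum(math.log(p) for p in probs)
print(log_joint)                          # about -1151.3, perfectly representable
```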
Something to keep in mind is that a lot of these matrices are sparse, though. When most of the entries are 0, specialized data structures that know this can avoid all of the pointless multiply-by-0 operations. That saves far more time than some kind of approximate multiplication would.
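A minimal sketch of that, assuming SciPy is available (the size and density here are arbitrary):

```python
# Minimal sketch of the point above, assuming SciPy is available; the size and
# density are arbitrary. A CSR matrix stores only the nonzero entries, so the
# sparse product never touches the zeros at all.

import numpy as np
import scipy.sparse as sp

n = 2_000
A = sp.random(n, n, density=0.001, format="csr", random_state=0)
B = sp.random(n, n, density=0.001, format="csr", random_state=1)

C_sparse = A @ B                            # works on nonzeros only
C_dense = A.toarray() @ B.toarray()         # dense product grinds through every 0

print(C_sparse.nnz, "nonzeros out of", n * n, "entries")
print(np.allclose(C_sparse.toarray(), C_dense))   # True
```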
The Strassen algorithm drops it to O(n^log2(7)) ≈ O(n^2.8074), because it divides the work into 2x2 block multiplications that can be done with 7 multiplications instead of 8 (sketched below). Strassen is a practical algorithm.
You can still push the exponent down to 2.371552 with complex trickery, but those are galactic algorithms, meaning they only pay off for matrices so big that you couldn't even construct them on Earth.
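For reference, here's an illustrative sketch of Strassen's 7-multiplication scheme, restricted to square matrices whose size is a power of two; a real implementation would switch to an ordinary multiply below some cutoff rather than recursing down to 1x1:

```python
# Illustrative sketch of Strassen's 7-multiplication scheme for square matrices
# whose size is a power of two. A production version would fall back to a
# plain (BLAS) multiply below some cutoff; here we recurse all the way down.

import numpy as np

def strassen(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    n = A.shape[0]
    if n == 1:
        return A * B

    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven block products instead of the naive eight.
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)

    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(64, 64)
B = np.random.rand(64, 64)
print(np.allclose(strassen(A, B), A @ B))     # True (up to floating-point error)
```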