You're confusing yourself w/ fancy words like "proof space". The LLM is not doing any kind of traversal in any meaningful sense of the word b/c the "proof" is often just grammatically coherent gibberish whereas an actual traversal in an actual space of proofs would never land on incorrect proofs.
My reading of their comment is that a proof space is a concept where a human guesses that a proof of some form q exists, and the AI searches a space S(q) where most points may not be valid proofs, but if a valid proof exists, it will hopefully be found.
So it is not a space of proofs in the sense that everything in a vector space is a vector. More like a space of sequences of statements, which have some particular pattern, and one of which might be a proof.
So it's not a proof space then. It's some computable graph where the edges are defined by standard autoregressive LLM single step execution & some of the vertices can be interpreted by theorem provers like Lean, Agda, Isabelle/HOL, Rocq, etc. That's still not any kind of space of proofs. Actually specifying the real logic of what is going on is much less confusing & does not lead readers astray w/ vague terms like proof spaces.
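To spell that logic out, here's a minimal sketch of the search being described, assuming two hypothetical placeholders: `llm_next_steps` for one autoregressive decoding step and `check_with_prover` for handing a candidate to something like Lean. Nothing here is a real API; it's just the structure of the graph walk.

```python
from collections import deque

def llm_next_steps(partial_script: str) -> list[str]:
    """Hypothetical placeholder: one autoregressive LLM step proposing
    candidate continuations of a partial proof script."""
    raise NotImplementedError

def check_with_prover(candidate: str) -> bool:
    """Hypothetical placeholder: ask a checker (Lean, Isabelle/HOL, Rocq, ...)
    whether the candidate actually proves the target statement."""
    raise NotImplementedError

def search(root: str, max_nodes: int = 10_000) -> str | None:
    """Walk the graph whose edges are single LLM steps. Most vertices are
    not proofs of anything; only the ones the checker accepts count."""
    frontier = deque([root])
    visited = 0
    while frontier and visited < max_nodes:
        node = frontier.popleft()
        visited += 1
        if check_with_prover(node):
            return node  # a vertex the theorem prover accepts
        frontier.extend(llm_next_steps(node))
    return None  # budget exhausted without reaching a checkable proof
```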
I still don't get how achieving 96% on some benchmark means it's a super genius but that last 4% is somehow still out of reach. The people who constantly compare robots to people should really ponder how a person who manages to achieve 90% on some advanced math benchmark still misses that last 10% somehow.
This feels like a maybe interesting position, but I don’t really follow what you mean. Is it possible to just state it directly? Asking us to ponder is sort of vague.
These math LLMs seem very different from humans. A person has a specialty. If an LLM were only as skilled as, say, a middling PhD recipient (not superhuman), but were that skilled in literally every field, maybe somebody could argue that's superhuman ("smarter" than any one human). By this standard a room full of people or an academic journal could also be seen as superhuman. Which is not unreasonable; communication is our superpower.
Yeah - it's interesting where the edge is. In theory, an LLM trained on everything should be more ready to make cross-field connections. But doing that well requires a certain kind of translation and problem-selection work which is hard even for humans. (I would even say beyond PhD level: knowing which problems are worth throwing PhD students at is the domain of professors... and many of them are bad at it, as well.)
On the human side, mathematical silos reduce our ability to notice opportunities for cross-silo applications. There should be lots of opportunity available.
LLMs are good at search, but plagiarism is not "AI".
Leonhard Euler discovered many things by simply trying proofs everyone knew were impossible at the time. Additionally, folks like Isaac Newton and Gottfried Leibniz simply invented new approaches to solve general problems.
The folks that assume LLMs are "AI"... are also biased to turn a blind eye to clear isomorphic plagiarism in the models. Note too, LLM activation capping only reduces aberrant offshoots from the expected reasoning model's behavioral vector (it can never be trusted.) Thus, they will spew nonsense when faced with an unknown domain's search space.
Most exams do not have ambiguous or unknown contexts in the answer key, and a machine should score 100% by matching documented solutions without fail. However, LLMs would also require >75% of our galaxy's energy output to reach human-level intelligence error rates in general.
YC has too many true believers with "AI" hype, and it is really disturbing. =3
In general, "any conceivable LLM" was the metric based on current energy usage trends within the known data-centers peak loads (likely much higher due to municipal NDA.) A straw-man argument on whether it is asymptotic or not is irrelevant with numbers that large. For example, 75% of a our galaxy energy output... now only needing 40% total output... does not correct a core model design problem.
LLM are not "AI", and unlikely ever will be due to that cost... but Neuromorphic computing is a more interesting area of study. =3
They target those ads by ingesting as many signals as possible from as many input devices & sensors as they can possibly convince people to use. They make a lot of money from advertising b/c they have managed to convince the greatest number of people to give them as many behavioral signals as possible & they will continue to do so. They kill products only when the signal is not valuable enough to improve their advertising business but that's clearly not the case w/ AI.
It's more intellectually lazy to think boolean logic at a sufficient scale crosses some event horizon wherein its execution on mechanical gadgets called computers somehow adds up to intelligence beyond human understanding.
It is intellectually lazy to proclaim something to be impossible in the absence of evidence or proof. In the case of the statement made here, it is provably true that Boolean logic at sufficient scale can replicate "intelligence" of any arbitrary degree. It is also easy to show that this can be perceived as an "event horizon" since the measurements of model quality that humans typically like to use are so nonlinear that they are virtually step function-like.
Doesn't seem like you have proof of anything but it does appear that you have something that is very much like religious faith in an unforeseeable inevitability. Which is fine as far as religion is concerned but it's better to not pretend it's anything other than blind faith.
But if you really do have concrete proof of something then you'll have to spell it out better & explain how exactly it adds up to intelligence of such magnitude & scope that no one can make sense of it.
> "religious faith in an unforeseeable inevitability"
For reference, I work in academia, and my job is to find theoretical limitations of neural nets. If there were so much as a modicum of evidence to support the argument that "intelligence" cannot arise from sufficiently large systems, my colleagues and I would be utterly delighted and would be all over it.
Here are a couple of standard elements without getting into details:
1. Any "intelligent" agent can be modelled as a random map from environmental input to actions.
2. Any random map can be suitably well-approximated by a generative transformer. This is the universal approximation theorem. Universal approximation does not mean that models of a given class can be trained using data to achieve an arbitrary level of accuracy, however...
3. The neural scaling laws (first empirical, now more theoretically established under NTK-type assumptions), as a refinement of the double descent curve, assert that a neural network class can get arbitrarily close to an "entropy level" given sufficient scale. This theoretical floor is far below the error rates that humans can reach. Whether "sufficiently large" is outside of the range that is physically possible is a much longer discussion, but bets are that human levels are not out of reach (I don't like this, to be clear).
4. The nonlinearity of accuracy metrics comes from the fact that they are constructed from the intersection of a large number of weakly independent events. Think the CDF of a Beta random variable with parameters tending to infinity (a toy illustration is sketched below).
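To make point 4 concrete, here's my own toy sketch with made-up numbers, not taken from any paper: if an answer only scores as correct when all n weakly independent sub-steps succeed, accuracy as a function of per-step reliability p behaves like p**n, which is nearly flat and then nearly vertical for large n.

```python
# Toy illustration of point 4: an answer counts as correct only if all n
# sub-steps succeed, so accuracy as a function of per-step reliability p
# is roughly p**n -- almost flat, then almost vertical, for large n.
for n in (1, 10, 100):
    row = {p: round(p ** n, 3) for p in (0.90, 0.95, 0.99, 0.999)}
    print(f"n = {n:>3}: {row}")
```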
Look, I understand the scepticism, but from where I am, reality isn't leaning that way at the moment. I can't afford to think it isn't possible. I don't think you should either.
As I said previously, you are welcome to believe whatever you find most profitable for your circumstances but I don't find your heuristics convincing. If you do come up with or stumble upon a concrete constructive proof that 100 trillion transistors in some suitable configuration will be sufficiently complex to be past the aforementioned event horizon then I'll admit your faith was not misplaced & I will reevaluate my reasons for remaining skeptical of Boolean arithmetic adding up to an incomprehensible kind of intelligence beyond anyone's understanding.
Which part was heuristic? This format doesn't lend itself to providing proofs, it isn't exactly a LaTeX environment. Also why does the proof need to be constructive? That seems like an arbitrarily high bar to me. It suggests that you are not even remotely open to the possibility of evidence either.
I also don't think you understand my point of view, and you mistake me for a grifter. Keeping the possibility open is not profitable for me, and it would be much more beneficial to believe what you do.
I didn't think you were a grifter but you only presented heuristics so if you have formal references then you can share them & people can decide on their own what to believe based on the evidence presented.
Fine, that's fair. I believe the statement that you made is countered by my claim, which is:
Theorem. For any tolerance epsilon > 0, there exists a transformer neural network of sufficient size that follows, up to the factor epsilon, the policy that most optimally achieves arbitrary goals in arbitrary stochastic environments.
Proof (sketch). For any stochastic environment with a given goal, there exists a model that maximizes expected return under this goal (not necessarily unique, but it exists). From Solomonoff's convergence theorem (Theorem 3.19 in [1]), Bayes-optimal predictors under the universal Kolmogorov prior converge with increasing context to this model. Consequently, there exists an agent (called the AIXI agent) that is Pareto-optimal for arbitrary goals (Theorem 5.23 in [1]). This agent is a sequence-to-sequence map with some mild regularity, and satisfies the conditions of Theorem 3 in [2]. From this universal approximation theorem (itself proven in Appendices B and C in [2]), there exists a transformer neural network of a sufficient size that replicates the AIXI agent up to the factor epsilon.
This is effectively the argument made in [3], although I'm not fond of their presentation. Now, practitioners still cry foul because existence doesn't guarantee a procedure to find this particular architecture (this is the constructive bit). This is where the neural scaling law comes in. The trick is to work with a linearization of the network, called the neural tangent kernel; its existence is guaranteed by Theorem 7.2 of [4]. The NTK predictors are also universal and are a subset of the random feature models treated in [5], which derives the neural scaling laws for these models. Extrapolating these laws out as per [6] for specific tasks shows that the "floor" is always below human error rates, but this is still empirical because it works with the ill-defined definition of superintelligence that is "better than humans in all contexts".
[1] Hutter, M. (2005). Universal artificial intelligence: Sequential decisions based on algorithmic probability. Springer Science & Business Media.
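As for what "extrapolating these laws out" looks like in practice, here's a minimal sketch with entirely made-up loss numbers: fit L(N) = a*N^(-b) + c and read off the floor c. It only shows the shape of the argument, not the actual analysis in [6].

```python
import numpy as np
from scipy.optimize import curve_fit

# Sketch of a scaling-law extrapolation: fit L(N) = a * N**(-b) + c to
# (synthetic, purely illustrative) loss measurements and read off c,
# the irreducible "floor" referred to above.
def scaling_law(N, a, b, c):
    return a * N ** (-b) + c

N = np.array([1.0, 10.0, 100.0, 1_000.0, 10_000.0])  # params, in millions (made up)
L = np.array([3.10, 2.45, 1.98, 1.62, 1.41])          # measured loss (made up)

(a, b, c), _ = curve_fit(scaling_law, N, L, p0=[2.0, 0.2, 1.0], maxfev=10_000)
print(f"fitted exponent b ~ {b:.3f}, extrapolated floor c ~ {c:.3f}")
```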
Good question. It's because we don't need to be completely optimal in practice, only epsilon-close to it. Optimality is undecidable, but epsilon-close is not, and that's what the claim says NNs can provide.
That doesn't address what I asked. The paper I linked proves undecidability for a much larger class of problems* which includes the case you're talking about of asymptotic optimality. In any case, I am certain you are unfamiliar w/ what I linked b/c I was also unaware of it until recently & was convinced by the standard arguments people use to convince themselves they can solve any & all problems w/ the proper policy optimization algorithm. Moreover, there is also the problem of catastrophic state avoidance even for asymptotically optimal agents: https://arxiv.org/abs/2006.03357v2.
* - Corollary 3.4. For any fixed ε, 0 < ε < 1, the following problem is undecidable: Given is a PFA M for which one of the two cases hold:
(1) the PFA accepts some string with probability greater than 1 − ε, or
(2) the PFA accepts no string with probability greater than ε.
Oh yes, that's one of the more recent papers from Hutter's group!
I don't believe there is a contradiction. AIXI is not computable and optimality is undecidable, this is true. "Asymptotic optimality" refers to behaviour for infinite time horizons. It does not refer to closeness to an optimal agent on a fixed time horizon. Naturally the claim that I made will break down in the infinite regime because the approximation rates do not scale with time well enough to guarantee closeness for all time under any suitable metric. Personally, I'm not interested in infinite time horizons and do not think it is an important criterion for "superintelligence" (we don't live in an infinite time horizon world after all) but that's a matter of philosophy, so feel free to disagree. I was admittedly sloppy with not explicitly stating that time horizons are considered finite, but that just comes from the choice of metric in the universal approximation which I have continued to be vague about. That also covers Corollary 3.4, which is technically infinite time horizon (if I'm not mistaken) since the length of the string can be arbitrary.
Mining rigs have a finite lifespan & the places that make them in large enough quantities will stop making new ones if a more profitable product line, e.g. AI accelerators, becomes available. I'm sure making mining rigs will remain profitable for a while longer but the memory shortages are making it obvious that most production capacity is now going towards AI data centers & if that trend continues then hashing capacity will continue diminishing b/c the electricity cost & hardware replenishment will outpace mining rewards.
Bitcoin was always a dead end. It might survive for a while longer but its demise is inevitable.
Because they encode statistical properties of the training corpus. You might not know why they work but plenty of people know why they work & understand the mechanics of approximating probability distributions w/ parametrized functions well enough to sell it as a panacea for stupidity & the path to an automated & luxurious communist utopia.
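For what it's worth, a minimal sketch of what "approximating probability distributions w/ parametrized functions" means, stripped of everything LLM-specific; the corpus is a made-up toy and the "model" is just a vector of logits trained by gradient descent on cross-entropy.

```python
import math

# Toy version of the claim above: the "model" is a set of parameters (logits)
# over a vocabulary, trained by gradient descent so that its softmax matches
# the empirical word frequencies of a (made-up) corpus. LLMs do the same
# thing conditioned on context, at vastly larger scale.
corpus = ["the"] * 50 + ["cat"] * 30 + ["sat"] * 20
vocab = sorted(set(corpus))
target = {w: corpus.count(w) / len(corpus) for w in vocab}

logits = {w: 0.0 for w in vocab}   # the parameters
lr = 0.5
for _ in range(200):
    z = sum(math.exp(v) for v in logits.values())
    probs = {w: math.exp(v) / z for w, v in logits.items()}
    for w in vocab:                # gradient of cross-entropy: probs - target
        logits[w] -= lr * (probs[w] - target[w])

z = sum(math.exp(v) for v in logits.values())
print({w: round(math.exp(v) / z, 3) for w, v in logits.items()})
```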
Yes, yes, no one understands how anything works. Calculus is magic, derivatives are pixie dust, gradient descent is some kind of alien technology. It's amazing hairless apes have managed to get this far w/ automated boolean algebra handed to us from our long forgotten godly ancestors, so on & so forth.
No, this is false. No one understands. Using big words doesn't change the fact that you cannot explain, for any given input-output pair, how the LLM arrived at the answer.
Every single academic expert who knows what they are talking about can confirm that we do not understand LLMs. We understand atoms and we know the human brain is made 100 percent out of atoms. We may know how atoms interact and bond and how a neuron works, but none of this allows us to understand the brain. In the same way, we do not understand LLMs.
Characterizing ML as some statistical approximation or best fit curve is just using an analogy to cover up something we don’t understand. Heck the human brain can practically be characterized by the same analogies. We. Do. Not. Understand. LLMs. Stop pretending that you do.
I'm not pretending. Unlike you I do not have any issues making sense of function approximation w/ gradient descent. I learned this stuff when I was an undergrad so I understand exactly what's going on. You might be confused but that's a personal problem you should work to rectify by learning the basics.
omfg the hard part of ML is proving back-propagation from first principles and that's not even that hard. Basic calculus and application of the chain rule, that's it. Anyone can understand ML; not everyone can understand something like quantum physics.
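To back that up, here's the chain-rule derivation as a runnable toy, for a one-hidden-unit network and a single made-up training example; this is roughly the whole "first principles" part.

```python
import math

# Backprop written out as the chain rule for y_hat = w2 * tanh(w1 * x),
# with squared-error loss L = (y_hat - y)**2 / 2.
x, y = 1.5, 0.8          # one made-up training example
w1, w2 = 0.3, -0.2       # initial parameters
lr = 0.1

for _ in range(200):
    h = math.tanh(w1 * x)                     # forward pass
    y_hat = w2 * h
    dL_dyhat = y_hat - y                      # dL/dy_hat
    dL_dw2 = dL_dyhat * h                     # chain rule: * dy_hat/dw2
    dL_dh = dL_dyhat * w2                     # chain rule: * dy_hat/dh
    dL_dw1 = dL_dh * (1 - h ** 2) * x         # tanh'(u) = 1 - tanh(u)**2
    w1 -= lr * dL_dw1                         # gradient descent step
    w2 -= lr * dL_dw2

print(round(w2 * math.tanh(w1 * x), 4), "vs target", y)
```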
Anyone can understand the "learning algorithm", but the sheer complexity of the output of the "learning algorithm" is way too high, such that we cannot at all characterize how an LLM arrived at the answer to even the most basic query.
This isn't just me saying this. ANYONE who knows what they are talking about knows we don't understand LLMs. Geoffrey Hinton: https://www.youtube.com/shorts/zKM-msksXq0. Geoffrey, if you are unaware, is the person who started the whole machine learning craze over a decade ago. The godfather of ML.
Understand?
There's no confusion. Just people who don't know what they are talking about (you).
I don't see how telling me I don't understand anything is going to fix your confusion. If you're confused then take it up w/ the people who keep telling you they don't know how anything works. I have no such problem so I recommend you stop projecting your confusion onto strangers in online forums.
The only thing that needs to be fixed here is your ignorance. Why so hostile? I'm helping you. You don't know what you're talking about and I have rectified that problem by passing the relevant information to you so next time you won't say things like that. You should thank me.
I don't see how you interpreted it that way so I recommend you make fewer assumptions about online content instead of asserting your interpretation as the one & only truth. It's generally better to assume as little as possible & ask for clarifications when uncertain.