I think what you're describing is vertical integration rather than the walled garden specifically. The walled garden is the App Store restrictions, iMessage lock-in, that kind of thing. What made the Neo possible is that Apple controls the silicon, the OS, the firmware, and the industrial design as a single unit. They could put a phone chip in a laptop form factor and have it feel coherent because there's no seam between the hardware and software teams.
The distinction matters because it changes what the lesson is for the rest of the industry. You don't need a walled garden to compete here. You need to own enough of the stack that you can make aggressive tradeoffs (like shipping 8GB and an A18 Pro) without everything falling apart at the integration boundaries. Microsoft can't do that because they don't make the hardware. Dell and Lenovo can't do that because they don't make the OS. Qualcomm can't do that because they don't control the software ecosystem.
The one company that could theoretically pull this off is Google with ChromeOS on their own Tensor chips, and the fact that they haven't is probably the more interesting question than why Asus is shocked.
>The one company that could theoretically pull this off is Google with ChromeOS on their own Tensor chips, and the fact that they haven't is probably the more interesting question than why Asus is shocked.
Successful Chromebooks have always been the throwaway $200 models. Higher-end ones like the Pixelbook served more as flagship devices to prove they could do more but were never really marketed.
I don’t think Google’s gonna make a souped-up Chromebook because they know their place. Chromebooks are entirely internet-dependent devices with little brand recognition and no serious software. The Neo sits somewhere in between: Apple has the brand recognition and macOS.
What software do you want to be considered serious? With the addition of Linux/Crostini, there's 3D modeling, CAD, and NLE video editing and compilers and everything else.
What's the etc? DaVinci Resolve is available on Linux and is an industry standard for video editing. Blender's no slouch either these days. I'll give you Ableton though.
This is the most interesting point in the thread to me. Tolerance stack-up is the reason tight per-part tolerances matter at all. A single brick being precise is table stakes for injection molding. The hard problem is what happens when you compose hundreds of them.
The decoupling strategy you're describing is really similar to how you handle error accumulation in any large composed system. You can't make individual components perfect enough to avoid drift at scale, so you introduce boundaries where the accumulated error gets absorbed rather than propagated. In Lego's case that means designing joints between sections that are forgiving enough to accommodate the stack-up from each chunk independently.
It's also why knockoff bricks can feel fine for small builds and then fall apart (sometimes literally) on larger ones. If your per-part tolerance is 3x worse, it doesn't matter much for a 20-piece build, but for a 2000-piece build your cumulative error budget is blown long before you're done. The failure mode isn't that any individual brick is bad, it's that the composition doesn't hold.
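To put rough numbers on the stack-up argument, here's a minimal sketch. The per-part tolerance and the error budget are made-up illustrative values (not Lego's actual figures), and it assumes the parts sit in a single chain with independent errors, using the usual worst-case and root-sum-square models.

```python
import math

def worst_case_stack(tol_mm: float, n_parts: int) -> float:
    """Worst case: every part is off by the full tolerance in the same direction."""
    return tol_mm * n_parts

def rss_stack(tol_mm: float, n_parts: int) -> float:
    """Root-sum-square: independent errors partly cancel, so drift grows with sqrt(n)."""
    return tol_mm * math.sqrt(n_parts)

# Hypothetical numbers, for illustration only.
GOOD_TOL_MM = 0.01      # assumed per-part tolerance
KNOCKOFF_TOL_MM = 0.03  # "3x worse", as in the comment above
BUDGET_MM = 1.0         # assumed slack before the build stops fitting together

for n in (20, 2000):
    for label, tol in (("good", GOOD_TOL_MM), ("knockoff", KNOCKOFF_TOL_MM)):
        drift = rss_stack(tol, n)
        print(f"{n:>4} parts, {label:<8}: rss drift={drift:.3f} mm, "
              f"worst case={worst_case_stack(tol, n):.2f} mm, "
              f"within budget={drift <= BUDGET_MM}")
```

With these assumed values both brick types stay inside the budget at 20 parts, and only the tighter tolerance still does at 2000. The 3x factor doesn't change, but the budget it has to fit into does not grow with the build.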
I'd be curious whether Lego publishes or talks about those chunk size design rules anywhere. That seems like the actually interesting engineering story, more so than the per-part tolerance numbers that get repeated in every article about them.
The quality cliff question is the right one to be asking. There's a pattern in systems work where something that scales cleanly in theory hits emergent failure modes at production scale that weren't visible in smaller tests. The loss landscape concern is exactly that kind of thing, and nobody has actually run the experiment.
That said, I think the comparison to improving GGUF quantization isn't quite apples to apples. Post-training quantization is compressing a model that already learned its representations in high precision. Native ternary training is making an architectural bet that the model can learn equally expressive representations under a much tighter constraint from the start. Those are different propositions with different scaling characteristics. The BitNet papers suggest the native approach wins at small scale, but that could easily be because the quantization baselines they compared against (Llama 3 at 1.58 bits) were just bad. A full-precision model wasn't designed to survive that level of compression.
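For anyone wondering where "1.58 bits" comes from: a weight restricted to the three values -1, 0, +1 carries log2(3) ≈ 1.58 bits. The sketch below shows that arithmetic plus a toy post-training ternarization of an already-trained row of weights. The absmean scaling rule is my understanding of what the BitNet b1.58 paper describes and the weights are invented, so treat this as an illustration of the idea rather than anyone's reference implementation.

```python
import math

BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585

def ternarize(weights: list[float]) -> tuple[list[int], float]:
    """Map each weight to -1, 0, or +1 using one scale for the whole row.

    The scale is the mean absolute value (absmean); that choice is an assumption.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Reconstruct approximate weights from the ternary codes."""
    return [q * scale for q in quantized]

weights = [0.9, -0.05, 0.4, -1.3, 0.02, 0.6]  # made-up "trained" weights
codes, scale = ternarize(weights)
approx = dequantize(codes, scale)
worst_error = max(abs(w - a) for w, a in zip(weights, approx))
print(codes, round(scale, 3), round(worst_error, 3))
```

The reconstruction error here is what a full-precision model has to absorb after the fact; a natively ternary model never sees the original weights and is trained under the constraint from the first step.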
The real tell will be whether anyone with serious compute (not Microsoft, apparently) decides the potential inference cost savings justify a full training run. The framework existing lowers one barrier, but the more important barrier is that a failed 100B training run is extremely expensive, and right now there's not enough evidence to derisk it. Two years of framework polish without a flagship model is a notable absence.
There's something real in the impedance mismatch argument that I think the replies here are too quick to dismiss. The browser's programming model is fundamentally about a graph of objects with identity, managed by a GC, mutated through a rich API surface. Linear memory is genuinely a poor match for that, and the history of FFI across mismatched memory models (JNI, ctypes, etc.) tells us this kind of boundary is where bugs and performance problems tend to concentrate. You're right to point at that.
Where I think the argument goes wrong is in treating "most websites don't use WASM" as evidence that WASM is a bad fit for the web. Most websites also don't use WebGL, WebAudio, or SharedArrayBuffer. The web isn't one thing. There's a huge population of sites that are essentially documents with some interactivity, and JS is obviously correct for those. Then there's a smaller but economically significant set of applications (Figma, Google Earth, Photoshop, game engines) where WASM is already the only viable path because JS can't get close on compute performance.
The component model proposal isn't trying to replace JS for the document-web. It's trying to lower the cost of the glue layer for that second category of application, where today you end up maintaining a parallel JS shim that does nothing but shuttle data across the boundary. Whether the component model is the right design for that is a fair question. But "JS is the right abstraction" and "WASM is the wrong abstraction" aren't really in tension, because they're serving different parts of the same platform.
The analogy I'd reach for is GPU compute. Nobody argues that shaders should replace CPU code for most application logic, but that doesn't make the GPU a "dud" or a second-class citizen. It means the platform has two execution models optimized for different workloads, and the interesting engineering problem is making the boundary between them less painful.
> The browser's programming model is fundamentally about a graph of objects with identity, managed by a GC, mutated through a rich API surface.
Even more to the point, for the past couple of decades the browser's programming model has just been "write JavaScript". Of course it's going to fit JavaScript better than something else right now! That's an emergent property though, not something inherent about the web in the abstract.
There's an argument to be made that we shouldn't bother trying to change this, but it's not the same as arguing that the web can't possibly evolve to support other things as well. In other words, the current model for web programming we have is a local optimum, but statements like the one at the root of this comment chain talk like it's a global one, and I don't think that's self-evident. Without addressing whether they're opposed to the concept or the amount of work it would take, it's hard to have a meaningful discussion.
The distinction between effective and administrative fraud is useful and I think underappreciated. A lot of the conversation in these threads conflates the two, which makes it hard to reason about what actually needs fixing.
I want to push back a little on "science is self-correcting" though. It's true in the limit, but correction has a latency, and that latency has real costs. In fields like nutrition, psychology, or pharmacology, a fraudulent or deeply flawed result can shape clinical guidelines, public policy, and drug development pipelines for a decade or more before the correction lands. The people harmed during that window don't get made whole by the eventual retraction.
The comparison I keep coming back to is fault tolerance in distributed systems. You can build a system that's "eventually consistent" and still have it be practically broken if convergence takes too long or if bad state propagates faster than corrections do. The fraud networks described in TFA are basically an adversarial workload against a system (peer review) that was designed for a much lower rate of bad input. Saying the system self-corrects is accurate, but it's not the same as saying the system is healthy or that the current correction rate is adequate.
I think the practical question isn't whether science corrects itself in theory but whether the feedback loops are fast enough relative to the rate of fraud production, and right now the answer seems pretty clearly no.
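A toy model of that last point, in case it helps: treat uncorrected bad results as a backlog with an arrival rate and a correction rate. The rates below are invented purely for illustration; the only thing the sketch shows is that "self-correcting" and "backlog grows without bound" can both be true at once.

```python
def uncorrected_backlog(bad_per_year: float, corrected_per_year: float, years: int) -> list[float]:
    """Return the number of still-uncorrected bad results at the end of each year."""
    backlog = 0.0
    history = []
    for _ in range(years):
        backlog += bad_per_year                      # new bad results enter the literature
        backlog -= min(backlog, corrected_per_year)  # corrections can't exceed the backlog
        history.append(backlog)
    return history

# Hypothetical rates: both systems "self-correct", only one keeps up.
keeping_up = uncorrected_backlog(bad_per_year=100, corrected_per_year=120, years=10)
falling_behind = uncorrected_backlog(bad_per_year=100, corrected_per_year=60, years=10)
print(keeping_up[-1], falling_behind[-1])
```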
Re self-correcting science. In the area I am most familiar with (basic life sciences), correction happens pretty quickly. But I don’t worry about public policy much.
But I’m comfortable arguing that where science intersects with policy, fraud plays a very minor role. I suspect that most policy “mistakes” (policies that were adopted and then reversed) are more about the need for a policy in the absence of data (covid and masks), or subtle tradeoffs (covid and masks), or a policy choice that seems slightly better than an alternative (mammography) but also has poorly understood harms. Policy involves politics, and science unfortunately plays less of a role than one might like (and fraudulent science an even smaller role). This is not my field, but I cannot think of policies that were reversed because of discoveries of fraud (perhaps thalidomide and other drug approvals).
This is a real pain point and I run into the same tension in systems where data crosses serialization boundaries constantly. The prototype-stripping problem you're describing with JSON.parse/stringify is a specific case of a more general issue: rich domain objects don't survive wire transfer without a reconstitution step.
That said, I think the Temporal team made the right call here. Date-time logic is one of those domains where the "bag of data plus free functions" approach leads to subtle bugs because callers forget to pass the right context (calendar system, timezone) to the right function. Binding the operations to the object means the type system can enforce that a PlainDate never accidentally gets treated as a ZonedDateTime. date-fns is great but it can't give you that.
The serialization issue is solvable at the boundary. If you're using tRPC or similar, a thin transform layer that calls Temporal.Whatever.from() on the way in and .toString() on the way out is pretty minimal overhead. Same pattern people use with Decimal types or any value object that doesn't roundtrip through JSON natively. Annoying, sure, but the alternative is giving up the type safety that makes the API worth having in the first place.
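Here's the shape of that boundary layer, sketched in Python with standard-library value types rather than Temporal (which is a JavaScript API). The `Invoice` type and its field names are hypothetical; the point is only that rich values get lowered to strings on the way out and rebuilt on the way in, and that the app, not the wire format, knows which fields need it.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

@dataclass(frozen=True)
class Invoice:  # hypothetical domain object
    due: date
    total: Decimal

def to_wire(invoice: Invoice) -> str:
    """Outbound edge: lower rich values to plain JSON strings."""
    return json.dumps({"due": invoice.due.isoformat(), "total": str(invoice.total)})

def from_wire(payload: str) -> Invoice:
    """Inbound edge: the reconstitution step that plain JSON parsing never does."""
    raw = json.loads(payload)
    return Invoice(due=date.fromisoformat(raw["due"]), total=Decimal(raw["total"]))

original = Invoice(due=date(2025, 3, 31), total=Decimal("19.99"))
restored = from_wire(to_wire(original))
print(restored)
```

Everything between the two edges only ever sees `Invoice`, so the type checks stay intact and the string form exists only on the wire.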
For most situations, I deal with this by keeping dates as strings throughout the app, not objects. They get read from the db as strings, passed around as strings. If I need datetime calculations, I use the language's datetime objects to do it and convert right back to string. Display formatting for users happens at the last moment, in the template.
No-one seems to like this style, but I find it much simpler than converting on db read/write and passing datetime objects around.
Sounds like we need an extended JSON with the express intent of conveying common extended values and rich objects: DateTime instants (with calendar system & timezone), Decimal, BigInt, etc.
I disagree: this is not unlike including the schema in the JSON itself. This should be handled by the apps themselves, since they would have to know what the keys mean regardless.
If you do want the interchange format to be the one deserializing into specific runtime data structures, use YAML. YAML's tag syntax lets a loader map a tagged node to a custom type (and, with unsafe loaders, even run arbitrary code), which can be used for what you want.
I'm not talking about something arbitrarily extensible or compound values like vectors or lat/lon. Just a few more common data types -- primitive-like values that frequently need to be passed around.
This would probably best exist as a well-known wrapper around JSON itself.
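A minimal sketch of what such a wrapper could look like, using the hooks Python's `json` module already provides. The `$type` key and the tag names are made up for this example and are not any existing standard; a real convention would need people to agree on them, which is the hard part.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal

TYPE_KEY = "$type"  # hypothetical marker key, not an existing standard

def encode_extended(value):
    """json.dumps `default` hook: wrap values that plain JSON cannot represent."""
    if isinstance(value, Decimal):
        return {TYPE_KEY: "decimal", "value": str(value)}
    if isinstance(value, datetime):
        return {TYPE_KEY: "instant", "value": value.isoformat()}
    raise TypeError(f"cannot encode {type(value).__name__}")

def decode_extended(obj: dict):
    """json.loads `object_hook`: unwrap tagged values, pass other objects through."""
    tag = obj.get(TYPE_KEY)
    if tag == "decimal":
        return Decimal(obj["value"])
    if tag == "instant":
        return datetime.fromisoformat(obj["value"])
    return obj

record = {"price": Decimal("0.10"), "at": datetime(2025, 1, 2, 3, 4, tzinfo=timezone.utc)}
text = json.dumps(record, default=encode_extended)
roundtrip = json.loads(text, object_hook=decode_extended)
print(text)
```

The output is still valid JSON, so a consumer that doesn't know the convention just sees small nested objects instead of failing to parse.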
there are a zillion of these "json pro" kind of things: superjson, devalue, capnweb, all with slightly different ideas about how to lower high-level semantics to json's available types. it's so easy to do this kind of thing, it's a real https://xkcd.com/927/ situation.
CBOR (Concise Binary Object Representation) has JSON-like semantics with type extension support; with built-in type extensions it's much easier to get some agreement about registering certain magic type IDs to mean certain things. for example from a random google search for "cbor datetime" https://j-richter.github.io/CBOR/date.html; there's an IANA registry of type IDs: https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml
I think a more practical and compatible approach is to keep json as it is, and use a side channel (e.g. an openapi spec) to convey metadata.
Then it is up to the client to decide whether a date returned as a string is a date or a string, or to create a specific class instead of a generic object.
It's not that much about type safety. Since TypeScript uses duck typing, a DateTime could not be used as a ZonedDateTime because it'd lack the "timezone" property. The other way around, though, it would work. But I wouldn't even mind that, honestly.
The real drawback of the functional approach is UX, because it's harder to code and you don't get nice auto-complete.
It was just one of my numerous thought experiments... Starting with a text-only social network (like Moltbook) is probably way easier than going with a full-blown Facebook/Instagram-type network.
Humans are notoriously bad at formal logic. The Wason selection task is the classic example: most people fail a simple conditional reasoning problem unless it’s dressed up in familiar social context, like catching cheaters. That looks a lot more like pattern matching than rule application.
Kahneman’s whole framework points the same direction. Most of what people call “reasoning” is fast, associative, pattern-based. The slow, deliberate, step-by-step stuff is effortful and error-prone, and people avoid it when they can. And even when they do engage it, they’re often confabulating a logical-sounding justification for a conclusion they already reached by other means.
So maybe the honest answer is: the gap between what LLMs do and what most humans do most of the time might be smaller than people assume. The story that humans have access to some pure deductive engine and LLMs are just faking it with statistics might be flattering to humans more than it’s accurate.
Where I’d still flag a possible difference is something like adaptability. A person can learn a totally new formal system and start applying its rules, even if clumsily. Whether LLMs can genuinely do that outside their training distribution or just interpolate convincingly is still an open question. But then again, how often do humans actually reason outside their own “training distribution”? Most human insight happens within well-practiced domains.
> The Wason selection task is the classic example: most people fail a simple conditional reasoning problem unless it’s dressed up in familiar social context, like catching cheaters.
I'd never heard of the Wason selection task, looked it up, and could tell the right answer right away. But I can also tell you why: because I have some familiarity with formal logic and can, in your words, pattern-match the gotcha that "if x then y" is distinct from "if not x then not y".
In contrast to you, this doesn't make me believe that people are bad at logic or don't really think. It tells me that people are unfamiliar with "gotcha" formalities introduced by logicians that don't match the everyday use of language. If you added a simple addition to the problem, such as "Note that in this context, 'if' only means that...", most people would almost certainly answer it correctly.
Mind you, I'm not arguing that human thinking is necessarily more profound than what LLMs could ever do. However, judging from the output, LLMs have a tenuous grasp on reality, so I don't think that reductionist arguments along the lines of "humans are just as dumb" are fair. There's a difference that we don't really know how to overcome.
Quoting the Wikipedia article's formulation of the task for clarity:
> You are shown a set of four cards placed on a table, each of which has a number on one side and a color on the other. The visible faces of the cards show 3, 8, blue and red. Which card(s) must you turn over in order to test that if a card shows an even number on one face, then its opposite face is blue?
Confusion over the meaning of 'if' can only explain why people select the Blue card; it can't explain why people fail to select the Red card. If 'if' meant 'if and only if', then it would still be necessary to check that the Red card didn't have an even number. But according to Wason[0], "only a minority" of participants select (the study's equivalent of) the Red card.
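This is easy to check by brute force. The sketch below says a card must be turned over exactly when some possible hidden face would break the rule, and runs that test under both readings of "if". The function names are mine, and I'm using 2 and 3 as stand-ins for "any even" and "any odd" hidden number.

```python
HIDDEN_NUMBERS = (2, 3)  # one representative even and one odd hidden number
HIDDEN_COLORS = ("blue", "red")

def conditional(number: int, color: str) -> bool:
    """'If even, then blue' read as material implication."""
    return number % 2 != 0 or color == "blue"

def biconditional(number: int, color: str) -> bool:
    """'Even if and only if blue'."""
    return (number % 2 == 0) == (color == "blue")

def must_turn(visible, rule) -> bool:
    """A card needs checking iff some possible hidden face would violate the rule."""
    if isinstance(visible, int):
        return any(not rule(visible, hidden) for hidden in HIDDEN_COLORS)
    return any(not rule(hidden, visible) for hidden in HIDDEN_NUMBERS)

CARDS = (3, 8, "blue", "red")
needed_if = [card for card in CARDS if must_turn(card, conditional)]
needed_iff = [card for card in CARDS if must_turn(card, biconditional)]
print(needed_if)   # [8, 'red']
print(needed_iff)  # [3, 8, 'blue', 'red']
```

Red is in both lists, so the "if and only if" reading accounts for picking Blue but not for skipping Red.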
People in everyday life are not evaluating rules. They evaluate cases, for whether a case fits a rule.
So, when being told:
"Which card(s) must you turn over in order to test that if a card shows an even number on one face, then its opposite face is blue?"
they translate it to:
"Check the cards that show an even number on one face to see whether their opposite face is blue and vice versa"
Based on this, many would naturally pick the 8 card (to test the direct case), and the blue card (to test the "vice versa" case).
They won't check the red to see if there's an even number there that invalidates the formulation as a general rule, because they're not in the mindset of testing a general rule.
Would they do the same if they had more familiarity with rule validation in everyday life or if they had a more verbose and explicit explanation of the goal?
Yeah, maybe if you phrased it as "Which card(s) must you turn over in order to ensure that all even-numbered cards are blue?" you'd get a better response?
Exactly. We invented rule-based machines so that we could have a thing that follows rules, and adheres strictly to them, all day long.
I'm not sure why people keep comparing machine behaviour to humans'. It's like economic models that assume perfect rationality... yeah that's not reality mate.
I confidently picked 8+blue and am now trying to understand why I personally did that. I think that maybe the text of the puzzle is not quite unambiguous. The question states "test a card" followed by "which cards", so this is what my brain immediately starts to check: every card one by one. Do I need to test "3"? No, not even. Do I need to test "8"? Yes. Do I need to test "blue"? Yes, because I need to test "a card" to fit the criteria. And lastly the "red" card also immediately fails verification of "a card" fitting that criteria.
I think a corrected question should clarify in some obvious way that we are verifying not "a card" but "a rule" applicable to all cards. So "a" needs to be replaced with "all" or "any", and a mention of "rule" or "pattern" needs to be added.
It also doesn't explain why people don't think it necessary to check the 3 to make sure it's not blue (which it would be if "if" meant "if and only if").
Though note that as GP said, on the Wason selection task, people famously do much better when it's framed in a social context. That at least partially undermines your theory that it's a lack of familiarity with the terminology of formal logic.
Maybe the social version just creates a context where "if x then y" obviously does not include "if not x then not y". Everyone knows people over the drinking age can drink both alcoholic and non-alcoholic drinks, so you obviously don't have to check the person drinking the soft drink to make sure they aren't an adult.
I think we're actually closer to agreement than it might seem.
You're right that the Wason task is partly about a mismatch between how "if" works in formal logic and how it works in everyday language. That's a fair point. But I think it actually supports what I'm saying rather than undermining it. If people default to interpreting "if x then y" as "if and only if" based on how language normally works in conversation, that is pattern-matching from familiar context. It's a totally understandable thing to do, and I'm not calling it a cognitive defect. I'm saying it's evidence that our default mode is contextual pattern-matching, not rule application. We agree on the mechanism, we're just drawing different conclusions from it.
Your own experience is interesting too. You got the right answer because you have some background in formal logic. That's exactly what I'd expect. Someone who's practiced in a domain recognizes the pattern quickly. But that's the claim: most reasoning happens within well-practiced domains. Your success on the task doesn't counter the pattern-matching thesis, it's a clean example of it working well.
On the broader point about LLMs having a "tenuous grasp on reality," I hear that, and I don't want to flatten the differences. There probably is something meaningfully different going on with how humans stay grounded. I just think the "humans reason, LLMs pattern-match" framing undersells how much human cognition is also pattern-matching, and that being honest about that is more productive than treating it as a reductionist insult.
As they say, "think about how smart the average person is, then realize half the population is below that". There are far more haikus than opuses walking this planet.
We keep benchmarking models against the best humans and the best human institutions - then when someone points out that swarms, branching, or scale could close the gap, we dismiss it as "cheating". But that framing smuggles in an assumption that intelligence only counts if it works the way ours does. Nobody calls a calculator a cheat for not understanding multiplication - it just multiplies better than you, and that's what matters.
LLMs are a different shape of intelligence. Superhuman on some axes, subpar on others. The interesting question isn't "can they replicate every aspect of human cognition" - it's whether the axes they're strong on are sufficient to produce better than human outcomes in domains that matter. Calculators settled that question for arithmetic. LLMs are settling it for an increasingly wide range of cognitive work. The fact that neither can flip a burger is irrelevant.
Humans don't have a monopoly on intelligence. We just had a monopoly on generality and that moat is shrinking fast.
The "God of the gaps" theory is a theological and philosophical viewpoint where gaps in scientific knowledge are cited as evidence for the existence and direct intervention of a divine creator. It asserts that phenomena currently unexplained by science—such as the origin of life or consciousness—are caused by God.
We are doing an inversion of the God of the gaps, an "LLM of the gaps", where gaps in LLM capabilities are considered inherently negative and limiting.
It is not actually about the gaps in capability; instead it arises from an understanding of how these things work and an honest acknowledgement of how far they could go.
The question is not if these things are actually intelligent or not. The question is if these things will be useful without an endless supply of training data and continuous re-alignment using it.
And the question "Are these things really intelligent" is just a proxy for that.
And we are interested in that question because that is necessary to justify the massive investment these things are getting now. It is quite easy to look at these things and conclude that they will continue to progress without any limit.
But that would be like looking at data compression at the time of its conception, and thinking that it is only a matter of time before we can compress 100GB into 1KB.
We live in a time of scams that are obvious if you take a second look. If something requires much deeper scrutiny, then it is possible to inflate a much larger bubble.
> and that moat is shrinking fast.
The point is that in reality it is not. It is just appearance. If you consider how these things work, then there is no justification for this conclusion.
I have said this elsewhere, but the problem of hallucination itself, along with the requirement of re-training, is the smoking gun that these things are not intelligent in ways that would justify these massive investments.
> If you added a simple addition to the problem, such as "Note that in this context, 'if' only means that...", most people would almost certainly answer it correctly.
Agreed. More broadly, classical logic isn't the only logic out there. Many logics differ on the meaning of the implication "if x then y". There are multiple ways for x to imply y, and those additional meanings do show up in natural language all the time, and we actually do have logical systems to describe them; they are just lesser known.
Mapping natural language into logic often requires a context that lies outside the words that were written or spoken. We need to represent in formulas what people actually meant, rather than just what they wrote. Indeed the same sentence can sometimes be ambiguous, and a logical formula never is.
As an aside, I wanna say that material implication (that is, the "if x then y" of classical logic) deeply sucks, or rather, an implication in natural language very rarely maps cleanly onto material implication. Having an implication "if x then y" be vacuously true when x is false is something usually associated with people who smirk at clever wordplay, rather than something people actually mean when they say "if x then y".
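For reference, here is the truth table being complained about, next to the "if and only if" reading that conversational "if" often gets. This is just the textbook definition written out; the function names are mine.

```python
from itertools import product

def material_implication(x: bool, y: bool) -> bool:
    """Classical 'if x then y': false only when x is true and y is false."""
    return (not x) or y

def everyday_reading(x: bool, y: bool) -> bool:
    """The conversational 'if', read as 'x if and only if y'."""
    return x == y

truth_table = [
    (x, y, material_implication(x, y), everyday_reading(x, y))
    for x, y in product((True, False), repeat=2)
]
for x, y, classical, everyday in truth_table:
    print(f"x={x!s:<5} y={y!s:<5} classical={classical!s:<5} everyday={everyday}")

# The only row where the two readings disagree: x false, y true.
disagreements = [(x, y) for x, y, c, e in truth_table if c != e]
```

Both x-false rows come out true under the classical reading, which is the vacuous-truth behavior; the two readings part ways only when x is false and y is true.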
Your response contains a performative contradiction: you are asserting that humans are naturally logical while simultaneously committing several logical errors to defend that claim.
The commenter’s specific claim (that adding a note about the definition of "if" would solve the problem) is a moving-the-goalposts fallacy and a tautology. The comment also suffers from hasty generalization (in their experience the test isn't hard) and special pleading (a double standard for LLMs and humans).
When someone tells you "you can have this if you pay me", they don't mean "you can also have it if you don't pay". They are implicitly but clearly indicating you gotta pay.
It's as simple as that. In common use, "if x then y" frequently implies "if not x then not y". Pretending that it's some sort of a cognitive defect to interpret it this way is silly.
> Decoding analyses of neural activity further reveal significant above chance decoding accuracy for negated adjectives within 600 ms from adjective onset, suggesting that negation does not invert the representation of adjectives (i.e., “not bad” represented as “good”)[...]
From: Negation mitigates rather than inverts the neural representations of adjectives
> But then again, how often do humans actually reason outside their own “training distribution”? Most human insight happens within well-practiced domains.
Humans can produce new concepts and then symbolize them for communication purposes. The meaning of concepts is grounded in operational definitions - in a manner that anyone can understand because they are operational, and can be reproduced in theory by anyone.
For example, Euclid invented the concepts of a point, angle and line to operationally represent geometry in the real world. These concepts were never "there" to begin with. They were created from scratch to "build" a world-model that helps humans navigate the real world.
Euclid went outside his "training distribution" to invent point, angle, and line. Humans have this ability to construct new concepts by interaction with the real world, bringing the "unknown" into the "known" so to speak. Animals have this too via evolution, but it is unclear if animals can symbolize their concepts and skills to the extent that humans can.
> Humans can produce new concepts and then symbolize them for communication purposes.
Sure, but the question is how often this actually happens versus how often people are doing something closer to recombination and pattern-matching within familiar territory. The point was about the base rate of genuine novel reasoning in everyday human cognition, and I don't think this addresses that.
> Euclid invented the concepts of a point, angle and line to operationally represent geometry in the real world. These concepts were never "there" to begin with.
This isn't really true though. Egyptian and Babylonian surveyors were working with geometric concepts long before Euclid. What Euclid did was axiomatize and systematize knowledge that was already in wide practical use. That's a real achievement, but it's closer to "sophisticated refinement within a well-practiced domain" than to reasoning from scratch outside a training distribution. If anything the example supports the parent comment.
There's also something off about saying points and lines were "never there." Humans have spatial perception. Geometric intuitions come from embodied experience of edges, boundaries, trajectories. Formalizing those intuitions is real work, but it's not the same as generating something with no prior basis.
The deeper issue is you're pointing to one of the most extraordinary intellectual achievements in human history and treating it as representative of human cognition generally. The whole point, drawing on Kahneman, is that most of what we call reasoning is fast associative pattern-matching, and that the slow deliberate stuff is rarer and more error-prone than people assume. The fact that Euclid existed doesn't tell us much about what the other billions of humans are doing cognitively on a Tuesday afternoon.
> Formalizing those intuitions is real work, but it's not the same as generating something with no prior basis.
> The fact that Euclid existed doesn't tell us much about what the other billions of humans are doing cognitively on a Tuesday afternoon.
Birds can fly, so there is some flying intelligence built into their DNA. But are they aware of their skill in a way that lets them create a theory of flight, and then use that to build a plane? I am just pointing out that intuitions are not enough: the awareness of the intuitions in a manner that can symbolize and operationalize them is important.
> The whole point, drawing on Kahneman, is that most of what we call reasoning is fast associative pattern-matching, and that the slow deliberate stuff is rarer and more error-prone than people assume
David Bessis, in his wonderful book [1], argues that the cognitive actions performed by you and me on a Tuesday afternoon are the same ones that mathematicians perform; we are just unaware of it. Also, since you brought up Kahneman, Bessis proposes a System 3 wherein inaccurate intuitions are corrected by precise communication.
[1] Mathematica: A Secret World of Intuition and Curiosity
The bird analogy is actually a really good one, but I think it supports a narrower claim than you're making. You're right that the capacity to symbolize and formalize intuitions is a distinct and important thing, separate from just having the intuitions. No argument there. But my point wasn't that symbolization doesn't matter. It was about how often humans actually exercise that capacity in a strong sense versus doing something more like recombination within familiar frameworks. The bird can't theorize flight, agreed. But most humans who can in principle theorize about their intuitions also don't, most of the time. The capacity exists. The base rate of its deployment is the question.
On Bessis, I actually think his argument is more compatible with what I was saying than it might seem. If the cognitive process underlying mathematical reasoning is the same one operating on a Tuesday afternoon, that's an argument against treating Euclid-level formalization as categorically different from everyday cognition. It suggests a continuum rather than a bright line between "pattern matching" and "genuine reasoning." Which is interesting and probably right. But it also means you can't point to Euclid as evidence that humans routinely do something qualitatively beyond what LLMs do. If Bessis is right, then the extraordinary cases and the mundane cases share the same underlying machinery, and the question becomes quantitative (how far along the continuum, how often, under what conditions) rather than categorical.
I'll check out the book though, it sounds like it's making a more careful version of the point than usually gets made in these threads.
> Kahneman’s whole framework points the same direction. Most of what people call “reasoning” is fast, associative, pattern-based. The slow, deliberate, step-by-step stuff is effortful and error-prone, and people avoid it when they can. And even when they do engage it, they’re often confabulating a logical-sounding justification for a conclusion they already reached by other means.
System 1 really looks like an LLM (indeed, completing a phrase is an example of what it can do, like "you either die a hero, or you live long enough to become the _"). It's largely unconscious and runs all the time, pattern matching on random stuff.
System 2 is something else and looks like a supervisor system, higher-level stuff that can be consciously directed through your own will.
But the two systems run at the same time and reinforce each other
In my naive understanding, neither requires any will or consciousness.
S1 is “bare” language production, picking words or concepts to say or think by a fancy pattern prediction. There’s no reasoning at this level, just blabbering. However, language by itself weeds out too obvious nonsense purely statistically (some concepts are rarely in the same room), but we may call that “mindlessly” - that’s why even early LLMs produced semi-meaningful texts.
S2 is a set of patterns inside the language (“logic”), that biases S1 to produce reasoning-like phrases. Doesn’t require any consciousness or will, just concepts pushing S1 towards a special structure, simply backing one keeps them “in mind” and throws in the mix.
I suspect S2 has a spectrum of rigor, because one can just throw in some rules (like “if X then Y, not Y therefore not X”) or may do fancier stuff (imposing a larger structure on it all, like formulating and testing a null hypothesis). Either way it all falls back onto S1 for the ultimate decision-making, a sense of what sounds right (allowing us our favorite logical flaws). Thus, the fancier the rules (patterns of “thought”), the more likely the reasoning is to be sound.
S2 doesn’t just rely on S1 but is a part of S1-as-language, though, because it’s a phenomenon born out of (and inside) language.
Whether it’s willfully, “consciously” engaged, or whether it works just because S1 predicts the concept of logical thinking as appropriate for certain lines of thought and starts to involve it, probably doesn’t even matter. It mainly depends on whatever definition of “will” we would like to pick (there are many).
LLMs and humans can hypothetically do both just fine, but when it comes to checking, humans currently excel because (I suspect) they have a “wider” language in S1, that doesn’t only include word-concepts but also sensory concepts (like visuospatial thinking). Thus, as I get it, the world models idea.
> The story that humans have access to some pure deductive engine and LLMs are just faking it with statistics might be flattering to humans more than it’s accurate.
Your point rings true with most human reasoning most of the time. Still, at least some humans do have the capability to run that deductive engine, and it seems to be a key part (though not the only part) of scientific and mathematical reasoning. Even informal experimentation and iteration rest on deductive feedback loops.
The fact that humans can learn to do X, sometimes well, often badly, and while many don’t, strongly supports the conjecture that X is not how they naturally do things.
I can perform symbolic calculations too. But most people have limited versions of this skill, and many people who don’t learn to think symbolically have full lives.
I think it is fair to say humans don’t naturally think in formal or symbolic reasoning terms.
People pattern match.
Another clue is that humans have to practice things and become familiar with them to reason even somewhat reliably about them, even if they have already learned some formal reasoning.
—-
Higher level reasoning is always implemented as specific forms of lower order reasoning.
There is confusion about substrate processing vs. what higher order processes can be created with that substrate.
We can “just” be doing pattern matching from an implementation view, and yet go far “beyond” pattern matching with specific compositions of pattern matching, from a capability view.
How else could neurons think? We are “only” neurons. Yet we far surpass the kinds of capabilities neurons have.
I don't disagree with any of that. My comment was only in relation to the question of human-specific capability that current LLMs may not be able to duplicate. I was not making the value judgments you seem to have read.
When people do math or rigorous deductive reasoning, are we sure they aren't just pattern matching with a set of carefully chosen interacting patterns that have been refined by ancient philosophers as being useful patterns that produce consistent results when applied in correctly patterned ways?
I've often wondered this. I suspect not, though I don't know. You're right that the answer matters to understanding LLM limitations relative to humans, though.
I remember reading about this in a book, 'The Enigma of Reason'. Basically it was saying that reasoning is exactly that: we decide, and then we come up with a reason for what we have decided, and usually not the other way around.
This is because the 'reasoning' part of our brain came from evolution when we started to communicate with others and needed to explain our behaviour.
Which is fascinating if you think of the implications. For the most part we think we are being logical, but in reality we are pattern matching or acting on impulse, and using our reasoning/logic to come up with excuses for what we had already decided.
It explains a lot about the world and why it's so hard to reason with someone: we assume the decision came from reason in the first place, when a look at such people's choices makes it clear that it didn't.
Brilliant insight. The success of LLM reasoning, ie “telling yourself a story”, has greatly increased my belief that humans are actually much less impressive than they seem. I do think it’s mostly pattern matching and a bunch of interacting streams analogous to LLM tokens. Obviously the implementations are different, because nature has to be robust and learn online, but I do not think we are as different from these machines as most people assume. There’s a reason Hofstadter et al. reacted as they did even to the earlier models.
This is why I also think humans being logical inference machines is mostly not true. We are seemingly capable of it, but there must be some cost that keeps it from being commonly used.
While humans did seemingly evolve socially very fast, given the tools we seem to have had for a few hundred thousand years, it could have been far faster if some other limitations were not being applied.
Agreed. This also explains why maths is so difficult for humans. It doesn't come "naturally" to us; we have to force ourselves to use it and it "makes our head hurt".