Hacker News | cauch's comments

I don't understand comments of the kind "the same is true of humans".

This feels a bit like whataboutism.

It also feels like people don't listen to each other.

For example, reading the previous comment, it feels like the thing that reduced the enthusiasm was that, at first, GenAI looked like it was "reading, understanding and using its own knowledge to answer the problem", but as soon as the situation is more niche or more complex, GenAI looks like it "does not understand the code, just does the equivalent of a StackOverflow search and tries to apply the solutions that it found there, and this is why it felt like it understood the code before".

It does not at all mean that GenAI is not terribly useful. It is even better than humans in some situations.

But it feels like answering "same with humans" misses this point. It is the opposite: humans usually try to understand the code, and are bad at covering a very large range of very well documented subjects. That's the "uncanny valley" they talk about: they assumed GenAI's performance on a subject X was due to a "human-like" approach, and it feels very strange when this impression falls apart.


No, I mean I’m in the camp that believes AI and the human brain are analogous and work the same way. Someone once replied, “then why do I need to supervise them?” and I pointed out that there are people whose job is literally “supervisor”.

I don't think that is what the parent comment you are answering means.

The comment you are answering says that, in their experience, AI and the human brain are not analogous: AI is good at storing a large amount of knowledge and repeating it (or extrapolating from patterns in that large amount of knowledge), but bad at understanding the code as a human does. This explains why a human is more efficient when working on something that doesn't have a lot of documentation (on which the AI built its knowledge).

Humans are bad at storing large amounts of knowledge, and this is why we need supervisors for humans.

AIs are bad at understanding new stuff: they need to be able to connect the new stuff with a lot of examples they have been trained on (it does not mean the stuff is "identical", but it means "connected"), and this is why we need supervisors for AI.

We need supervisors for both humans and AI, but for different, uncorrelated reasons.


I don't get this kind of answer.

- A motor is something that creates a force to push a vehicle.

- Oh yeah? My neighbour's car does not have wheels and sits on concrete blocks; the vehicle does not move and yet we all agree it has a motor. So it means that I can claim that this other thing that does not move has a motor too.

Sure, humans can _sometimes_ fail to do some things, but the fact that they can do these things sometimes is the point.

Doing these things is the hard part. Doing these things is the proof that the machine has what it takes. It does not matter if someone cannot do a given thing; that does not imply that their internal system is not complex enough to potentially do it. But the fact that some people can do it is the demonstration that inside a human skull, there is a system that is complex enough to potentially do it. Unless you can prove that people who don't do it have a fundamentally different system inside their skull, you cannot pretend that they should be considered as having a less complex system.


Exactly! So the argument that the AI does not prompt itself is not a refutation, just as it would not be for a person.

Uh?

Humans _can_ check themselves. They don't _always_ check themselves.

Motors _can_ move vehicles. They don't _always_ move vehicles.

LLMs _cannot_ check themselves. They _never_ can. It is not that some don't, they just cannot; they are not a system complex enough to do so.

So, yes, it is a refutation. If you have something that can _never_ move a vehicle, this thing does not qualify as a motor, even if some motors, sometimes, don't move a vehicle.

And if your next argument is "yeah but I would argue you don't need to check yourself to be conscious or to understand things", then you are just redefining a definition that belongs to your interlocutor. Your interlocutor is saying that this is a criterion they expect. Good for you if you do not expect this criterion. But the problem is that the answer given is not "this criterion is not expected"; the answer is "I change the criterion from 'being capable of it in some circumstances' into 'always does it in all circumstances'".


> LLMs _cannot_ check themselves. They _never_ can. It is not that some don't, they just cannot; they are not a system complex enough to do so.

All modern agentic harnesses can do this. Nobody uses a raw LLM for anything remotely complex. There's always some external system in place. That system is part of the "thought process".

Adjacency doesn't matter here, only what the result of the system of pieces is.


"Check themselves" does not mean "do a loop".

It means having self-control over their actions and being aware of them. If you ask a system, it will respond; it cannot choose not to respond (even if the response is "I don't want to respond", it still "runs", still does the work). If you don't ask a system, it will not respond.

Adjacency is the point of the thread here. The move being made is: "you say X is important to decide if the thing is intelligent/understanding/conscious, so let me just change X in the middle of the discussion and say that X does not matter".

That is exactly my first comment in this thread: I don't care if AI thinks or whatever; my reaction was about these "counter-arguments" that totally miss the point and make the person who pushes them look ridiculous. If you want to have a counter-argument, you first need to understand the interlocutor, not just spew whatever rebuttal you constructed that answers something unrelated to what the interlocutor brought to the conversation.


> "Check themselves" does not mean "do a loop".

I'm not sure I understand. How do you do it?

In my thought process, I quite literally stop myself, and say "ok, think about what you just said" to check myself. I literally initiate that loop. If I don't, then I'm not using my own mental agency, and just using my firm coded priors.

I will say that I do seem to have a "stop, what you said is wrong" logic-check voice that pops up without me initiating it. But it's unreliable, and not too different from the content monitoring systems used for streaming clients, which terminate with "content violation" immediately after the "incorrect" words are sent. I don't think integration is important, just the behavior of the overall system.


There is no "loop" in the brain; it is all part of the same line of thought. This is visible because, while you can sometimes have a "two voices / dialectic" way of thinking, you can have the exact same thinking in one go that does not look like a loop at all.

In fact, the large majority of the time, you don't process "as a loop" at all; you just continuously progress in your reflection without needing a "second voice" to retrigger you. The fact that sometimes we do this is just something we can do, not the result of something needed for our brain to work. For AI systems, this is something needed because the "answering" part is not able to do the loop on its own. And building a bigger system that combines an "answering" part and a "loop" part does not change this: it does not create a self-reflective system, it just bundles a non-self-reflective system and a workaround together.
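
To make the "answering part plus loop part" structure concrete, here is a minimal sketch. Everything in it is hypothetical: `answer` is a stand-in stub, not a real model, and `harness` is not any real agent framework. It only shows that the part producing a reply is stateless, and that continuity comes from an outer loop that feeds each reply back in.

```python
def answer(context):
    """Hypothetical stand-in for the stateless "answering" part.

    It keeps no memory between calls: the reply depends only on `context`.
    """
    return f"thought {len(context)}"


def harness(prompt, max_turns=3):
    """Hypothetical external "loop" part: feeds each reply back as context."""
    context = [prompt]
    for _ in range(max_turns):
        context.append(answer(context))
    return context


transcript = harness("fix the bug")
print(transcript)
```

Removing `harness` leaves `answer` fully functional but unable to continue on its own, which is the decoupling the comment is pointing at.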

It's a bit as if the "answering" part were unable to provide only one answer and always produced plenty of different possible answers, including contradictory ones. Then, you can add an external part that will just pick one answer (and add it to the context so the next large set of answers will not be inconsistent), without any intelligence to it. The whole system will look like a human. But we know that the system is not "living" and "aware", because a "living" or "aware" system has its own opinion, while this system is just generating convincing sentences without seeing any hierarchy or value or meaning in each one.


I would claim that, if you think without introspection (that loop), then there is virtually no self check. I'm not sure what "self check" you see that the brain has. Could you describe this "self check in a line of thought"? How do you perceive the check there? This is a genuine question. It definitely doesn't align with how I think about things. I ponder and talk to myself to iterate, verify, and test my understanding of my own thoughts.

Maybe a good analogy is "throwing a paper plane in real life" and "throwing a paper plane in a video game".

In real life, the paper plane moves "by itself". It does not need an external loop that updates its position step by step.

In the video game, you need an internal loop, a step-by-step tick, that updates the plane's position based on its current position and its momentum. And this is why a video game paper plane is not a real object. It is a very good simulation, it looks like it, but it is missing some intrinsic properties that we expect from a real object.

Yet you can analyse the paper plane's trajectory and see it as a Markov chain, with quantified step-by-step progress (for example one position point every 0.1 second). In the same way, you can look at your thought process and identify a step-by-step progression. But it does not mean that it works like that intrinsically; it does not mean that the paper plane "jumps" from its position at time T1 to its position at time T1+0.1 second.
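
A minimal sketch of the two descriptions, under a loud simplifying assumption: the "plane" is treated as a drag-free projectile, which a real paper plane is not. The function names and the launch velocities are made up for illustration. The continuous description is defined at every instant, while the game-style one only advances when its loop ticks.

```python
G = 9.81  # gravitational acceleration in m/s^2


def analytic_position(t, vx=3.0, vy=2.0):
    """Continuous description: the position is defined at every instant t."""
    return vx * t, vy * t - 0.5 * G * t * t


def simulate(duration, dt, vx=3.0, vy=2.0):
    """Game-style description: nothing moves unless this loop ticks."""
    x, y = 0.0, 0.0
    for _ in range(round(duration / dt)):
        vy -= G * dt  # update the velocity from gravity
        x += vx * dt  # then advance the position by one tick
        y += vy * dt
    return x, y


# Sampling the continuous trajectory every 0.1 s gives a step-by-step
# description, but the step size is a choice made by the observer.
samples = [analytic_position(0.1 * i) for i in range(11)]

exact_y = analytic_position(1.0)[1]
coarse_error = abs(simulate(1.0, 0.1)[1] - exact_y)
fine_error = abs(simulate(1.0, 0.001)[1] - exact_y)
print(coarse_error, fine_error)
```

The tick-based result depends on the chosen `dt` and only approaches the continuous one as the tick shrinks, whereas the sampled trajectory is the same curve whatever sampling interval is picked.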

For the human brain, there is no "loop centre" in the brain. To my knowledge, there is no one who got a brain injury and was suddenly unable to keep a single line of thought without someone else having to feed them the previous thought in order to produce the next thought.

In the brain, the fact that the previous thought feeds the next thought is "how it works", it is intrinsic, it is by design. And this mechanism of thoughts feeding the next thoughts is what creates "consciousness" or "awareness": self-reflection is based on the fact that thoughts are intrinsically linked together, that they "flow" continuously, without needing an external system to update them.

You cannot take away the "loop" part of the real paper plane so that it would suddenly be unable to move on its own once thrown.

Now, you can always say "well, the paper plane in the video game is a very good simulation, it does not matter if it is a real object or not", and that is fair enough. But in this discussion, some people have arguments to support that this property matters, that it is one condition for consciousness or awareness.


> For the human brain, there is no "loop centre" in the brain.

There are definitely cognitive feedback loops: https://pmc.ncbi.nlm.nih.gov/articles/PMC11903256/

Is your argument that, because they're external to the LLM, rather than integrated, they don't count, not even in a practical sense?

I think the result of the system is all that's important. Where/how it's implemented doesn't matter for practical results.

If the argument here is that LLMs don't have this built in, you should know that nobody has a practical use for plain LLMs these days. Nobody uses them this way, except for debugging. All interesting use is through some kind of harness, with all sorts of systems bolted on. I think these conversations are only meaningful in the "agent" context in which people actually use LLMs, where they stop when they think they're done.

LLMs don't have a self-contained loop, like we do, sure. Who cares, though? The actual AI systems that we use every day definitely do.


> There are definitely cognitive feedback loops

Have you read the article in question? It says that for one continuous thought, the brain will use different parts of the brain to do different things. It does not say that there is a "loop controller" anywhere. On the contrary, it illustrates that there is no loop controller: there is no special brain function that controls this loop. This loop is "how the brain works", and LLMs don't do that; they are incapable of doing that, it is not how they work.

> Is your argument that, because they're external to the LLM, rather than integrated, they don't count, not even in a practical sense?

No, my argument is that the nature of the brain and the nature of the LLM are very different, as different as a real paper plane and a video game paper plane. Some characteristics (for example, awareness) that exist in the brain cannot exist in the LLM because these characteristics are the result of the nature of the thing in question.

The problem is not that you build a system by integrating 2 things together. The problem is that they are different "things", they are different machines, they function, fundamentally, differently. They may produce the same output, but when you say "the brain has the characteristic X, the LLM produces the same output, so the LLM also has the characteristic X", it is logically inconsistent.

Planes are built as a system combining 2 things: a motor and some wings. But they are fundamentally different from a bird. They just don't "work" the same. It is not the same mechanism.

> you should know that nobody has a practical use for plain LLMs these days

That is totally irrelevant. My point is about the nature of the LLM, and the fact that it is stupid to see the same output and to conclude that they have the same characteristic. It is like saying "Birds are flying in the air and are alive. Planes are flying in the air, so I guess they are alive".

> LLMs don't have a self-contained loop, like we do, sure. Who cares, though? The actual AI systems that we use every day definitely do.

No, you miss the point. The problem is not that "you can just add an external loop". The problem is that the brain is a system that works without such a control loop. The thoughts are flowing (and they may flow to different brain functions, as explained in the article you quote). It is part of how the system works. Having a system that contains 2 things, one that does one computation and one that controls the loop, is not equivalent to another system where you cannot decouple the "flowing of the thought" from the "thinking machine".


The OP (nozzlegear) said:

Yes, LLMs don't think on their own, for one; they think when you invoke them.

My rebuttal is that people only think when invoked just the same, and can enter states where there is no consciousness just the same. The OP has already accepted that LLMs think, but it seems that you are arguing they do not? This car business is confusing, and the claim that LLMs don't check themselves is also wrong; there’s even a benchmark for this: https://correctbench.github.io/


No.

OP says: LLMs never think when not invoked.

What you said: I have examples where, sometimes, humans think when invoked.

That's the difference: human brains are intrinsically different because they are built to be able to think without being invoked, even if there are situations where they think when invoked.

There are tons of obvious examples of humans thinking without being invoked. Just take a bath and you will see :)


To be clear: the person I was replying to asked if the way a human thinks was any different from an LLM with a context window. That's the context of my answer. An LLM is a machine, it can't do anything unless we invoke it or give it the instructions and capabilities to do so. It has no free will, it can't just decide to compose a symphony one day unless those are part of its instructions. It can't do anything unless we tell it to do so and give it the capabilities to do so, it doesn't even exist unless it's loaded into memory. That's obviously different from human consciousness, and that's the whole of the point that I'm making.

You can argue that humans are just biological machines reacting to external stimuli, but that's a philosophical argument that I'm not interested in having and frankly, I think it'd be selling yourself short a little bit.


Thank you, I appreciate your opinion. I do think that we are reacting to external stimuli even though our ego is uncomfortable with being deterministic in any way, which is the free will point that you hit on. I think that is likely to be the point that keeps the argument going, as it’s not a settled debate even absent any AI, which we clearly all see as either deterministic or semi-random when the temperature gets turned up.

As far as the argument about being loaded in memory, if there’s any consciousness in AI, it’s obviously in a different form than a biological consciousness. We’d have to agree that consciousness does not require a body to get past this.


While I agree that an AI system is not just the LLM, for me, the problem is that LLMs alone (the ones from years ago, which were basically stateless LLMs) already look too convincingly like real human conversation at first sight.

It shows that the LLM part found ways to mimic human conversation with a mechanism that is not the same as a typical biological brain. Then, you can extend the AI system by adding things on top, but it is too late: these things on top will have no incentive to recreate the mechanism from scratch. The LLM pushed the system into a local minimum, and the rest of the system will not "go into a dis-optimising direction and restart from scratch".


I think, for me, the thing is that when you do basic ML, you discover that ML will very often find data patterns that fit the goal but do not correspond to a real mechanism.

So, I think there is a flaw in the logic of saying that human text has a pattern of "consciousness mechanism" and therefore LLMs will learn the "consciousness mechanism" in order to return sentence continuations that are convincing. There are probably tons of data patterns that an LLM can learn from to be able to produce a convincing sentence continuation without having to learn the specific mechanism that is "conscious".

For me, one element that shows this is the case is the absence of a world model (or a "human-like" world model) despite the fact that the sentence continuation is convincing. If the only way to produce convincing sentence continuation were by "simulating a brain", it would not explain the first LLMs from several years ago (before the extra layers of RLHF, ...). They were able to have quite convincing conversations on a lot of non-trivial topics, and yet failed on some aspects that should have been basic for a system trained to work like a human brain. It shows that it is possible to "cleverly disguise examples of sentence continuation" without having to build the elements that one expects in a conscious being.


I didn't make the claim that a model can learn consciousness.

Understanding is not consciousness.

Their training is all about understanding. There is nothing in their architecture or training that credibly optimizes for rich self-awareness.

Given non-persistent experience, non-continuous operation, no ability to build up generalizations and aggregate experience of their own self-awareness over time, they seem to be structurally designed to not have consciousness.

This is a case where acting is very credible. Understanding of others' consciousness, in a functional and third party sense, isn't a substrate for personal experience.

In stark contrast, humans develop consciousness gradually over continuous time with persistent aggregation of experience. By the time we can recognize our own consciousness in the abstract, and reason about it, we have had it for some time.


I use "consciousness" because it's the point of the original argument, but in fact, I think my whole comment still works well if you replace "consciousness" with "understanding".

My point is that the fact that AI can convincingly reproduce human sentence continuation does not imply that the AI has no choice but to end up using a mechanism that "understands", rather than just having learned data patterns that are very effective at faking human sentence continuation but are meaningless in terms of understanding the concepts.

And I think that if the only way for AI to convincingly reproduce human sentence continuation were to end up in a configuration that uses the "understand" mechanism, the first LLMs would not have been so good at sounding human and yet so bad at passing basic "understanding" tests.


> the fact that AI can convincingly reproduce human sentence continuation does not imply that the AI has no choice but to end up using a mechanism that "understands", rather than just having learned data patterns

Taken as an absolute, without any additional context, you are right.

But we are not talking about abstractions but specific successful models. The number of parameters these models have may seem large, but it is very small relative to the training data that they have to summarize. They cannot do it without discovering the patterns that make sense out of it.

And we can verify that. Simply discuss completely disparate topics, with some kind of intersection. Converge several highly unlikely topics, there are so many it would take billions of years to exhaust unlikely combinations.

If the model is only interpolating it will produce gibberish.

But that isn't what happens.

The fact that models can be near expert, and sometimes expert, across vast areas of human knowledge is a clue. If they don't understand that, then the question is, why do we think people understand things. Does having an answer mean a human understands something, or is their intuition and stream of conscious reasoning also not understanding? To be even handed about what we mean by understanding.


> They cannot do it without discovering the patterns that make sense out of it.

I don't think that's true at all, and I think we have indications that it is false.

We have "basic" LLMs, the ones from 2023. They were producing _very convincing_ human text, and yet they too often failed basic tests that require understanding.

Now, we have more advanced models, but the counter-example of "basic" LLMs demonstrates that your assertion is incorrect: these models _did_ produce very convincing human text and yet did not make sense out of it.

But for the more advanced models, the problem is that they are built "on top" of a basic LLM. So, the first step is a training that builds a mechanism that produces convincing text without understanding, and then the "residuals" are fine-tuned. The result is very unlikely to add "understanding" to the model, because to do so, the whole system would need to deconstruct the basic LLM, going back towards less efficient configurations in order to rebuild almost from scratch. The fact that modern LLMs are based on basic LLMs means that the first step put the cursor at the bottom of the "basic LLM mechanism" valley, which is a local minimum. And any layer on top of it cannot "climb up" the slope of the valley, pass the ridge and fall into the next valley, even if this next valley has a lower minimum.
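
The valley picture can be sketched with a one-dimensional toy. This is only an illustration of how plain gradient descent behaves on a loss with two valleys; the loss function, the starting points and the names `pretrained` and `finetuned` are assumptions chosen for the example, and it does not show that real LLM training behaves this way.

```python
def loss(x):
    """Toy 1-D loss with two valleys; the left one is deeper."""
    return x**4 - 2 * x**2 + 0.5 * x


def grad(x):
    """Derivative of the toy loss."""
    return 4 * x**3 - 4 * x + 0.5


def descend(x, steps, lr=0.01):
    """Plain gradient descent: it only ever moves downhill from where it is."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


pretrained = descend(2.0, steps=500)         # first stage settles in the right valley
finetuned = descend(pretrained, steps=5000)  # more downhill steps from that point
other_start = descend(-2.0, steps=500)       # a different start reaches the deeper valley

print(pretrained, finetuned, other_start)
print(loss(finetuned), loss(other_start))
```

In this toy, the second stage never leaves the valley chosen by the first stage, even though a lower minimum exists on the other side of the ridge.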

> The number of parameters these models have may seem large, but it is very small relative to the training data that they have to summarize.

That is demonstrably an incorrect logic jump. For example, CNNs are able to distinguish between pictures of cats and pictures of dogs. The weights in these models are very small in number relative to the number of pixels they have been trained on. Yet they distinguish cats and dogs by finding specific shapes in the pictures, without understanding what a 3-D cat and a 3-D dog are.

They have done that without discovering the typical human patterns that make sense of "cat" and "dog". And yet, the number of weights is very, very small with respect to the number of pixels used in training.

> And we can verify that. Simply discuss completely disparate topics, ...

> If the model is only interpolating it will produce gibberish.

What you are saying is that the model is not simplistic interpolation. But that is a straw man argument: people who say that LLMs don't understand don't say that LLMs are equivalent to simple interpolation machines.

But the problem is that you can have very good predictions in novel situations without understanding.

For example, say you have seen 10 totally different situations that can be described with a Gaussian curve, and I show you points for a new situation that cover the left side of a Gaussian curve. Then you will be able to guess that the right side of the curve, which is not an interpolation as it corresponds to values you never saw, will behave like the rest of the Gaussian curve. And yet, in these 11 situations, I did not even say which real physical phenomenon I'm talking about. You haven't understood anything about these phenomena; all you have done is guess that a typical pattern that you have observed somewhere else is more likely to apply here too, without having to understand anything about the reality of this situation.

And of course, this prediction is "a guess": maybe, for once, in this 11th situation, the curve will start as a Gaussian curve but will suddenly be different. But it happens that the reality is that in this 11th situation, the correct description is a Gaussian curve (because, due to the maths, Gaussian curves are really common). So, when you make your prediction, it looks like you understand the situation, it looks like you understood the physical mechanism that applied here. But it is not the case.

So, no, correctly making such a prediction does not demonstrate understanding.
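
The Gaussian example can be sketched in code. The fitter below only assumes "this looks like a Gaussian" (a parabola in log space) and sees points from the left side of the peak; `hidden_phenomenon` is a made-up stand-in for the unnamed 11th situation, and its parameters are arbitrary. The data is noise-free, which is a simplification.

```python
import math


def fit_log_parabola(xs, ys):
    """Least-squares fit of ln(y) = a*x^2 + b*x + c, the log of a Gaussian."""
    rows = [[x * x, x, 1.0] for x in xs]
    logs = [math.log(y) for y in ys]
    # Normal equations, stored as a 3x4 augmented matrix.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * v for r, v in zip(rows, logs))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back-substitution.
    coef = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    return coef


def hidden_phenomenon(x):
    """Stand-in for the unnamed situation; the fitter never sees this formula."""
    return 3.0 * math.exp(-((x - 2.0) ** 2) / (2 * 0.8**2))


# Observations cover only x = 0.0 .. 1.9, all to the left of the peak at 2.0.
left_xs = [0.1 * i for i in range(20)]
left_ys = [hidden_phenomenon(x) for x in left_xs]
a, b, c = fit_log_parabola(left_xs, left_ys)


def predict(x):
    return math.exp(a * x * x + b * x + c)


# Extrapolate to the right side, which was never observed.
right_xs = [2.5, 3.0, 3.5]
max_rel_error = max(
    abs(predict(x) - hidden_phenomenon(x)) / hidden_phenomenon(x) for x in right_xs
)
print(max_rel_error)
```

The prediction on the unseen right side is accurate, yet nothing in the fitting code refers to what the curve physically represents.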

> The fact that models can be near expert, and sometimes expert, across vast areas of human knowledge is a clue.

That is not at all sufficient. A Chinese room experiment will do that despite the system not understanding Chinese. A pocket calculator will be able to be expert in math computation.

> If they don't understand that, then the question is, why do we think people understand things.

That's the wrong question. The correct question is: we know people understand things, and we see AI behaving similarly to people in some aspects, but does this behaviour _require_ understanding, or can we reproduce this behaviour without needing to understand?

The fact that "basic" LLMs were able to produce very convincing text that looked like they understood X, and yet demonstrably showed a lack of understanding of X, demonstrates that we cannot jump to the conclusion that, just because it looks the same, the only possibility is that the core mechanism is identical.


I think most debates about LLMs understanding boil down to different definitions of the word "understand." For example, with the definition of "understand" that I typically use in my daily life, I would argue that in the Chinese room, the system as a whole "understands" Chinese.

Fair enough, but then, a pocket calculator also understands math, and a pocket translator also understands language. And a Wikipedia page that informs you about radioactivity understands nuclear physics. Some will maybe say it is the case, but if we talk about the LLM capabilities as a novelty, then it implies that we are talking about something else, because otherwise, it is not novel at all and it does not make sense to pretend it is.

I'd say that broadly speaking, a system understands things when it can interact with them "correctly". I agree a pocket calculator understands math, but I'd say a pocket translator understands grammar, not language as a whole. A wikipedia page does not interact with anything, so I'm not interested in pushing the definition that far. However, if the wikipedia page were to make recommendations for nuclear safety based on some context it receives as input (say via an integrated LLM), I'd be happy to argue that it understands [that part of] nuclear physics.

I don't think that LLMs as black boxes are fundamentally novel, I just think that their internal design is novel, and their generality and ability to give correct responses to complex topics is far beyond anything previously. For example I would argue that wolfram alpha has a poor understanding of language and a very good understanding of math. I would argue that LLMs have an excellent understanding of language and a mediocre understanding of math, but are able to temporarily increase their understanding of math through document retrieval and "thinking" (or whatever you want to call the process of iteratively generating tokens that build on each other to result in a final response).


Well, then you basically agree with Chiang's article. It's just that Chiang has a more clever usage of the word "understanding" than you (more clever because more nuanced: 1) I doubt that "people on the street" will agree that obviously "brainless" objects, like a pocket calculator or an interactive Wikipedia page, understand anything, 2) Chiang is not stumbling on words: he explained his case in a way that makes clear what he means, and it is up to the interlocutor to adopt his vocabulary (because it is very legitimate here) rather than start saying "hm, no, I disagree, because for me, 'conscious' means 'print something on the screen', so LLMs are conscious". That is just missing the point).

Does a pocket calculator not understand arithmetic? What part of a fourth grader's understanding is missing?

Not sure what you mean.

I'm happy both ways:

Either you say that a pocket calculator understands arithmetic, and that LLMs understand language, which is something trivial. If a pocket calculator understands arithmetic, then previous substitutes for calculators, such as an abacus, do too. In this case, a word dictionary also understands language. And that is basically what Chiang's article says: LLMs don't understand language more than a word dictionary does. If you disagree with Chiang, it looks like you do so only because you don't understand what he is saying, or somehow are not mature enough to realise that Chiang may use a different definition of "understanding" than yours in a fully legitimate way, as everyone always does when talking about plenty of subjects.

Or you pretend that a pocket calculator's understanding of arithmetic is somehow different from that of an abacus or other obviously inanimate objects that are obviously not thinking.


Thank you for writing this out.

It turns out that the optimal way to highly compress complex information is to understand it.

Sometimes, a problem being hard means you only get bad solutions, or increasingly accurate ones.

The planet isn't big enough for the proverbial interpolative stochastic parrot, over the training set of global human communication.


Two problems with that.

Firstly, how do you know that the optimal way to highly compress complex information is to understand it? You think it is obvious because you are very familiar with "understanding" as a way to summarise complex information. But there can be billions of different ways, outside of human imagination, that are as good or even better.

But secondly, LLMs don't find the optimal way, they find a local minimum. Everyone who has worked with NNs knows that they are prone to come up with spurious patterns, incorrect correlations and bad workarounds to guess the correct answer. You regularly need to nudge the NN by creating specifically engineered features to prevent it from falling into the first local minimum.

When it comes to LLMs, it is extremely complicated to check whether the LLM has triggered on a misleading pattern that, by chance, links two "tokens" together, or on a real concept that indeed links two "tokens" together. Basic probability implies that there are probably tons of "fake patterns" engraved into the weights during LLM training, "fake patterns" that should not exist if there were any kind of "understanding" of the abstract mechanism that links these tokens.


> Firstly, how do you know that the optimal way to highly compress complex information is to understand it?

What is your non-performance baseline for "Understanding"? We don't have such a measure for humans.

Understanding is the behavioral ability demonstrated by learning to model something complex well. Beyond mappings, associations, interpolations.

Models clearly do. Mix up the most unlikely combination of non-trivial subjects, and they respond sensibly. Those responses are not averaged, interpolated by any order, or even combinatorial interactions.

There is a reason those kinds of encodings, mappings, associations, interpolations, statistics / stochastics, all failed miserably for decades. Still fail. It took topological transforms, reminiscent of how we compute (dendrite-soma-axon, tensor-sum-nonlinear), and then they leapt several orders of magnitude ahead of any alternative.

The problem with models composed of relationships of lower order than the phenomena they are trying to model, is they require combinatorially more parameters to model anything complex.

For simple problems, poor models fail gracefully. For complex problems, poor models just fail.


> Models clearly do. Mix up the most unlikely combination of non-trivial subjects, and they respond sensibly. Those responses are not averaged, interpolated by any order, or even combinatorial interactions.

How do you even know it is the case?

How do you know the output is not the result of combinatorial interactions?

How do you even know that the "sensible" response to an unlikely combination is not the result of a simple recipe that "makes the response sound sensible"? Either you, yourself, have some expertise on the subject, and therefore the combination probably does exist in the AI training data, or you don't, and you have no idea whether the response is sensible or is the usual smooth talk that anyone could come up with after spending 2 or 3 hours googling the subject and crafting something that sounds sensible.

Worse, you are saying that the model "understands", which means that it discovers the underlying mechanism that drives the output. This "understanding" is a set of equations that link different concepts, that explain how one concept affects another. So, it is "combinatorial interaction". Not a simple linear one, but guess what, LLMs are designed to introduce non-linearity.

Even when AIs are able to find new solutions to math problems, the result is obtained, as when humans do it, by using existing basic tools to build more complicated ones.

> It took topological transforms, reminiscent of how we compute (dendrite-soma-axon, tensor-sum-nonlinear), and then they leapt several orders of magnitude ahead of any alternative.

And yet, the LLM elements that are "similar" or "analogous" to how the human brain works are very few. The human brain has thoughts "flowing", while LLMs can only work "by step". The human brain is able to learn from a very small dataset, while LLMs need more data than a human will ever be able to analyse, let alone store. The human brain has "memory" and "context" intrinsically intertwined with how it works, while you can decouple these from the LLM. ...

Finally, there is a real contradiction in saying, on one hand, that AI mimics the human brain and that this is why it works well, and, on the other hand, that AI will find the lowest minimum and that this minimum is "understanding how the phenomenon works" rather than "repeating by heart what it was told during training".

As a human, when you mentally compute 6 times 7, what do you do? Do you do: "6 follows 5, which follows 4, which follows 3, ... and 7 follows 6, which follows 5, ... so we have (1 + 1 + 1 + 1 + 1 + 1) times (1 + 1 + 1 + 1 + 1 + 1 + 1), which is 1 + 1 + 1 + 1 + ..."? I guess you probably don't, you just recall the most helpful element you know by heart. For example, you remember by heart that 6x7 is 42. Or you remember that 3x7 is 21, and therefore 6x7 is the double, 42. Or you remember that 7x7 is 49, and therefore 6x7 is 49 minus 7, 42. Or you even have a "feeling" from a mixture of all these (6x7 is somewhere around 40 because 5x7 feels like being around 30 and 7x7 feels like being around 50, and if I think of numbers in the 40s that "feel" like they are from the 7 times table, I remember 42).

Same thing when a human does 324x42: the majority of humans will decompose it into "simpler" multiplications that they remember by heart and only then combine them. It is a good example of how the brain optimises by balancing the trade-off between "using memory" and "using understanding": basic operations use memory, but of course it is inefficient to use memory for all numbers, in which case the brain uses a combination of both.
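
As a rough sketch of that mix of memory and procedure (a toy model of long multiplication, not a claim about what neurons actually do): the hypothetical `TIMES_TABLE` stands in for the single-digit products learnt by heart, and `multiply_like_a_human` only looks products up, shifts them by place value and adds them.

```python
# "Learnt by heart": every single-digit product, filled in once.
TIMES_TABLE = {(a, b): a * b for a in range(10) for b in range(10)}


def multiply_like_a_human(x, y):
    """Multiply two non-negative integers using table lookups,
    place-value shifts and addition only."""
    total = 0
    for i, dx in enumerate(reversed(str(x))):      # digits of x, units first
        for j, dy in enumerate(reversed(str(y))):  # digits of y, units first
            remembered = TIMES_TABLE[(int(dx), int(dy))]  # memory lookup
            total += remembered * 10 ** (i + j)           # place-value rule
    return total


print(multiply_like_a_human(6, 7))     # a single lookup
print(multiply_like_a_human(324, 42))  # six lookups, then combined
```

The table has only 100 entries, yet combined with a short rule it covers every pair of integers, which is the trade-off described above.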

The way humans do basic math operations is not purely by "understanding" arithmetic, it is by relying on what they remember from their training. At the same time, humans know how arithmetic works, and they will use it when relevant. Yet, the human brain prefers to rely on some "learnt by heart" elements. This contradicts your assertion that optimisation will always lead to "understanding" and that the human brain optimises the same way AIs do.

This is only one example with numbers, but of course it works with plenty of other things. This is also exactly why humans get "the wrong idea" about plenty of phenomena, which are then described as "counter-intuitive".

The reason "by heart" is part of a good strategy, rather than "purely understanding", is that there is a trade-off between "memory" and "compute", in both the human brain and AI: it is easier (and therefore a stronger attractor during the optimisation of the process of "getting the correct answer") to do the faster operation "retrieve from memory" than to do the slower operation "retrieve the theory from memory, compute the first step, store it in the short term memory, compute the second step, store it in the short term memory, compute the final answer by adding the first step answer and the second step answer".


> How do you know the output is not the result of combinatorial interactions?

(A bit of an essay, but it is a good question!)

REASON 1, How simpler representations fail:

Lesser understandings reveal themselves to novel combinations of prompts.

Mapping fails immediately because it fails on even trivial differences.

Interpolation fails immediately, because the function isn't smooth and the information it needs to model, human language and thought, combines non-linearly, non-locally and with higher-order relationships.

Combinatorial fails as soon as you create a prompt that involves novel non-linear or higher-order interactions. I.e. new combinations.

REASON 2, Parameter requirements of simpler representations:

For human-resembling sensible chats, mapping requires an example of every case. It would require combining the entire training set, with an optimized index. Essentially a search on the whole body with tricks to return anything sensible for even a slight mismatch.

Interpolation, ..., I don't even know how that could work. Again the whole corpus of training data, with some kind of gradient composition overlaid across it. It is an interesting research idea, but the possible mixing of tokens makes this unreasonable for anything but toy problems.

Combinatorial encodings, would have to have parameters operating across all the possible ways to combine relationships. There can be some relationship compression, to a base set of represented concepts, and then a combinatorial explosion of parameters for how to combine them.

I include statistical / stochastic transforms here as continuous combinatorial transforms.

Those could do the job, but more parameters than atoms in the universe might be required, for all possible topic/detail compositions.

REASON 3, Training corpus requirements to learn successful lesser representations.

Obviously the training data, even of all human communication, provides only a fraction of possible exact things that could be said. Not enough data for mapping even if infinite resources for creating a map were available.

Interpolation also suffers, because whatever correlations and smooth compressions of the training data can be made, it is still data that barely touches the kinds of sensible compositions that are possible.

And the same for combinatorial. There just isn't a fraction of an infinitesimal number of examples of combined topics and details, compared to what can be sensibly combined in any new conversation. You can't extract combinatorial compressions that don't exist.

REASON 4, Hiding one representation in another doesn't create opportunities that didn't exist before.

These methods all fail when used directly. The problems are not the kind that pushing the same transforms into a deep learning model solves.

The requirements for astronomically more parameters and training data are not met by embedding those kinds of representations into another model.

SOTA models are not operating with cosmological numbers of parameters, or training data that combinatorially represents concept interactions.

Being a deep learning model doesn't somehow lessen the requirements, needed to successfully perform, if it is learning via those lesser representations.

REASON 5, Test a model:

So let's test whether the model is doing more. If it fails for novel combinations of complex topics, then it might only be doing simpler things.

If it is robust to novel situations, then it cannot be operating by doing simpler things that don't scale.

Ask a model to: Write up a Supreme Court pleading for the rights of whales based on all that is known about them scientifically, recent whale language developments, and any applicable human rights law, given the relevant Supreme Court is in a parallel universe in Zion of the Matrix, being pleaded by Keanu Reeves, the actor not the character, and written in Dr. Seuss prose, except with sentences as long as needed to carry the real technicalities of a suitable filing. And include the assumptions of a back history of whales which have sequestered themselves into a deep hidden underground ocean, where they have been safe until recent excursions by humans which have harmed them. Be specific creating a real history behind those events, with details that are highly relevant to the motivation, reasoning and requests of the pleading. Avoid words with q where possible.

That isn't mapping. Interpolating. Combinatorial composition. SOTA models will generate a reasonable, even creative response to a completely novel combination of subjects and requirements, with non-linear interactions.

A human would have a hard time doing that, and the model does it nearly instantly with a fraction of the parameters we have.

If that isn't "understanding" in some credible sense, I have no idea what understanding looks like. The model is going way beyond its training data, to the relationships in the data that are relevant to combining novel things. To the point it can apply those relationships in combinations it has never encountered. And it makes a trivial task out of it.


> REASON 1

This just means "simpler representations are not enough", not "good representations cannot be complex combinatorial combinations" (complex enough that it is very difficult for a human to see them).

> REASON 2

Are you saying that I believe that the only way to get human-like text is by doing a near-infinite one-to-one mapping? This is obviously not the case.

You can do, for example, a GAM time-series forecast. This can have a relatively low number of weights and still return very sensible predictions, and yet not capture any real understanding of the phenomenon it predicts. For example, it does not understand causality, just correlation.

> REASON 3

That is like saying "I've built an algorithm that is able to do 10 + 27, but there is an infinite list of numbers, so it is impossible for this algorithm to do 23113454453 + 1233253245". That is not true: you just decompose into (53+45), (44+32), ... and add rules to combine these elements together.
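
A minimal sketch of that decomposition, assuming two-digit chunks and a simple carry rule. The function name `add_in_chunks` and its `chunk_digits` parameter are made up for illustration; the point is only that a procedure limited to small sums extends to arbitrarily long numbers.

```python
def add_in_chunks(a, b, chunk_digits=2):
    """Add two non-negative integers using only small chunk-sized
    additions (e.g. 53 + 45) plus a carry rule to combine them."""
    base = 10 ** chunk_digits
    result, carry, place = 0, 0, 1
    while a > 0 or b > 0 or carry:
        partial = a % base + b % base + carry   # small sum, e.g. 53 + 45
        result += (partial % base) * place      # keep the low digits
        carry = partial // base                 # pass the overflow along
        a //= base                              # move to the next chunk
        b //= base
        place *= base
    return result


print(add_in_chunks(10, 27))
print(add_in_chunks(23113454453, 1233253245))
```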

It is what is happening with AI: there is enough data to get "some pattern" in the language. Just the patterns, not the understanding of the language itself. And this pattern can be reproduced in plenty of different places.

> REASON 4

This argument is contradicted by "basic LLMs" or even simpler models that perform surprisingly well. Less well than SOTA, but if your argument were true, a CNN or ARIMAX could never do better than a coin toss.

> REASON 5

Your example is a good case where the AI will _combine_ patterns learnt from different places. It will pick characteristics of each of your scenarios, and mix them together. The result will look realistic, but it is still applying learnt patterns together.

Also, you did not respond to my human arithmetic example, and all your reasons are contradicted by it. Humans DO maths partly because they "learnt by heart" some patterns rather than applying their understanding of fundamental arithmetic. If "answering very well based on patterns" was not a good strategy, or required infinite weights, or made it impossible to use these patterns in novel situations, how do you explain that humans can do exactly that themselves? As soon as we admit that humans use "some patterns some of the time", then we have to admit that there is a continuous spectrum, and that output that looks realistic can be the result of patterns rather than understanding.

By the way, I just saw a new article reaching HN: https://news.ycombinator.com/item?id=48410427 , and it indeed explains similar things, and illustrates that the best way for SOTA to deal with arithmetic is by "not understanding it". And yet, when you use one of those SOTA models, you would be able to invoke each one of your "REASON" items to claim that the model did understand arithmetic.


I am not sure what you mean by complex combinatorial. If we are talking about combinatorial, it's combinatorial. N can be very large, but it is going to scale like combinatorial, not something else.

I just started out with mapping to be systematic. Mapping is ground zero, then interpolation i.e. any smooth fitting function or basis, then combinatorial where different bases are recognized and then project relative to their relevance to a new input.

Each of those increase modeling efficiency and power, but even combinatorial doesn't scale to problems like language.

I may be doing a poor job communicating. A formal breakdown of the scaling issues with lower order, but scaled to make up for it, modeling would be a great paper.

To prove me wrong (as a thought experiment), choose a lower order model, any kind you can imagine that would qualify as modeling without understanding. Demonstrate it can do anything close. That it could possibly scale to the human corpus with just a trillion parameters.

If the number of parameters goes up far too fast, then that can't be the way deep learning solves the problem with a trillion, or a few billion, either.

And consider the other side. We have no idea how our own brains are lifting up what is relevant vs. what is not. We are used to it happening. We call it "understanding". But we don't know how it works, how we work. Despite experiencing it.

What we do know, because combinatorial is too resource intensive, is we are not just combinatorial either.


> I am not sure what you mean by complex combinatorial. If we are talking about combinatorial, it's combinatorial. N can be very large, but it is going to scale like combinatorial, not something else.

The way an LLM works is by creating a space of N dimensions, N being the number of tokens. This space contains all the possible combinations. The LLM will find the best combination, but will not scan the whole space. To find the best combination, it will minimize the loss function, which is low when the output corresponds to the target. By doing so, it will not explore the combinations that "go in the wrong direction", and therefore it is not true that growing the space by a factor S makes running the model harder by a factor S.

Because of that, while the combination space scales like combinatorial, the model does not. A model with 2 weights (or rather tokens, but the number of weights should be at least the number of tokens) corresponds to 4 combinations (AA, AB, BA, BB can indeed be described by 2 binary weights of value "A" or "B"). A model with 3 weights corresponds to 27 combinations. A model with 4 weights corresponds to 256 combinations. ... A model with N weights corresponds to N to the power N combinations. The number of combinations increases a lot, and yet the number of weights increases linearly.

In SOTA, we have billions of weights. That is a model that contains a very very very very big number of combinations, something so big that it is difficult for a human to grasp. It will not try all of these combinations one by one; the gradient descent method helps it find the best combination without having to do so.

So, yes, SOTA are finding "the best combination" amongst an impressively huge number of combinations, yet without having to "scale like combinatorial".
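
A toy sketch of that difference between the size of the search space and the cost of the search. Everything here is an assumption made for illustration: a convex quadratic loss over 10 weights, a hypothetical `target` vector, and a grid of 100 candidate values per weight as the brute-force alternative. A real LLM loss is not convex, so this only shows the scaling argument, not how training behaves in practice.

```python
def loss(w, target):
    # Squared distance between the weights and the (hypothetical) target.
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def gradient_descent(target, lr=0.1, steps=100):
    # Follow the gradient from the origin; count how many points we visit.
    w = [0.0] * len(target)
    evaluations = 0
    for _ in range(steps):
        grad = [2 * (wi - ti) for wi, ti in zip(w, target)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        evaluations += 1
    return w, evaluations


target = [0.3, -1.2, 0.7, 2.0, -0.5, 1.1, 0.0, -2.0, 0.9, 1.5]

# Brute force: 100 candidate values per weight, 10 weights.
grid_size = 100 ** len(target)

w, evaluations = gradient_descent(target)
print(f"grid points to scan exhaustively: {grid_size:.1e}")
print(f"points visited by gradient descent: {evaluations}")
print(f"final loss: {loss(w, target):.2e}")
```

Adding one more weight multiplies the grid by 100, while the descent loop only gets one more coordinate to update per step.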

> To prove me wrong (as a thought experiment), choose a lower order model, any kind you can imagine that would qualify as modeling without understanding. Demonstrate it can do anything close. That it could possibly scale to the human corpus with just a trillion parameters.

Yes. Easy. A SOTA LLM does that. It is modeling without understanding. It does not understand, it finds the best patterns. And when you put it in a new situation, it uses these patterns to create a new text, without truly understanding the content of the text. And if you ask an additional question, it will use the previous text as context, and create a new text that, as it has been trained to, will be consistent with the output that has already been given.

Your assertion "you can prove me wrong" is circular reasoning: you start by saying "if a model can produce a text that looks realistic to me, then it means it has understanding. To prove me wrong, give me a text that looks realistic to me and has no understanding". Well, I cannot do that, because for you, if it looks realistic, it has to have understanding.

> If the number of parameters goes up far too fast, then that can't be the way deep learning solves the problem with a trillion, or a few billion, either.

The combination space grows as N to the power N. So, a trillion parameters is not "just 1000 times bigger" than a billion parameters, but more than 1000 to the power of one billion bigger (the exact value is often even bigger than that). Do you realise the size of the combination space? That is 1 followed by 3 times one billion zeroes.
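
A quick check of those orders of magnitude, taking the "N to the power N" count at face value. The numbers are far too large to build directly, so this sketch works with their base-10 logarithms, using log10(N^N) = N * log10(N); the helper name `log10_combinations` is made up for illustration.

```python
import math


def log10_combinations(n):
    """log10 of n**n, computed without building the huge number itself."""
    return n * math.log10(n)


billion = 10 ** 9
trillion = 10 ** 12

digits_billion = log10_combinations(billion)    # about 9e9
digits_trillion = log10_combinations(trillion)  # about 1.2e13

# Ratio between the two combination spaces, as a power of ten.
ratio_log10 = digits_trillion - digits_billion

# The "1000 to the power of one billion" figure, as a power of ten.
lower_bound_log10 = billion * math.log10(1000)  # about 3e9

print(f"billion-weight space:  about 10^{digits_billion:.3g}")
print(f"trillion-weight space: about 10^{digits_trillion:.3g}")
print(f"ratio:                 about 10^{ratio_log10:.3g}")
print(f"1000^(10^9):           about 10^{lower_bound_log10:.3g}")
```

Under this counting the ratio comes out far above 1000 to the power of one billion, consistent with the "even bigger than that" remark.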

> What we do know, because combinatorial is too resource intensive, is we are not just combinatorial either.

I think you don't understand how LLMs work: they find the best combinations in an incredibly huge parameter space, but don't need to explore the whole space, just the one-dimensional manifold that is the curve followed by gradient descent within this huge combination space.

There are plenty of clues that SOTA models don't "understand". For example, did you notice that SOTA happens to understand what humans understand, and not to understand what humans don't understand? If SOTA really worked by "discovering the true mechanism", it would discover with equal probability mechanisms that humans have already noticed and mechanisms that humans have not yet noticed. For example, humans know that the Standard Model of particle physics is incomplete, and there are plenty of texts and books about that which SOTA learnt from. Yet, SOTA did not "understand" the underlying mechanism that explains particle physics. It does not really know what an electron is by "making sense of what this object does", it only knows it as "a language word that can be used in some context in a specific way".

And, sure, SOTA is helping with new discoveries, but the way it does so is by using a "reasoning" approach. If SOTA really created its own understanding when learning human language, then it should already hold the new discovery after training, without using any "reasoning" approach, because it would be something that it has already understood.


> Well, I cannot do that, because for you, if it looks realistic, it has to have understanding.

Yes, if it consistently produces good output for highly varied stimuli that can be intentionally picked to be unlikely to have ever had an obvious representation in the training set, then yes it understands.

I think we are talking past each other a bit.

A series of increasingly challenging datasets, used to capture scaling efficiencies, would ground our discussion.

But the level of performance for models is simply too good vs. the number of parameters to be doing anything trivial.

Deep learning models do something combinatorial models do not. The linear tensor + non-linear transforms do two special things:

1. The tensor itself just projects a linear space into higher dimensions, but it's still the same information space. Project a 2D surface into higher dimensions linearly, and there can be more parameters, but it is not more information, since there is an expansion of linear dependence to match.

2a. But then the nonlinearity thresholds, squashes or otherwise alters the linear results, in a way that removes linear dependencies, increasing the useful dimensionality of the representation.

2b. And the squashing also allows dimensions to be folded down.

So by both expanding and flattening representational dimensions, deep learning models are able to model higher-order relationships directly, which any less expressive model would have to cobble together from many patches of fitting.
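
A small sketch of those two effects, under toy assumptions: hand-picked matrices `W1` and `W2`, a ReLU as the nonlinearity, and XOR as the stand-in for a relationship that no purely linear (or affine) map of its inputs can reproduce. The weights are chosen by hand for illustration, not learnt.

```python
def matmul(a, b):
    # Product of two matrices given as lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply(m, v):
    # Apply a matrix to a vector.
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def relu(x):
    return max(0.0, x)


# Point 1: expanding 2 -> 3 -> 2 dimensions linearly adds parameters
# but not expressiveness; the two layers collapse into one 2x2 matrix.
W1 = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]   # 2 -> 3
W2 = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]      # 3 -> 2
v = [0.5, -1.5]
two_layers = apply(W2, apply(W1, v))
one_layer = apply(matmul(W2, W1), v)
print(two_layers, one_layer)


# Point 2: with a nonlinearity between the layers, two hidden units
# are enough to represent XOR.
def xor_net(a, b):
    h1 = relu(a + b)         # counts how many inputs are on
    h2 = relu(a + b - 1.0)   # fires only when both inputs are on
    return h1 - 2.0 * h2


for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, xor_net(a, b))
```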

Another way to put this is that deep learning models are able to learn higher-order relationships directly, not by memorizing and interpolating across learned points or regions.

So a dramatically greater ability to "understand" is why deep learning models are so much better. They are not doing simple combinatorial fitting.

"Understanding" or not, combinatorial relationships are the low bar for deep learning models, they are inherently great at learning much higher-order relationships.

I am falling asleep at this point. I feel like we need a blackboard and a computer. You are saying a lot of things that make me think, and make sense to me.


Yes, this conversation is useless.

You keep saying "what I observe with GenAI can only be the result of 'understanding'" without providing any proof at all. Just a few beliefs.

You just say "look at this behavior, that's the proof". I truly don't think it is: nothing proves that this behavior requires 'understanding'. And nothing you provided helps: all you provided are impressive behaviors followed by the unsubstantiated conclusion "and this behavior can only be achieved with understanding".

At the same time, there are too many clues showing that such behavior does not require understanding, even if it _looks_ incredibly clever:

1. GenAI does not understand (after the training phase) things that humans don't understand. If GenAI had the capacity to build an understanding during training, then there is no reason this understanding would coincide with human understanding.

2. Optimisation does not always lead to "understanding". Human brains choose to optimise "learning multiplication table by heart" rather than building a pocket calculator inside the neurons.

3. Human brains, that have "understanding", are working fundamentally differently from GenAI (flow of thoughts, intrinsically intertwined memory and compute, optimised for world-model treatment rather than token treatment, ...). It is an unsubstantiated jump to simply conclude AI has "understanding", while it can be the result of fundamental differences.

4. "Basic" LLMs are surprisingly good at creating convincing sentences and yet there are situations where it is blatantly clear they did not understand anything. More advanced SOTA models are based on refinements of "basic LLMs", and therefore the "sentence construction that is done without understanding" is still used, and prevents the SOTA model from building a full understanding.

> Another way to put this is that deep learning models are able to learn higher-order relationships directly, not by memorizing and interpolating across learned points or regions.

It's exactly what I'm saying: deep learning models are very good at learning complex relationships. Such as "I don't know what 'Paris' is, I don't have any understanding of what a city is in reality, but when the token Paris is associated with these other tokens in this complex order, even if I never saw it before, I have learnt the complex relationships and therefore I'm able to build a series of tokens".

They are very good at learning complex relationships that allow them to choose the correct combination even if they did not "understand" the content of the correct combination.

I understand that it is impressive: those relationships are very complex and very numerous (there are billions of them). It is easier to do anthropomorphism and conclude that the AI has "understood".

But again, the main problem is that you just assert, without any proof, "no, I cannot believe that, I refuse to believe that".

(and, by the way, I personally think that AI (SOTA but also even "basic LLM") do have 'rules' that correspond to some kind of understanding of basic mechanism. I think they have basic "world models". But these world models are optimised "to write text" rather than to "understand the world", and therefore the large majority of AI output is just not-understood token chains)


Apologies. Your pushback (frustration and patience) has helped me crystalize my view, thank you.

1. Define understanding.

My definition isn't vague: "a compact representation enabled because that representation's topology closely matches the topology of the relationships being modeled."

Understanding = Scope and Suitability of Behavior / # Parameters.

Useful property: This definition applies across all scales: Scientists and mathematicians increase our understanding, every time patchworks of relationships get replaced with a simpler underlying insight.

Another useful property: It distinguishes between better understanding and having more facts. Facts improve performance but do not (non-trivially) decrease parameters.

What is your definition? In measurable terms?

2. You keep avoiding a basic aspect of modeling:

Higher compactness is achieved by higher representation correspondence between a model and the modeled.

Yes, lower level representations can work. Even well, without good "understanding". But not as compactly. And as problem complexity grows, the relative difference in parameter budgets between high-correspondence and low-correspondence representations explodes.

This is not a subtle effect.

The hallmark of lower-level fitting is the far greater number of parameters required.

Dead simple example: Piece-wise linear vs. polynomial fitting of Bezier curves. Accuracy / parameter is far greater for the latter, because the representation matches the relationships being modeled.

That is an intentionally trivial example, but the same relationship holds for any problem.

You keep avoiding that.
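
A minimal sketch of the Bezier example from point 2, under toy assumptions: a one-dimensional cubic Bezier with made-up control values, piecewise-linear interpolation on uniformly spaced knots, and the maximum error on a fine grid as the accuracy measure. It only illustrates the accuracy-per-parameter gap for this one curve, not the claim for arbitrary problems.

```python
def bezier(c, t):
    # Cubic Bezier (Bernstein form) with control values c[0..3].
    u = 1.0 - t
    return (u**3 * c[0] + 3 * u**2 * t * c[1]
            + 3 * u * t**2 * c[2] + t**3 * c[3])


def power_basis(c):
    # The same curve as a cubic polynomial a0 + a1*t + a2*t^2 + a3*t^3.
    return [c[0],
            3 * (c[1] - c[0]),
            3 * (c[0] - 2 * c[1] + c[2]),
            -c[0] + 3 * c[1] - 3 * c[2] + c[3]]


def poly(a, t):
    return a[0] + t * (a[1] + t * (a[2] + t * a[3]))


def piecewise_linear(knots_y, t):
    # Linear interpolation between uniformly spaced knots on [0, 1].
    n = len(knots_y) - 1
    i = min(int(t * n), n - 1)
    local = t * n - i
    return (1 - local) * knots_y[i] + local * knots_y[i + 1]


def max_error(approx, c, samples=1000):
    return max(abs(approx(i / samples) - bezier(c, i / samples))
               for i in range(samples + 1))


control = [0.0, 2.0, -1.0, 1.0]   # illustrative control values
coeffs = power_basis(control)
poly_error = max_error(lambda t: poly(coeffs, t), control)


def linear_error(num_knots):
    ys = [bezier(control, i / (num_knots - 1)) for i in range(num_knots)]
    return max_error(lambda t: piecewise_linear(ys, t), control)


print(f"polynomial, 4 parameters:        {poly_error:.1e}")
for k in (4, 16, 64):
    print(f"piecewise linear, {k:2d} parameters: {linear_error(k):.1e}")
```

With the same 4 parameters the polynomial matches the curve to rounding error, while the piecewise-linear fit needs many more knots to get close.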

3. Today's LLM models are very compact compared to humans.

Compressing the substance of a corpus of global human writing into less than 1% of a single human's parameter space is compact.

Humans have 100–200 trillion, some people think 500 trillion, synapses.

How do you argue that behavior scope and suitability / parameters is not remarkable, when it is remarkable compared to any specific human you could point to?

No human can converse reasonably across the scope of global communication. But these models can. For <1% of a human's parameter budget.

4. Finally, based on your clear definition, how do you argue that humans understand but models do not? Saying we are different is a copout. Defining understanding as us vs. other is both circular and unenlightening. And ignores the real progress models are clearly making relative to humans.


Is that more coherent?

1. Okay with your definition.

My point is that you can have the same result with a representation that does not "closely match the topology of the relationships being modelled". For example, a representation that "allows relationships between tokens but yet does not care about the meaning or concept not useful to form convincing sentences".

And therefore, it means that you can have convincing text without needing a "representation's topology closely matching the topology of the relationships being modelled", and therefore, according to your own definition: no understanding.

2. It is not true I'm avoiding that. I have answered very clearly.

1) GenAI are not trained to get the higher representation of the world, but to get the best convincing sentence generation. This does not require a full world understanding. Worse, once a convincing sentence generation is reached, there is no gain by getting a better world understanding: the training mechanism that pushes into the correct direction stops and therefore it can go into any direction at all.

2) High compactness does not equal best solution. Even humans don't use "high compactness" when doing basic arithmetic, but use a "by heart multiplication table". Being compact is useless if it comes with high complexity each time you need to recompute the output.

3) Very very good approximations can reach higher compactness anyway. Your Bezier curve is a good example: real physical phenomena are almost never the result of a Bezier equation. A Bezier curve does not understand the phenomenon. When it comes to GenAI, it can "fit" reality with very close precision with several representations, but the majority of those representations correspond to an incorrect "understanding" of reality.

Another example: if I throw a ball in the air, the motion will be, to first order, a quadratic equation, plus corrections due to friction, wind, ... If I just "train" something for "throw a ball", this system may fit a quadratic function plus corrections, but it will achieve the same result with Bezier curves, or Fourier series, or additive Gaussians, or ... But the "understanding" is that the ball is influenced by gravity, which leads to a quadratic equation. The system does not understand that. It has no reason to understand that. And it has no reason to prefer a quadratic equation fit over a Bezier fit; on the contrary, the Bezier fit will be more realistic (as the quadratic equation is just the first-order approximation).

If you want to understand a paper plane trajectory, it is a complex system, and you probably need plenty of parameters to describe the gravity, the wind at each position and each time, the shape of the plane at each time, ... But you can describe the trajectory with just a few parameters using a Bezier curve. Train on plenty of paper plane trajectories, and you will have a system that can give you a very realistic paper plane trajectory based on Bezier curves. And yet, your system has no understanding of the paper plane trajectory: it does not know what mechanisms make the paper plane go up or down. It just creates a realistic trajectory without knowing why this trajectory is realistic, just that this trajectory makes sense based on the other trajectories it has seen.
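
The thrown-ball case above can be sketched in a few lines, under simplifying assumptions: no drag or wind, made-up launch speeds, and a quadratic Bezier fitted from just three observed positions (start, mid-flight, end). The constant `G` is used only by `simulate` to generate the "observations"; the fitting and prediction code never sees it.

```python
G = 9.81  # used only to generate the "observations"


def simulate(t, vx=3.0, vy=8.0):
    # Ground truth: drag-free projectile motion.
    return (vx * t, vy * t - 0.5 * G * t * t)


def fit_quadratic_bezier(p_start, p_mid, p_end):
    """Control points of the quadratic Bezier passing through three
    samples taken at curve parameter s = 0, 0.5 and 1."""
    p1 = tuple(2 * m - 0.5 * (a + b)
               for a, m, b in zip(p_start, p_mid, p_end))
    return p_start, p1, p_end


def bezier(ctrl, s):
    p0, p1, p2 = ctrl
    return tuple((1 - s) ** 2 * a + 2 * (1 - s) * s * b + s ** 2 * c
                 for a, b, c in zip(p0, p1, p2))


T = 1.5  # observed flight duration, in seconds
ctrl = fit_quadratic_bezier(simulate(0.0), simulate(T / 2), simulate(T))

# Compare the fitted curve with the true trajectory at 101 instants.
worst = max(abs(u - v)
            for i in range(101)
            for u, v in zip(bezier(ctrl, i / 100), simulate(T * i / 100)))
print(f"largest deviation from the true trajectory: {worst:.1e}")
```

The fitted curve reproduces every intermediate position, because a quadratic Bezier is itself a parabola, yet nothing in `ctrl` names gravity or distinguishes it from any other cause of a parabolic path.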

3. This argument seems to go against your thesis. You are saying that humans, who "understand" and yet are not even able to hold as many conversations as an LLM, have way too many neurons. What are these neurons even for then? You are explaining that LLMs are just "something different", a reduced mini-version of a brain, and yet you are also saying that they are able to do the complex things the brain does.

Another way of seeing it is that LLMs are "dropping" things that they don't need to create convincing sentences, such as "understanding the token". They just "get the Bezier curve fit of the relationship" instead of understanding the real mechanisms and concepts.

It's like your Bezier curve example: a system that just creates a realistic paper plane trajectory based on "typical Bezier curve observed during training" will need way fewer "neurons" than a system that needs to understand the whole aerodynamics of the paper plane.

4. I argue this the same way I say that a system that describes a paper plane trajectory based on the best Bezier curves does not understand the mechanism behind how a paper plane trajectory works. I am not saying "I define 'understanding' as what humans do", I am saying that creating convincing sentences does not require understanding, the same way that generating realistic paper plane trajectories does not require understanding gravity, the Navier-Stokes equations and Brownian motion.

The Bezier curve paper plane trajectory predictor system I mentioned, do you think it has an understanding of gravity? Of Navier-Stokes? Of Brownian motion?

No, it does not. You can open this system. It just has Bezier curves for plenty of examples, and thanks to that, it knows that one trajectory is realistic and another is unrealistic. And at some point, it is also able to give realistic trajectories in brand new situations it has never trained on.


I am going to tune the expression form of my definition to:

Understanding = Novel Scope * Suitability / Parameter Count.

> My point is that you can have the same result with a representation that "closely matches the topology of the relationships being modelled". For example, a representation that "allows relationships between tokens but yet does not care about the meaning or concept not useful to form convincing sentences".

You are absolutely right, that lack of internal representation-reality correspondence does not rule out real/convincing performance.

> GenAI are not trained to get the higher representation of the world, but to get the best convincing sentence generation.

This is true of all learning. And it will always be the nature of learning.

Which is why performance is always (should be) measured on novel input.

> High compactness does not equal best solution. Even humans don't use "high compactness" when doing basic arithmetic, but use "by heart multiplication table".

This is a really good point!

It brings up the two useful modes of human representation:

(1) The brain's slow mode is very good at handling deeper and deeper layers of representation. When thinking about arithmetic or more complex math analytically, our understanding does follow a path of increasingly deeper representations. And we are very good at applying these deeper understandings.

(2) Then, our fast mode creates shallow representations of things we do frequently.

I would look at this as (1) reflecting scalable understanding (2) reflecting very limited understanding, but scalable speed.

And we often use both modes together.

I would argue that the understanding is primarily in the slow mode. That the fast mode, is the non-understanding but appropriate response mode. And that it operates with a much reduced scope of appropriate response, but a high percentage of applicability. Meaning, most of the time we don't need to use deep understanding we just need fast appropriate response.

But how to compare the two in scopes where they are equally accurate?

I think "high understanding" representations are those very flexible to being used in ways quite different from how they were learned.

Our slow mode does this very well. Our fast mode not so well, but to the degree it generalizes well to novel situations, that would be an increase in understanding.

Our fast system does generalize, but I would argue that at some point it fails, where our slower deeper representations provide the means of analyzing a situation. So it clearly "understands" better.

It is interesting how quickly understanding from our analytical side translates into operation on our fast side. Clearly, our fast side has very efficient access to new "patterns" that our slow side constructs.

> If you want to understand a paper plane trajectory, it is a complex system, and you probably need plenty of parameters to describe the gravity, the wind at each position and each time, the shape of the plane at each time, ... But you can describe the trajectory with just few parameters using a Bezier curve.

I love this example. It does contrast very different kinds of understanding.

(1) Understanding the fundamental reality in which paper planes exist,

(2) Vs. understanding how paper planes behave.

I think my expression works well here, as long as we take "scope" seriously.

Understanding = Novel Scope * Suitability / Parameter Count.

For paper planes as a hobby, a smaller neuron/parameter budget is achieved by learning the emergent laws of paper planes, not their underlying physics. And understanding paper planes is achieved with this smaller budget.

For understanding paper plane dynamics at a design level, a smaller neuron/parameter budget is achieved by learning the underlying physics of aerodynamics at an intuitive level.

For understanding paper plane dynamics at a world class competition level, a smaller neuron/parameter budget is achieved by learning the underlying physics of aerodynamics at an analytical level.

So these would be three different "understandings", each with their own scope and area of appropriate response to novel situations.

Point taken: The most fundamental correspondence isn't the point of a lot of understanding.

You are right, and my equation works, as long as "scope" is interpreted to mean appropriate level of interest, not area of fundamental physics involved. Great point.

Does that get us on the same page? Closer?

> I am saying that creating convincing sentences does not require understanding

As problem complexity goes up, there really is an explosive difference between appropriate response via "familiarity" or lower-level fit, vs. higher level fit, for the same number of parameters.

And it is also a dramatically bigger challenge for lower-level fits to respond well to novel stimuli, given the same number of parameters.

The reason is that complex problems operate in higher dimensional spaces, and relationships in higher dimensional spaces have exponentially more complexity for any level of representation. Exponentially.

Linear fits of a 2D bezier are inefficient but work. Linear fits for a 100 dimensional bezier, which isn't very many dimensions from a data standpoint, become ludicrously expensive in parameters.
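One possible reading of this claim, as a hedged sketch: if a "linear fit" means a piecewise-linear table with a fixed number of knots along each axis, the number of stored values grows as knots raised to the number of dimensions. The function name and the choice of 10 knots per axis are illustrative assumptions, not something stated in the comment.

```python
def grid_parameter_count(knots_per_axis: int, dims: int) -> int:
    """Values stored by a piecewise-linear lookup table on a regular grid.

    Assumes one stored value per grid node, with the same number of knots
    along every axis (a simplification chosen for illustration).
    """
    return knots_per_axis**dims


for dims in (2, 10, 100):
    count = grid_parameter_count(10, dims)
    print(f"{dims:>3} dimensions -> about 10^{len(str(count)) - 1} parameters")
```

With 10 knots per axis, 2 dimensions need 100 values while 100 dimensions need 10^100, which is the kind of blow-up the paragraph describes.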

The dimensionality of human communication is probably the most complex problem ever tackled systematically.

I am trying to think of a way to capture this more concretely. I.e. a way to draw a line in this conversation that stands up on its own. All I can point to, is the complete failure of any lower-level fit when done directly, to achieve a trillionth of a trillionth of trillions of the flexibility that SOTA models demonstrate. The extreme dimensionality of input that LLMs respond to, makes my "trillionths" literal in this case. And we do get a concrete measure of the dimensionality within their capacity, as context windows give us live demonstrations of this.

Note that language is literally highly compressed information, with pervasive non-local interactions. The enormous dimensionality is compounded by dense reactivity, pervasive discontinuities. No other informational artifact compares to language complexity.

When I say that this is a case where either real relationships are learned or the model fails, it is because the number of parameters for a lower-level fit really are beyond imagining.

You can't point to any lower-level fit, where the lower-level fit is basic to the fitting algorithm, that ever achieved even a tiny-grammar tiny-subject-scope toy of a toy version, to what LLMs are doing. Nor can I, despite following progress for decades. Nobody can. The original successes of the first LLMs, modest as they appear now, were completely unprecedented.

There just are not enough parameters, by many orders of magnitude, to do language justice over a context window, and respond sensibly to intentionally novel conversations, without identifying the actual relationships behind it.

So that would be my challenge to you. To identify any verifiable lower-level fit that even approximates LLM behavior at the tiniest of toy levels. Verifiable fits at any given level are easy to do, just train a model where the basis is restricted to that kind of fit.

Otherwise, I can agree that understanding is a continuous property, and that how well something understands something, without strict benchmarking by well thought out benchmarks, involves intuition and judgement. So there can be legitimate differences in how we perceive model understanding, in the absence of direct measures.

Any more thoughts? I have understood both myself and your points better as we went along.


Ok, I've stopped reading halfway because it is useless. Your reflection does not bring anything concrete, it is just you trying to play on semantics and lose the point.

For example, you just moved the definition problem to "novel". Do you even realise that?

You are claiming that there is an understanding because the model is able to do something in a novel situation where only deeper understanding of the situation will allow it to perform that well. The big problem is that you have no idea if this situation is "naturally easy to reach" or not.

For example, a system that is fitted on Coulomb's law of electrostatics will build, internally, a set of equations to generate realistic predictions. And then you take this system and put it in a totally novel situation: classical gravitational problems. Well, this system will be able to generate realistic predictions there too, because Newton's law works with equations that have the same form.
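A minimal sketch of that transfer, assuming NumPy. The exponent of the distance is learned from synthetic Coulomb data only, and the same functional form is then reused for a gravitational pair. The charges, masses and distances are made-up values, and swapping the constant by hand is an assumption of this sketch. Magnitudes only, so the attractive versus repulsive sign is ignored.

```python
import numpy as np

K_COULOMB = 8.9875517923e9  # N m^2 / C^2
G_NEWTON = 6.67430e-11      # N m^2 / kg^2

# Synthetic "training data": force magnitude between two 1 microcoulomb charges.
r = np.linspace(0.1, 5.0, 40)
q1 = q2 = 1e-6
f_coulomb = K_COULOMB * q1 * q2 / r**2

# Learn the power law from the data alone: log F = exponent * log r + intercept.
exponent, _intercept = np.polyfit(np.log(r), np.log(f_coulomb), 1)


def learned_form(constant: float, a: float, b: float, distance: float) -> float:
    """Reuse the fitted power law with a different coupling constant."""
    return constant * a * b * distance**exponent


# "Novel" domain: gravity between a 5 kg and a 10 kg mass, 2 m apart.
predicted = learned_form(G_NEWTON, 5.0, 10.0, 2.0)
expected = G_NEWTON * 5.0 * 10.0 / 2.0**2
print(f"fitted exponent: {exponent:.6f}")
print(f"predicted {predicted:.4e} N, Newton gives {expected:.4e} N")
```

The fit recovers an exponent close to -2 without any notion of charge or mass, which is why the "novel" gravitational case is easy for it.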

When you are discussing a "novel" subject with an LLM, how do you know the LLM cannot directly use the fit and complex equations that it has created for the "non-novel" situations it has been trained on? For example, the LLM has been trained on "Pride and Prejudice and Zombies" and tons of other mash-ups. Even if asking for a story about Keanu Reeves and the Supreme Court looks "novel" to you, it does not mean the generated text was not in fact super easy to generate based on the patterns that the LLM has seen in tons of examples.

Honestly, this whole conversation just convinced me that too many people who claim "GenAI does understand" are in way over their heads on the subject. If you want to continue talking, just talk to an LLM. Plenty of people have done so and convinced themselves they were geniuses when in fact they were not at all. Yet another example that an LLM has no understanding, as it very, very often fails to distinguish between correct ideas and "things that look correct but that someone with real understanding will not encourage".


Subtleties can be hard to convey in text.

"Novel scope" just meant the scope beyond training data, discoverable by experimenting, that a post-trained model was able to generalize well to.

It didn't mean arbitrary or alien to training data.

Thesis: The greater a model's scope of generalization, the greater evidence for "understanding" instead of fitting. I can't think of a better way to compare levels of understanding, for models of comparable size, than by how far each of them can generalize beyond training data.

I didn't always follow you either. But I didn't think you were being flippant or unreasonable when I didn't.

No worries. I appreciated being pushed to think more clearly, and you made points that improved my thinking directly.

EDIT ——— I think trading walls of text was a challenge. It seemed sensible to try and respond to "everything", but I can see that one specific at a time would have worked better. And I need to find someone in my vicinity to bash ideas with. I have settled in a new area, and miss that. So thanks.


I will not thank you, talking with you was a waste of time.

I've just read https://jamesfbaker.substack.com/p/why-the-ai-renaissance-ke... (and the Rich Sutton take on AI creativity, too) that explains exactly the opposite of you, but backed by real studies and real facts, instead of "that looks 'beyond training data' to me, so I will pretend it is, even if I have no objective ways to know if it is indeed the case".


I think Searle's Chinese Room argument refutes this. LLMs are simply manipulating symbols, they do not have semantic understanding. This is why hallucinations exist. And Searle's argument extends even further than LLMs.

You are basically arguing for a functional account of consciousness, but things like this have been debated for literally decades/centuries in philosophy.


Millennia, in fact. The big difference, of course, being that we now have experimental philosophy machines (aka computers). So we can actually put some of these theories to the test, and recognize how utterly inadequate most of the work done on the subject has been. We had a pretty good idea anyway, so it's not a big surprise. Theories of mind have evolved dramatically in the late 20th century. And it's pretty clear that theories of mind will have to be re-done all over again with the advent of LLMs (particularly current-generation LLMs).

The problem with the hallucination argument is (1) that it is much less of a problem with good current generation AIs, and (2) that living, conscious, breathing human beings also have a disturbing tendency to make shit up. So a tendency to make stuff up doesn't really serve as a disqualifier for consciousness.

Also worth mentioning that the guiding rule of what's philosophical or not is whether it's actually useful. Actually useful philosophy usually becomes something else. Usually some scientific discipline or another. And as it turns out, theories of mind are likely to become extremely useful in the near future. Expect huge advances!


I think one could argue the opposite.

1) Good current generation AIs are specifically trained to reduce hallucinations. If we had a new AI system that happened to not have hallucinations as a side effect of its training, then it would be convincing. But here, it looks like we have built a pocket calculator that answers 7+13 = 14, and on top of it, we added a layer that says "if the input is 7+13, then replace the output by 20". This pocket calculator still does not know how to calculate, we just added a layer to hide its mistakes.

2) Not only is "make shit up" not the same as "hallucination" (either "making shit up" is done when the individual knows it is unreliable, or when the individual was given wrong inputs), but the point is not to say "hallucination implies no consciousness", but "large quantities of hallucinations in situations where a conscious system would be unlikely to hallucinate implies no consciousness".


Yeah, don't use GPT for that. It really can't do basic arithmetic.

Try Claude, which can.


Wow, I don't think you understood at all.

First, the "13+7" is an analogy. In this analogy, "13+7" is not the real question you ask, it represents _any questions_, not just arithmetic.

But secondly, did you even notice that in my example, the system answers "13+7" CORRECTLY? So, in my example, the thing I'm talking about, and that I argue does not "understand", is Claude, even if it is able to answer correctly.

My point is: the "basic LLM" part is creating a mechanism that answers without understanding (as demonstrated for example by ChatGPT failing arithmetic), and the fine-tuning or the harness is just hiding the lack of understanding by adding ad-hoc corrections on the residuals. And because it is on the residuals, it loses the logical links (13+7 -> 20 is "logical", it corresponds to the math logic, it corresponds to what you get when you add 13 stones and 7 stones together. The residual is "14 -> 20", which has no meaning in itself).

The ad-hoc correction is either: 1. by training the model so it learns by heart, without understanding, that the symbols "13+7" should lead to "20", 2. or by training the model to use a pocket calculator without understanding arithmetic so it can do it itself.

You can prove that the model does not understand it very simply. Let's take the normal fine-tuned model M1. Now, let's go back to the pre-tuned version, and fine-tune it so it answers "21" to the question "13+7", and use a harness that does "sum(x, y): return x+y+1". This is model M2. M2 will fail to answer "13+7" correctly, it will say "21". And yet, M2 has been trained exactly the same way M1 was. If it were true that the additional tuning "adds understanding", M2 would not be possible: it would say "error, error, does not compute, you are trying to train me to say that 13+7 is 21, but it does not make logical sense to me". But that does not happen: the pre-tuned model has no idea that 13+7=20 is more logical than 13+7=21, and the additional tuning is just helping it return a more correct answer while still having no idea where this answer comes from.
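The thought experiment can be caricatured in a few lines. This is a toy sketch, not a claim about how real fine-tuning or tool use works: `make_model` is a hypothetical stand-in for "a model that forwards arithmetic to whatever harness it was given", and the two lambdas play the roles of the correct and the off-by-one `sum`.

```python
from typing import Callable


def make_model(tool: Callable[[int, int], int]) -> Callable[[str], str]:
    """Toy 'model' that parses 'a+b' and forwards it to the tool it was given.

    It has no way to check whether the tool's answer makes arithmetic sense.
    """

    def answer(question: str) -> str:
        left, right = question.split("+")
        return str(tool(int(left), int(right)))

    return answer


m1 = make_model(lambda x, y: x + y)      # harness with the correct sum
m2 = make_model(lambda x, y: x + y + 1)  # harness with sum(x, y) = x + y + 1

print(m1("13+7"))  # answer from the correct harness
print(m2("13+7"))  # answer from the off-by-one harness
```

Both toy models are built by exactly the same procedure, and neither one objects to its harness, which is the asymmetry the paragraph points at.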


>LLMs are simply manipulating symbols, they do not have semantic understanding.

falsify this. Show me a way you'd be able to prove they do/don't, that would work for humans.


Searle's Chinese Room argument is wrong.

This is not helpful. In what way is it wrong? Does the person in the room know Chinese?

It is a helpful pointer for people who might otherwise assume that a well-known argument by a famous philosopher is sound without checking too deeply. Straightforward refutations can be found on wikipedia or by thinking about it.

That just isn't true, there are no straightforward refutations of the Chinese Room that are widely accepted. Philosophers disagree about it. It's highly controversial and pretending that it's decided one way or another is not a helpful pointer for anyone.

>That just isn't true, there are no straightforward refutations of the Chinese Room that are widely accepted.

Yes there is, the systems reply is the obvious and correct answer. Philosophers that disagree are simply wrong. In the end what matters is what's true or false, not how many philosophers accept something. You can check for yourself by reading the argument, following its reasoning, and seeing that it is false; and reading the systems reply, following its reasoning, and seeing that it's true (https://plato.stanford.edu/entries/chinese-room/#SystRepl). The case is similar to those mathematical or logical proofs for the existence of god, where obviously fallacious reasoning gets a pass because it confirms deeply held beliefs.

edit: by the way as to your assertion that the argument is controversial and there is no consensus, I just found something funny on wikipedia (https://en.wikipedia.org/wiki/Chinese_room#History):

>Most of the discussion consists of attempts to refute it. "The overwhelming majority", notes Behavioral and Brain Sciences editor Stevan Harnad,[f] "still think that the Chinese Room Argument is dead wrong".[13] The sheer volume of the literature that has grown up around it inspired Pat Hayes to comment that the field of cognitive science ought to be redefined as "the ongoing research program of showing Searle's Chinese Room Argument to be false".[14]


What you are referring to is Searle's assertion that "because of the Chinese room concept, I conclude that every future human-made system will be a Chinese room and will never be 'intelligent'".

I think it is an important nuance.

You have to be careful when saying "Searle's Chinese room" is dead wrong: the Chinese room concept in itself is useful and not controversial, and it is possible that current LLMs are "Chinese rooms", and therefore not 'intelligent'.


We could use the "Chinese room" term to denote a system that superficially mimics human speech, but breaks down at some point and/or uses different mechanisms such that it doesn't result in consciousness. But I don't think that was the intent of the argument and it's not how the argument is generally understood in the literature, so it would just be confusing IMO.

(And you still seem to be implicitly accepting that the basic argument is valid, which would be wrong.)


> You can check for yourself by reading the argument, following its reasoning, and seeing that it is false; and reading the systems reply, following its reasoning, and seeing that it's true

You are being tedious. I obviously have done this and I disagree with you. Saying that X is logically true and Y is logically false is not a demonstration of those baseless assertions. This is not helpful, what you're saying isn't true, and what I'm saying is backed up by the wikipedia article. The bit you quote is simply stating that most literature about the Chinese Room is an attempt to refute it, which is obvious, because the people who are convinced see no need to publish saying so. The fact that people keep publishing means that they have not yet succeeded in refuting it.

Or I can simply say this: you've made a mistake in your logic. Actually, the Chinese Room argument is correct. Since you won't explicate your logic, neither will I.

Have a good day.


Ok, I think it's clear where we both stand, a good day to you too :)

I’m also fixated on the term “experience” in the context of this debate. To me, consciousness is something that one “experiences”, and the two concepts are intertwined.

I am far from convinced that the training and inference regimes of LLMs would qualify as “experience” by any sense of the word.

Now, if we hooked up a plethora of audiovisual and tactile sensors with live feedback directly to a neural network rich with transformers, that was always powered on and fully autonomous, we may be getting there. But we’d probably also be on the verge of manmade horrors beyond our comprehension.

Biological rodent neural networks in a Petri dish stimulated by electrical impulses - more or less conscious than LLMs?

Human on life support, unable to respond to any external stimuli, “braindead” - more or less conscious than LLMs?


A point of sorts. Assuming that is true (I don't think it is), the big question that urgently needs to be addressed is what happens when we DO give LLMs tools to interact with the real (or virtual) world. And people are doing that, right now, in both real and virtual worlds. And people ARE giving LLMs the ability to run continuously for long periods of time, sometimes with enormous context buffers. People ARE putting LLMs into robots with front-end ML and LLM systems for visual processing, and back-end ML systems for autonomous control.

And, yes, concerns about whether biological rodent neural networks are or are not conscious come up frequently in the biological neural network papers. I'm not sure I would want to be a researcher trying to get an experiment past an ethics committee if my biological neural network had 25B rat neurons. (I would hope that they could not).


> a flaw in the logic [...] mechanism

Similar to: "Birds fly, my spinning helical device flies, therefore we've started to replicate how birds fly."

> without having to build elements that one expect on a conscious being

One of the elements I expect in a conscious being is that you can't rewrite it by changing the introductory paragraph.

When it comes to LLMs, almost every "mind" we humans perceive is a fictional character in an LLM-generated story-document, one we are either reading or which is being "acted" at us by regular code. Our own instinct for pareidolia and simulating/inferring other minds is very strong, which means we should require really good evidence/logic to counter our instincts.

Even if one believes the LLM has a single "real mind" as an author of every document... what evidence do we have that it is conscious or "self-inserting" itself as one of the characters in the document?


>One of the elements I expect in a conscious being is that you can't rewrite it by changing the introductory paragraph.

If we had enough knowledge of the workings of the human brain, you could alter the perception of every single memory you've ever had. And limited versions of this already happen all the time. Human memory is notoriously unreliable for a reason.

Are you aware of the Recovered Memory Therapy Scandals of the 80s/90s ? Boy did that ruin a lot of lives. You can rewrite a human by changing their 'introductory paragraph'. It's just not as accessible.


> If we had enough knowledge of the workings of the human brain, you could alter the perception of every single memory you've ever had. And limited versions of this already happen all the time. Human memory is notoriously unreliable for a reason.

Knowing how something works is not the same as having the tools to change it.

Discovering memories are incorrect does not massively change who we are. As someone with a very defective memory, I discover on an hourly basis that I'm wrong about something I thought was true, but there's still continuity and consistency to my personality and general approach to life.

...in fact, as someone who was raised an evangelical Christian and believed wholeheartedly without a shadow of doubt, then lost my faith entirely in my late thirties, I sort of did have my "introductory paragraph" changed, yet my wife, children, and friends would all say I'm still me, and that my core personality and nature remains largely the same.

> Are you aware of the Recovered Memory Therapy Scandals of the 80s/90s ? Boy did that ruin a lot of lives. You can rewrite a human by changing their 'introductory paragraph'. It's just not as accessible.

The recovered memory scandals are not even close to evidence that you can rewrite a human.

The people who thought they had learned new facts about themselves did not suddenly lose their context as humans in 20th century America.

They did not suddenly lose their sense of humor, or develop a previously-unseen penchant for murdering small children.

They experienced a revision of belief, and a pretty major one that really distressed them, but it did not change everything about them.

LLMs _do_ manifest wildly differently based on the first paragraph.


> I think there is a flaw in the logic of saying that human text have a pattern of "consciousness mechanism" and therefore LLM will learn "consciousness mechanism" in order to return sentence continuation that is convincing.

There is no independent "consciousness mechanism" that one might imagine humans have learned or evolved for its own sake. Evolution learns various solutions to optimization problems, and so if consciousness evolved then it was either useful instrumentally, or it is a byproduct of some organization that is useful instrumentally. The point is that as a solution to certain kinds of optimization problems, consciousness can conceivably be the solution to the optimization problem of predicting the next token of text written by humans who themselves have complex phenomenology. There is nothing that a priori constrains token prediction from the domain of consciousness.

>For me, one element that shows it is the case is the absence of world model (or "human-like" world model) despite the fact that the sentence continuation is convincing

World models don't have to be rich and detailed to count as a world model. Lower life forms might be conscious but they only model the part of the world useful for their existence in their ecological niche.


> The point is that as a solution to certain kinds of optimization problems, consciousness can conceivably be the solution to the optimization problem of predicting the next token of text written by humans who themselves have complex phenomenology.

Yes, I agree with that. Consciousness is a good way of generating convincing human text.

What I don't agree with is that consciousness is the only way to generate convincing human text and that because we have convincing human text, it can only imply we have consciousness.

There is a huge probability that generating convincing human text can be done without consciousness. Either because there are mechanisms as efficient as the way the human brain deals with this problem and the LLM found one of them (and these mechanisms may be quite difficult for a human to imagine). Or even because the LLM found a local minimum and is stuck there.

To re-use the evolution approach: evolution solved the "flying problem" with bird feathers, but also with insect wings or bat wings. The fact that evolution ended up using feathers does not imply that everything that flies can only fly with feathers.

> World models don't have to be rich and detailed to count as a world model

I agree in general, but here, we are talking about a machine that reproduces all human language. The argument I'm answering is claiming that "all of human knowledge" is understood, which includes every single human concept. This has to be everything, because an LLM is able to provide convincing text about every subject. If on some subject, the LLM is able to provide convincing text without "understanding" it, then the argument that it is impossible to provide convincing text without understanding it collapses.


> There is no independent "consciousness mechanism" that one might imagine humans have learned or evolved for its own sake.

> There is nothing that a priori constrains token prediction from the domain of consciousness.

We don’t know whether either of these is true or false though. We simply don’t know. There is no agreed upon definition of consciousness, aside from maybe _the having of qualia_, so arguing that something can or cannot be conscious a priori can’t be done.


>There is no agreed upon definition of consciousness

No one genuinely engaged with the topic is confused about the target of the term (phenomenal) consciousness. Definitions come once the theoretical work is complete, to be articulated as part of a fully worked out theory. The lack of a definition doesn't prevent us from investigating the subject or offering conjectures. What we can do is offer a precise description of the target and argue for or against whether LLMs reach the description. We will of course debate whether the offered description captures the relevant phenomena. But this is all just part of the process.


> There is no agreed upon definition of consciousness, aside from maybe _the having of qualia_

Try to define qualia though without explicitly or implicitly recursing into consciousness.

It's all a large house of cards that's built on handwaving and "I know it when I see it".


I think, for me, the thing is that when you tutor undergrads in abstract math, you discover that students will very often find data patterns that fit the goal but do not correspond to a real mathematical principle.

Sometimes humans making claims about AI intelligence or consciousness also identify spurious patterns that do not correspond to the problems of intelligence or hard consciousness.


> students will very often find data patterns that fit the goal but do not correspond to a real mathematical principle.

That reminds me of a niche paper [0] critiquing a certain way of teaching remedial math that was over-focused on tests. A kid named Benny (12) was building up (wrong) "rules" for math which still somehow gave enough of an illusion of progress in terms of test scores that his misunderstandings hadn't been caught earlier.

> Benny was able to explain his procedure; e.g. for 5/10=1.5, he said: "The one stands for 10; the decimal; then there’s 5... shows how many ones." In another example, 400/400 = 8.00 because "The numbers are the same [number of digits]... say like 4000 over 5000. All you do is add them up; put the answer down; then put your decimal in the right place... in front of the [last] three numbers."

[0] https://people.wou.edu/~girodm/library/benny.pdf


Of course. But in my explanation "consciousness" or "understanding" is not "finding a pattern", it is the pattern itself.

CNNs find patterns, sometimes relevant, sometimes spurious, but I don't think people argue that CNNs have evolved consciousness or an understanding of what a cat or a dog is.

Here, the argument is "LLMs are able to understand, because 'understanding' is the only pattern to reach the goal". I'm saying that it is unlikely to be the only pattern, and that it is likely that they find a local minimum on a system that reaches the goal that does not use 'understanding'.

The reason I'm saying it is likely is that "basic" LLMs show behaviours where they produce convincing human text and yet do things that are really difficult to reconcile with the claim that they have understanding.

(And before that old argument is used, yes, I know sometimes some humans fail to understand. The problem is that the majority of humans don't fail to understand basic stuff most of the time, while the "basic" LLMs do. The fact that you roll 10 dice 100 times and 1 of them never lands on 1 does not convince me that that set of dice is loaded. The fact that you roll 10 dice 100 times and 9 of them never land on 1 does convince me that that set of dice is loaded.)
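To put rough numbers on the dice comparison, here is a minimal sketch that assumes fair six-sided dice and independent rolls (both assumptions are mine, as are the variable names). It computes how likely it is that exactly 1, or exactly 9, of the 10 dice never show a 1 over 100 rolls. Taken literally, both outcomes are improbable for fair dice; what the sketch illustrates is only how much larger the gap becomes in the second case.

```python
from math import comb

# Assumptions (not from the comment): fair six-sided dice, independent rolls.
ROLLS = 100
DICE = 10

# Probability that a single fair die never shows a 1 in ROLLS rolls.
p_never = (5 / 6) ** ROLLS


def prob_exactly(k: int) -> float:
    """Binomial probability that exactly k of DICE fair dice never show a 1."""
    return comb(DICE, k) * p_never**k * (1 - p_never) ** (DICE - k)


p_one_die = prob_exactly(1)
p_nine_dice = prob_exactly(9)

print(f"exactly 1 of {DICE} dice never shows a 1: {p_one_die:.3g}")
print(f"exactly 9 of {DICE} dice never show a 1: {p_nine_dice:.3g}")
```

The second probability is many orders of magnitude smaller than the first, which is the asymmetry the analogy relies on.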


Not just undergrads. Even folks who believe in astrology or numerology depend on finding patterns in unrelated events to explain human behaviours.

The article talks about Ted Nelson's demo about hypertext. The first version of the demo was using Nabokov's Pale Fire book. This first version of the demo has been lost. The article is not saying that Nabokov's book has been lost, but that the usage of Nabokov's book as a demo for hypertext has been lost.

> the usage of Nabokov's book as a demo for hypertext

I get what you are saying, but should just point out that the Kindle version of the Penguin edition provides hypertext links from the poem to the deranged narrator's commentary. I remember reading a paper edition sometime back when, and being able to flip via hypertext is definitely superior to paper page flipping. And I'm someone that loves paper books.

This is a truly amazing and very, very funny book. If you haven't read it, you are really missing out.


Interesting, I have the exact opposite experience with flipping vs linking when it comes to books like _Pale Fire_. It's a lot more difficult for me to read the end notes on Kindle, especially when one cross-references more than one other end note. I just couldn't keep straight in my head where I had already been. I had to buy a paper copy of _Pale Fire_ after fidgeting on my Kindle (which I usually prefer) for a while, and I just kept two bookmarks (one in the poem section, one in the end notes section) and found other end notes ad hoc. The physicality of the pages helped me navigate back and forth.

I think hypertext is best for things like Pale Fire, where the linked text is long (it is a novel, after all), but I must admit that paper footnotes are good for things like the SF novels of Jack Vance, so you stay on (more or less) the same page, and you can ignore them (or even re-imagine them) if you like.

It's one of my favorites. But I prefer to reread with two bookmarks, just as I did when I first encountered it (and just as I did with Infinite Jest years later).

Ah, my mistake.

Well, the agent should help you by saying "hey, I cannot do this task, but I can bypass the problem by doing this, but obviously it is not something you intended me to do or even something you were aware of, so I will not do it unless you tell me explicitly it's ok".

It's win-win: the agent is helping and it is educating you about things you obviously did not realise.


That works great if it's one agent, but it absolutely doesn't if you want to tackle something complex that warrants using, say, ten agents.

I can imagine a future where this technology empowers you to do things with a thousand agents.


You can have ten thousand agents, you will always have 1 agent in charge of, say, reading the file in a distant directory, and this agent (which will have minimal context) should be smart enough to realise that this action is unusual.

I'm not sure what your point is: are you saying that in a multi-agent workflow, you will have one agent per letter read from the file? I would assume that each agent has a specific unitary "task", instead of each agent executing one CPU instruction without any knowledge of the bigger picture. The point of multi-agent is to parallelize tasks that can be parallelized, not to remove the context, in which case you are wasting money using an agent.


Seems like another one of those "kill or be killed" worldviews that embraces the multipolar trap to such an extreme that even misaligned AI is seen as a win so long as it's better at circumventing its masters than some imagined rival AI (presumably in China).

No, you're missing the point.

The idea is not that you parallelize simple tasks. With a thousand agents, eventually, once we figure out how to orchestrate agents for real, you can tackle significantly more complex projects.

Here's a random example - writing an OS kernel from scratch, porting a good subset of Linux drivers automagically, developing a passable userspace, testing on ten VMs with different hardware configuration.

We can't do this yet, of course. But when we can, these thousand agents can't ask you every time something goes wrong. That just doesn't scale.

This 'getting stuck once every ten-fifteen minutes' is very much the experience of trying to develop complex software with Codex or Claude Code right now.


This does not make any sense at all.

If you create a file that you don't intend for the AI to see, the situation should be identical to if you deleted this file before running it.

Your argument is: "if you delete this file, the AI will not be able to build the project". This is 100% incorrect: the project, by definition of the file's status, does not need the file. And given the nature of the file, if the project requires it to build, there is a bigger problem.

I really don't get it, you are asking the agent to be stupid: intelligent humans are able to realise that such workarounds are often a stupid thing to do and know that it is smarter to discuss things when there are several stakeholders. I really don't understand why you are saying that ideally, agents should act stupidly.

(Not all workarounds are stupid, but some are, and the one in the example clearly is. We need agents to be smart enough to know when a workaround is ok or not. Right now, it is clearly not the case)

And by the way, as when working with humans, nothing prevents you from telling the system that reading any file is authorised. In which case there is no workaround at all if the agent reads this file, as you authorised it to do so. But ideally, if it has not been authorised, we should build systems that know such workarounds are stupid things to do.

So, no, your argument that the agents will always get stuck is not true: humans don't get stuck, and yet a smart human knows that reading files clearly not intended for them, even if they suspect it will unblock them, is not "normal".


I don't think the discussion about "is it a distribution or not" is very interesting, but I think the discussion about "should we make clear that this _thing_ is just a bunch of config files rather than the usual work one would expect behind what was traditionally called a Linux distribution" is.

And I know Omarchy is not the only one out there doing something similar, and that there is a spectrum. It is not a problem. The problem is not the existence of the spectrum, the problem is that at some point, we should just call a cat a cat instead of arguing "well, being a cat is a spectrum, so every year we can call a new thing in the adjacent spectrum a cat and pretend it has all the qualities one would expect from a cat".


It's a way to distribute Linux (the kernel) and userland packages in a way you can install. Therefore it's a Linux distribution.


As I've said, I think the discussion about the "real definition of a distribution" is not interesting. It's like idiots who think they are smart when they say "a tomato is technically a fruit".

What I'm interested in is obtaining a system that allows me to run Linux and userland reliably and in a trustworthy way. I don't know about you, but I don't compile all my software one by one myself after checking the source code. So, I prefer not to rely on people who don't display much understanding of how to distribute a system reliably and in a trustworthy way. If someone is "just" making their own "flavor" of desktop, distributing it and calling it "a distribution" without even noticing that such a package is lacking a lot of things that traditional distributions are doing, these people are just not mature enough to be trusted to do a good job. (And of course, being a "traditional distribution" is not "good enough", it is a necessary but not sufficient condition.)

Don't get me wrong, they can distribute their flavor as much as they want, I'm happy with that. But if they act as if their stuff is the same as what is traditionally called "linux distribution" or if they are not smart (or honest) enough to mention that it's different, then 1) they are not mature enough, 2) it is worth informing newcomers or naive people about that.

It's a bit like a company that builds cars, and then you have a guy who buys some of these cars, changes one or two things on the dashboard, paints the car a different color, and calls himself a car manufacturer. Nothing wrong with selling customised cars, but it is dangerous to act as if the guy is a proper manufacturer when he has neither the capacity, the knowledge nor the expertise to provide a good reliable car.


Did dhh provide a recipe to install hyprland properly without having to install a full "distribution"? (I don't know, it's a real question)

It feels very strange (and wrong) to me: if there are difficulties in installing something, try to help people instead of packaging the solution with other things that are not related. It feels a bit as if uv were mainly providing their "uvOS" to solve the difficulties of dealing with python packages.


>Did dhh provide a recipe to install hyprland properly without having to install a full "distribution"? (I don't know, it's a real question)

I would guess in typical DHH fashion he would say it is Open Source. And I don't understand where this just Arch + Hyprland installation is coming from?

They have also customised the OS / distro so it installs in less than 2 min on a super fast USB. Getting Laptops, both Framework and Dell are now on board, tested on Omarchy so they work out of the box. And so many other tiny things that just make the experience better. I say better but to most consumer, those are expected in the first place.

And this "expectation" people have been waiting for more than a decade.


> Getting Laptops, both Framework and Dell are now on board, tested on Omarchy so they work out of the box. And so many other tiny things that just make the experience better. I say better but to most consumer, those are expected in the first place. And this "expectation" people have been waiting for more than a decade.

As a fan of boring Dell laptops/desktops and owner of many, I can tell you they have been well supported in every distro I have tried (Debian, Fedora, Arch, SUSE)


Dell has been selling machines with official Ubuntu support for ages iirc


If I remember correctly, Omarchy started as an in-house alternative to macOS in one of DHH's companies. And was then released to the public.

So the purpose of Omarchy was to get devices quickly set up with some opinionated defaults.


He built it for himself first, posting frequently about it on X. Once it reached a point of stability, he announced that Basecamp was starting to transition its employees from macOS to it.


So, is the answer "no"?

I don't think it changes anything about what I was saying. If indeed dhh helped find a way to install hyprland more easily but failed to also provide a standalone recipe, that does not sound like a good practice to me.


The answer is: no, solving your problem was not the goal of the project.

But the source code is public, you can extract the relevant scripts from the repo: https://github.com/basecamp/omarchy


This is not what I'm saying. I'm not saying that they should "solve my problem", I'm saying that their reputation should be reviewed negatively if they "create a distribution to solve a problem that has no reason to be solved by creating a distribution". Not that it is a very very bad thing, just that it shows that they are not really good at what they do.


'I'm saying that their reputation should be reviewed negatively if they "create a distribution to solve a problem that has no reason to be solved by creating a distribution".'

Why? People can do as they wish and you can use it or not.


What? Why are you saying "why"?

I'm just saying that I trust people who know what they are doing, and if there is someone who does a "superficial" job* but presents it as if it is the "whole deal", then they don't really understand what it takes to do the whole deal and therefore they don't know what they are doing.

*: I don't mean "superficial" pejoratively, just that a "traditional" distribution does wayyyyyy more than what is done in Omarchy.

And, sure, they can do as they wish, and the consequence is that they get the reputation they deserve. You cannot say "sure, I poop in a bucket and pretend it is a good solution because my toilet is blocked, but people can do as they wish and you can visit my house or not", and I fully agree with that AND I will still say "the reputation of this guy should be reviewed negatively, as it is clear they have a low understanding of how to deal with basic plumbing". You cannot just answer me "What! How dare you say this guy's reputation should be reviewed negatively".


It's exactly what you’re saying. You have a different problem and a different opinion. And your conclusion is that "they are not good at what they are doing".

I’m really no DHH fan, but I think he knows what he’s doing and is also good at it.


I don't have any problem (I don't use hyprland).

The situation is simple, I'm just saying to people the following: Whatever you call this thing, it does not look like the people making it have a strong grip on what is usually considered important in a "traditional distribution". If you don't care about these aspects, great for you, go ahead. If you don't even notice that these aspects are a thing or that this distribution is different on this point, then maybe it is worth it for me (and others) to bring that up. Maybe for these people it is useful (and maybe it is not useful for others, in which case I hope they will just act like adults and not complain that someone mentions something useful for people who are not them).

I was reacting to someone saying that "Omarchy solved my problem with hyprland when no one else was able to, so it is an indication on how good of a distribution it is". I think it is the point: a "linux distribution" is there to solve a totally different problem. If you have difficulty installing hyprland, the logical solution is to provide tools to help installing hyprland, tools that can work in any distribution. If you go into a strange solution instead (such as ending up building a brand new distribution around it and saying "it's open source, you can always extract the specific code if you don't want the distribution"), then it is just natural that people wonder if you are really understanding how it works.

As for DHH, I don't know: being a good developer is quite different from what it takes to build a reliable distribution, and it looks like he is very prone to think that because he is a good developer, he is good at everything. If anything, the fact that he has no grasp at all of what people are talking about when they talk about this kind of thing makes me think he knows even less what he is doing.


I was just talking about my experience. I don't think DHH's entire goal was only to help people install Hyprland, it's weird that you're getting this idea.


It is not what I'm saying, of course.

I'm saying that if they ended up shipping the house because the house contains their new useful microwave but forget to ship the microwave independently, it is something that should decrease their reputation, it looks silly and amateurish.

Of course, I'm not saying that they should solve my problem for me. Simply, they are doing things in a complicated way, either uselessly or not entirely honestly.


That is exactly what you are saying.

"Oh there's a half furnished house. Silly amateur house builder, why they don't just sell microwaves?" ?!


What? Not at all what I'm saying. The whole thread started with "a solution to install hyprland", which is "the microwave". My expectation is that someone who knows how to fix a microwave will also know how to distribute it without the whole house.

If someone provides a half-furnished house, that is fine by me. If they provide a half-furnished house and also say "hey, it comes with a microwave because I know how to fix a microwave. If you want me to fix a microwave without having to have the whole house, do it yourself, I don't know how to do that", then it raises quite a bunch of red flags about how this person understands how a house works. And in this case, yes, I will call this person an amateur. Not because of the half-furnished house, but because they presented the situation in a way that indicates that they don't really have a grasp of how houses and microwaves work.


> If they provide a half-furnished house and also say "hey, it comes with a microwave because I know how to fix a microwave. If you want me to fix a microwave without having to have the whole house, do it yourself, I don't know how to do that",

You're still misattributing the reason I like Omarchy to DHH's reason of making Omarchy.


Incorrect.

I did not say "I want to sell half-furnished houses because I want to fix microwaves", I said "Hey, it comes with a microwave ...".

I'm not saying Omarchy is done by people who don't know what they are doing because they created a distribution to fix hyprland. I'm saying Omarchy is done by people who don't know what they are doing because despite having a fix to hyprland, they don't act with it like an adult would.

Again, I know perfectly that "fixing hyprland" was not the objective. But the way they are behaving is just smelling too much of people who cannot be trusted and don't really know how a traditional distribution works and what makes it special.


You're suggesting that because the system doesn't offer a standalone recipe to set up Hyprland, implying that everything else it does include and does better than anything else is not a standalone package also, it's silly and amateurish and they don't know what they are doing. You can try and convince me all you want, but that is not a point of view I could ever get behind, sorry.


I did not say "despite having a fix to hyprland, they don't offer a standalone recipe", I'm referring to the whole behavior: how it is presented, how it is hyped, how it has its own conference, its own merchandising, ... When it comes to hyprland, there is this childish attitude of "people do what they want, extract it from the source yourself" when people are legitimately surprised by how the thing was handled.

If it was presented as "Hey, we know it's just some scripts, we don't do the same kind of work that traditional distributions do. We still call it a distribution, but don't hesitate to support Arch instead who is doing the hard work for us", it would be different. But it is apparently not what is done (based on what I've read on the subject in the meanwhile).

What I'm saying is obviously not as simplistic as how you summarized it. It looks like you are just upset that someone may see bad signs in the way this distribution (or whatever one wants to call it) is handled. That's fine, I'm not forcing you to not use it, the same way I'm not forcing anyone to not use anything that is overhyped.


> I did not say "despite having a fix to hyprland, they don't offer a standalone recipe",

You said this:

> Did dhh provide a recipe to install hyprland properly without having to install a full "distribution"? (I don't know, it's a real question)

> It feels very strange (and wrong) to me: if there are difficulties in installing something, try to help people instead of packaging the solution with other things that are not related.

And this:

> If indeed dhh helped find a way to install hyprland more easily but failed to also provide a standalone recipe, that does not sound like a good practice to me.

I understand your overall point now that you took the time to explain a bit more, and it is valid criticism. But it is not "obvious" what you were saying. Based on the replies you've got, I see that I'm not alone to think this. You might want to look inward into why that is.

Not upset by the way, just responding in kind.


The number of reactions was pretty small for Hacker News, and the silent majority probably did not react because they did not have a biased reading of my comments and there is therefore nothing to react to.

I don't believe it was difficult to understand, and you are probably not a good person to estimate if it was the case or not (of course if something is misunderstood by 0.1% of the people, these persons who misunderstood it will say "well, it was difficult to understand", it would be more convincing if this judgement was coming from someone who did understand it from the start).

I also think that there is a difference in culture. If you have enough experience to notice that Omarchy is overhyped, then you probably also have some experience thinking about what makes a reliable distribution and so on, and what I was getting at probably seemed simple and obvious. Conversely, if it's not the case, what I'm saying involves concepts you did not really think about or care about, and therefore it may be new or confusing, even if it is correctly explained.


> How much of this is based on how expensive it is to bring a powerplant online? How much of that expense is based on endless lawsuits from environmental groups and weaponized environmental laws? Why can the navy without those restrictions build safe reactors for ~$2million/megawatt?

Pretending it's all the fault of the bad environmentalists is a bit ridiculous. A nuclear powerplant is a tricky thing to create. A lot of projects had delays, often not due to any environmentalists or anti-nuclear people, but because the parts failed their internal control, which demonstrates that it is tricky to build. A nuclear powerplant is a huge provider that usually cannot be brought online for ~10 years, so you can also understand the complexity and the uncertainty: we are not able to predict the price of electricity or what the electricity grid will look like in 2-3 years, and yet they need to predict it for a given region in 10 years.

And some environmental laws are frivolous or turned out to be incorrect (the same way some people who at the time were against some environmental laws turned out to be incorrect years later), but some laws are just legitimate and it is simply not fair to pretend that the opinions of some people should just be discarded because you have a different opinion. I myself don't always agree with some laws, sometimes anti-nuclear, sometimes pro-nuclear, but a given fraction of these laws will exist, it is just the reality. It's like saying "communism would work if it was not for people who don't like communism": people who don't like communism will always exist and if your model requires a world where it is not the case to work, then your model is stupidly unrealistic.


> if your model requires a world where it is not the case to work, then your model is stupidly unrealistic

And yet, our world contains multiple cases where it is the case that nuclear is being built today, at reasonable costs, and with great success. The two examples I've given in this thread are China and the US Navy. Some others include Japan and South Korea, both of which are notably not dictatorships.

What's frustrating in this discussion is policy and management decisions made 50 years ago are assumed to be the steady-state immutable reality in western countries.

My argument is not that nuclear is the best economic play. It's that if you believe that continuing to burn natural gas and coal is an existential risk, you should be spinning up every option all at once as aggressively as you can.



From the fine article:

>was originally slated for completion in 2020.

>But repeated delays pushed back full commercial operations until 2024, when the fourth and final unit came online. The setbacks drove up costs and eroded profitability.

What could have caused delays in 2019 ~ 2020 time frame?

It would be nice to see a postmortem.


What? Who is saying that nuclear cannot be successful, this has nothing to do with my comment. Did you read one sentence without understanding the meaning?

It is simple: some environmental laws are a legitimate ask from some people, whether or not you or I agree with the ask itself. It has nothing to do with nuclear, it is about your argument framing the existence of environmental laws as the reason it does not work. If nuclear cannot work well in some countries because in some countries there are people who ask legitimate things, the problem is not these people, the problem is that the nuclear model is not adapted to the reality of these countries.

But again, as I've said, it is not even the case: the difficulties with nuclear are not limited to "some environmentalists".

> It's that if you believe that continuing to burn natural gas and coal is an existential risk, you should be spinning up every option all at once as aggressively as you can.

That does not make sense. If you want to write software that does something, you don't just spin up Linux, Windows, Mac, and start writing code in Java, C++, python, typescript, erlang, ... at the same time. What you do is: you write a decision matrix, score it, and _choose one strategy_.
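As a minimal sketch of what "write a decision matrix, score it, and choose one strategy" can look like: the criteria, weights, option names and scores below are all hypothetical placeholders, not figures from this thread.

```python
# Hypothetical criteria and weights (summing to 1); not taken from the thread.
weights = {"cost": 0.5, "time_to_deliver": 0.3, "risk": 0.2}

# Hypothetical 0-10 scores per option, higher is better on every criterion.
options = {
    "strategy_a": {"cost": 7, "time_to_deliver": 8, "risk": 6},
    "strategy_b": {"cost": 4, "time_to_deliver": 5, "risk": 9},
    "strategy_c": {"cost": 6, "time_to_deliver": 3, "risk": 7},
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Sum of score * weight over all criteria."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


totals = {name: weighted_score(scores, weights) for name, scores in options.items()}
best = max(totals, key=totals.get)

for name, total in totals.items():
    print(f"{name}: {total:.2f}")
print("chosen:", best)
```

The output is a single choice rather than "all of them at once"; changing the weights is how a different priority would be expressed.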

In the context of the climate crisis, the strategy can mix different technologies ... or not. The fact that it does not mix them does not mean that this particular strategy is worse than another. In particular, budgets are obviously limited, so spending X$ on project A may lead to a successful project A while spending X/2$ on project A and X/2$ on project B may lead to both projects A and B failing. (And if you don't think it's true, just increase the number N of projects until X/N$ is ridiculously too small to do anything. According to your sentence, you should be spinning up every option all at once as aggressively as you can, so you cannot do only N-1 projects, you need to split your money amongst the N projects.)
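The budget-splitting point can be sketched with a toy threshold model. The assumption (mine, for illustration only) is that a project succeeds only if its share of the budget reaches some minimum viable cost; the numbers are made up.

```python
def successful_projects(total_budget: float, n_projects: int, min_viable_cost: float) -> int:
    """Count projects that succeed when the budget is split evenly.

    Toy assumption: a project succeeds only if its share of the budget
    is at least min_viable_cost; otherwise it fails entirely.
    """
    share = total_budget / n_projects
    return n_projects if share >= min_viable_cost else 0


# Hypothetical figures: a budget of 100 and projects that each need at least 60.
X = 100.0
MIN_VIABLE = 60.0

for n in (1, 2, 5):
    print(f"{n} project(s), {X / n:.0f} each -> {successful_projects(X, n, MIN_VIABLE)} succeed")
```

Under this model, funding one project yields one success, while splitting the same budget across two or more yields none.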

When it comes to climate change, I was 100% pro-nuclear 20 years ago. Now, in some countries, it is too often a money pit (not because of regulation or the bad environmentalists) that is wasting money that could have helped the climate. If you believe that continuing to burn natural gas and coal is an existential risk, you should spend your time, money and energy on real solutions instead of achieving nothing by trying to do everything all at once without a plan.


But you are doing the same as what you are complaining about.

Racism is a complex phenomenon not limited to the simplistic view "they don't like black people". This representation does a disservice when some truly racist people then justify their actions and beliefs by saying "I cannot be racist, I'm friends with the garbage man who is black: he is a good black man, is polite to me and stays in his place. So, if I'm not racist, what I'm doing is just legitimate".

In the context of Tulsa, it is difficult to believe that the frustration of racist people seeing black people more successful than them has not contributed to the situation. It seems very natural and logical (and that's even the core of "white supremacy": it clearly states that white people deserve a better position in the social hierarchy than black people: white supremacy framing is all about how some classes are reserved to white people and not black people), and if you are claiming that it is not the case, you are the one with the burden of proof.

While you have a point in raising that racism should not be reduced to only a class issue, you should have raised that as a clarification within the discussion instead of presenting it as if racism has absolutely nothing to do with class and class sentiment.

To return to your parallel, what you do can be seen as: "A person entered a bar and was raped" (what you say) vs "A woman entered a bar and was raped". While nobody here claims that men cannot be raped, there are social phenomena that create a gender imbalance, and it is important not to reduce the situation to "it has nothing to do with gender and the social norms around it".

In the rest of your comment, you, yourself, are doing a lot of interpretation. The fact that someone noticed that a class factor may have had an impact does not mean that they or all readers will conclude that it is the only way racism can happen (that is a huge stretch: if they know what happened at Tulsa, they very probably know a lot of other cases where the "only due to class" theory does not hold up). Same for "victim blaming": the fact that they were successful was obviously not used to excuse the massacre or pretend that somehow it was the black people's fault, the context is clearly to condemn the white racist people (and the success of the black people seems to be presented as an obvious additional factor on the racists, as it is obviously unfair to pretend that some people don't have the right to be successful).

I think the first comment was not totally perfect and would have been 100% fine if they had simply added "class was one of the factors". But I think your reaction has way more problems and does a bigger disservice by reducing racism to a framework that can easily be instrumentalised by real racist people.


It is not difficult to believe that the frustration of racist people seeing black people more successful contributed to it. In fact, it's the most obvious and straightforward explanation for it, given the fact that it's 1) 1921, 4 or so decades before the Civil Rights Act, and 2) in freaking TULSA lmao

