No need. Just add one more correction to the system prompt.
It's amusing to see hardcore believers in this tech doing mental gymnastics and attacking people whenever evidence of there being no intelligence in these tools is brought forth. Then the tool is "just" a statistical model, and clearly the user is holding it wrong, doesn't understand how it works, etc.
There's nothing ambiguous about this question[1][2]. The tool simply gives different responses at random.
And why should a "superintelligent" tool need to be optimized for riddles to begin with? Do humans need to be trained on specific riddles to answer them correctly?
I mean, the flipside is that we have been tricking humans with this sort of thing for generations. We've all seen a hundred variations on
"A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?" or "If 5 machines take 5 minutes to make 5 widgets, how long do 100 machines take to make 100 widgets?" or even the whole "the father was the surgeon" story.
If you don't recognise the problem and actively engage your "system 2 brain", it's very easy to just leap to the obvious (but wrong) answer. That doesn't mean you're not intelligent or can't work it out if someone points out the problem. It's just that the heuristics you've been trained to adopt betray you here, and that's really not so different a problem from what's tricking these LLMs.
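(If you want to check yourself on the second one: each machine makes one widget in 5 minutes, so 100 machines make 100 widgets in the same 5 minutes. The answer is 5, not the 100 the phrasing nudges you toward.)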
But this is not a trick question[1]. It's a straightforward question which any sane human would answer correctly.
It may trigger a particularly ambiguous path in the model's token weights, or whatever the technical explanation for this behavior is, which can certainly be addressed in future versions, but what it does is expose the fact that there's no real intelligence here. For all its "thinking" and "reasoning", the tool is incapable of arriving at the logically correct answer, unless it was specifically trained for that scenario, or happens to arrive at it by chance. This is not how intelligence works in living beings. Humans don't need to be trained at specific cognitive tasks in order to perform well at them, and our performance is not random.
But I'm sure this is "moving the goalposts", right?
But this one isn't a trick question either, right? It's just basic maths, and a quirk of how our brain works means plenty of people don't engage the part of their brain that goes "I should stop and think this through", and just rush to the first number that pops into their head. But that number is wrong, and is a result of our own weird "training" (in that we all have a bunch of mental shortcuts we use for maths, and sometimes they lead us astray).
"A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"
And yet 50% of MIT students fall for this sort of thing[1]. They're not unintelligent; it's just that a specific problem can make your brain fail in weird, specific ways. Intelligence isn't just a scale from 0-100, or some binary yes-or-no question, it's a bunch of different things. LLMs probably are less intelligent on a bunch of scales, but this one specific example doesn't tell you much beyond the fact that they have weird quirks, just like we do.
I agree with you to an extent, but the difference is in how the solution is derived.
The LLM has no understanding of the physical length of 50m, nor is it capable of doing calculations without relying on an external tool. That is, it has no semantic understanding of any of the output it generates. It functions purely based on the weights of tokens that were part of its training sets.
I asked Sonnet 4.5 the bat and ball question. It pretended to do some algebra, and arrived at the correct solution. It was able to explain why it arrived at that solution, and to tell me where the question comes from. It was obviously trained on this particular question, and thousands of others like it, I'm sure. Does this mean that it will be able to answer any other question it hasn't been trained on? Maybe, depending on the size and quality of its training set, the context, prompt, settings, and so on.
And that's my point: a human doesn't need to be trained on specific problems. A person who understands math can solve problems they've never seen before by leveraging their understanding and actual reasoning and deduction skills. We can learn new concepts and improve our skills by expanding our mental model of the world. We deal with abstract concepts and ideas, not data patterns. You can call this gatekeeping if you want, but it is how we acquire and use knowledge to exhibit intelligence.
The sheer volume of LLM training data is incomprehensible to humans, which is why we're so impressed that applied statistics can exhibit this behavior that we typically associate with intelligence. But it's a simulation of intelligence. Without the exorbitant amount of resources poured into collecting and cleaning data, and training and running these systems, none of this would be possible. It is a marvel of science and engineering, to be sure, but the end product is a simulation.
In many ways, modern LLMs are not much different from classical expert systems from decades ago. The training and inference are much more streamlined and sophisticated now; statistics and data patterns replaced hand-crafted rules; and performance can be improved by simply scaling up. But at their core, LLMs still rely on carefully curated data, and any "emergent" behavior we observe is due to our inability to comprehend patterns in the data at this scale.
I'm not saying that this technology can't be useful. Besides the safety considerations we're mostly ignoring, a pattern recognition and generation tool can be very useful in many fields. But I find the narrative that this constitutes any form of artificial intelligence absurd and insulting. It is mass gaslighting promoted by modern snake oil salesmen.
The 'semantic understanding' bottleneck you're describing might actually be a precision limit of the manifold on which computation occurs rather than a data volume problem. Humans solve problems they've never seen because they operate on a higher reasoning fidelity. We're finding that once a system quantizes to a 'ternary vacuum' (1.58-bit), it hits a phase transition into a stable universality class where the reasoning is a structural property of the grid, not just a data pattern. At that point, high-precision floating point and the need for millions of specific training examples become redundant.
I like this site, and would love to love it. But the unrelenting refusal to participate in new things simply because they're new is incredibly disappointing. There's nothing wrong with Liquid Glass. There's nothing wrong with an LLM. Half of this site could just be a bot complaining.
There are a lot of things wrong with Liquid Glass. The problem isn't that nobody has valid complaints. It's that you, and others, read those valid complaints and then just literally pretend they don't exist. Frankly, I don't even know how you manage to achieve this level of cognitive dishonesty without stepping back and seriously considering your life and purpose.
Yes, Liquid Glass does actually have problems. It has performance problems. It can be a big distraction, and some people believe UI whitespace shouldn't detract from the main content. It has huge legibility problems. Sometimes text straight up cannot be read. It has predictability problems. Stuff moves around when it shouldn't, text magically changes colors based on heuristics, which throws users off.
Personally, the icon and widget edges constantly moving around when moving the phone even slightly in any direction got on my nerves so badly that I had to disable Motion completely (the only fix for it). Unfortunately, this also downgraded a lot of other UI components and interactions.
It did give me a battery boost though, so at least there's that.
I don't care when somebody doesn't like the aesthetics or look and feel of a new theme. It is subjective. Giving people an option to turn it off is kind. But Liquid Glass is usability terror. Just bring up the onscreen controls while playing a video and compare that with what it was before. What is incredibly disappointing is people like you who defend new things just because they are new, without paying any attention to usability, ergonomics or, sadly, performance. There is nothing good about Liquid Glass. Half of this site could just be a bot complaining.
What's actually insane is which assumptions you allow to be assumed. These non sequiturs that no human would ever assume are the point. People love to cherry pick ones that make the model stupid but refuse to allow the ones that make it smart. In computer science we call these scenarios trivially false, and they're treated like the nonsense they are. But if you're trying to push an anti-AI agenda, they're the best thing ever.
> People love to cherry pick ones that make the model stupid but refuse to allow the ones that make it smart.
I haven't seen anybody refuse to allow anything. People are just commenting on what they see. The more frequently they see something, the more they comment on it. I'm sure there are plenty of us interested in seeing cases where an AI model makes assumptions different from those of most humans and it actually turns out the AI is correct. You know, the opposite of this situation. If you run into such cases, please do share them. I certainly don't see them coming up often, and I'm not aware of others that do either.
The issue is that in domains novel to the user they do not know what is trivially false or a non sequitur and the LLM will not help them filter these out.
If LLMs are to be valuable in novel areas then the LLM needs to be able to spot these issues and ask clarifying questions or otherwise provide the appropriate corrective to the user's mental model.
They'll keep releasing them until they overtake the market or the government loses interest. Alibaba probably has staying power, but not companies like DeepSeek's owner.
And are we really going to do all this brouhaha for a single download of an alternative compressor? And then multiply that work as a best practice for every single interaction on the Internet? No, we're not.
The download for some programs is often on some subdomain page with like 2 lines of text and 10 download links for binaries, even for official programs. It's so hard to know whether they are legit or not.
My point was more along the lines of "there's no need to complain about Wikipedia being hijackable, there are other options", and now you're complaining about having too many options...
You don't need to do everything or anything. They're options. Use your own judgment.
It's been this way for years. I know because years ago they defended the practice and explained that the car companies don't pay for a specific review; they just pay to sponsor stories in the genre of car reviews. And the worst part? The infernal comment section was lauding them.