Amazing. Some people who use LLMs for soft outcomes are so enamored with them that they disagree with me when I say to be careful, they're not perfect -- this is such a great non-technical way to explain the reality I'm seeing when using them on hard-outcome coding/logic tasks. "Hey this test is failing", LLM deletes test, "FIXED!"


Something that struck me when I was looking at the clocks is that we know what a clock is supposed to look and act like.

What about when we don't know what it's supposed to look like?

Lately I've been wrestling with the fact that, unlike with, say, a generalized linear model fit to data with some inferential theory behind it, we don't have a theory or model for the uncertainty of what LLMs produce. We recognize when it's off about things we already know, but we don't have a way to estimate when it's off other than checking it against reality, which is probably the exception in how it's used rather than the rule.


I need to be delicate with wording here, but this is why it's a worry that all the least intelligent people you know could be using AI.

It's why non-coders think it's doing an amazing job at software.

But worryingly, it's also why using it for research, where you necessarily don't know what you don't know, is going to trip up even smarter people.


You are describing exactly the Dunning-Kruger Effect[0] in action. I’ve worked with some very bright yet less technical people who think the output is some sort of magic lamp and vastly overindex on it. It’s very hard as an engineer to explain this to them.

[0] https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect


I built an ML classifier for product categories way back. As I added more classes/product types, the individual per-class precision/recall metrics improved--I kept adding more and more until I ended up with ~2,000 classes.

My intuition is that at the start, when the task was "choose one of these 10 or unknown", that unknown class left a big gray area, so as I added more classes the model could effectively say "I know it's not X, because it's more similar to Y".
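
For concreteness, here's a rough sketch of that kind of setup (the product titles and category names are made up, and scikit-learn is assumed); the per-class precision/recall numbers come out of classification_report:

  # Rough sketch: product-title classifier with an explicit "unknown" class.
  # Toy data; assumes scikit-learn. Evaluating on the training set here only
  # to show the report format, not to measure anything real.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import classification_report
  from sklearn.pipeline import make_pipeline

  titles = [
      "usb-c charging cable 6ft", "wireless bluetooth earbuds",
      "mens trail running shoes", "womens leather ankle boots",
      "stainless steel water bottle", "cast iron skillet 12 inch",
      "mystery grab bag assortment", "misc clearance item",
  ]
  labels = [
      "electronics", "electronics",
      "footwear", "footwear",
      "kitchen", "kitchen",
      "unknown", "unknown",
  ]

  model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
  model.fit(titles, labels)

  # Per-class precision/recall: the numbers that kept improving as classes were added.
  print(classification_report(labels, model.predict(titles), zero_division=0))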

I feel like in this case, though, the broken clocks are broken because they don't serve the purpose of visually transmitting information; they do still look like clocks. I'm sure if you fed the output back into the LLM and asked what time it is, it would say IDK, or more likely make something up and be wrong (at least for the egregious ones where the hands are flying everywhere).


Yeah it seems crazy to use an LLM on any task where the output can't be easily verified.


> Yeah it seems crazy to use an LLM on any task where the output can't be easily verified.

I disagree, those tasks are perfect for LLMs, since a bug you can't verify isn't a problem when vibecoding.


  > "Hey this test is failing", LLM deletes test, "FIXED!"
A nice continuation of the tradition of folk stories about supernatural entities like teapots or lamps that grant wishes and take them literally. "And that's why, kids, you should always review your AI-assisted commits."


To be fair I'd probably also delete the test.



