So Google hasn't used an LLM to generate and test weird queries ? This is not pu...

nicklecompte · on May 25, 2024

Google’s poor testing is hardly in doubt. But keep in mind that the whole problem is that LLMs don’t handle “unlikely” text nearly as well as “likely” text. So the near-infinite space of goofy things to search on Google is basically like panning for gold in terms of AI errors (especially if they are using a cheap LLM).

And in particular LLMs are less likely to generate these goofy prompts because they wouldn’t be in the training data.

ADeerAppeared · on May 25, 2024

> So Google hasn't used an LLM to generate and test weird queries ?

You don't even need an LLM for that. Google will almost certainly have tested.

The test result is just politically-unacceptable within the company: It doesn't work, it's a architectural issue inherent to the technology, we can't fix it.

Instead, they just rush to patch any specific, individual errors that show up, and claim that these errors are "rare exceptions" or "never happened".

What's going on here is that Google (and most other AI firms) are just trying to gaslight the world about how error-prone AI is, because they're in too deep and can't accept the reality themselves.

kwertyoowiyop · on May 25, 2024

Deploy the cheap offshore labor!

cjk2 · on May 25, 2024

They already know it’s a shit show. They are trying to push it along until it’s someone else’s fault.

ADeerAppeared · on May 25, 2024

I'm not convinced the executive layer is aware how dire the problem is.

On one hand, their support for outsourcing programmes; "Training Indians on how to use AI", suggests they realize AI tooling without human cleanup is a crapshoot.

On the other hand, they keep digging. This kind of gaslighting is an old and proven trick for genuinely rare problems, but it doesn't work if your issues are fairly common, as they'll get replicated before you can get a fix out.

Similarly, they're gambling with immense legal risks and sacrificing core products for it. They're betting the farm on AI, it may kill the company.

cjk2 · on May 25, 2024

I think they are more than aware but will magically disappear after cashing their stock just about the point the bubble pops. Don't forget that the AI industry is almost 100% based on hype. Microsoft will be the largest victim here, their entire product portfolio being turned into a nuclear fallout zone almost overnight. Satya and friends are going to trash the whole org.

I regularly speak to laypeople who assume that it's some magical thing without limits that makes their lives better. They are also 100% unaware of any applications that will actually make their lives better. End game occurs when those two disconnected thoughts connect and they become disinterested. The power users and engineers who were on it a year ago are either burned out or finding the limitations a problem as well now. There is only magical thinking, lies and hope left.

Granted there are some viable applications but they are rather less overstated than anything we have no and there are even negative side effects of those (think image classification, which even if it works properly, requires human review and there are psychological and competence things problems around that too).

sebastiansm · on May 25, 2024

Google is working hard to be the next Boeing.

CaptainOfCoit · on May 25, 2024

> So Google hasn't used an LLM to generate and test weird queries ?

What about simple manual testing? Seems to have skipped QA completely, automated or not.

nicklecompte · on May 25, 2024

There has been a lot of excitement recently about how using lower precision floats only slightly degrades LLM performance. I am wondering if Google took those results at face value to offer a low-cost mass-use transformer LLM, but didn’t test it since according to the benchmarks (lol) the lower precision shouldn’t matter very much.

But there is a more general problem: Big Tech is high on their own supply when it comes to LLMs, and AI generally. Microsoft and Google didn’t fact-check their AI even in high-profile public demos; that strongly suggests they sincerely believed it could answer “simple” factual questions with high reliability. Another example: I don’t think Sundar Pichai was lying when he said Gemini taught itself Sanskrit, I think he was given bad info and didn’t question it because motivated reasoning gives him no incentive to be skeptical.

flyingspaceship · on May 25, 2024

Well yeah imagine how much money there is to make in information when you can cut literally everyone else involved out, take all of the information and sell it with ads and only give people a link at the bottom, if that is even needed at all

pilooch · on May 25, 2024

The adversarial surface to the LLM remains enormous, manual cannot handle it.

jameshart · on May 25, 2024

Asking how to prevent cheese from sliding off pizza is not an adversarial prompt.

_mlbt · on May 25, 2024

They still haven’t learned from the Gemini diverse Nazis debacle.