The few times I've used LLMs as question-answering engines for anything moderately technical, they've given me information that was subtly incorrect in ways that mattered; taking it at face value would likely have cost me hours or days pursuing something unworkable, even when I asked for references. Whether the "references" actually contain the information I'm asking for, or merely something tangentially related, has been rather hit or miss too.
The one thing they've consistently nailed has been tip-of-my-tongue style "reverse search" where I can describe a concept in sufficient detail that they can tell me the search term to look it up with.
Absolutely. And I’m finding the same with “agent” coding tools. With the ever-increasing hype around Cursor, I gave it a go this week. The first 5 minutes were impressive, when I sent a small trial balloon for a simple change.
But when I asked for a full feature, I lost a full day trying to get it to stop chasing its tail. I’m still in the “pro” free trial period, so it was using a frontier model.
This was for a Phoenix / Elixir project, which I realize is not as robustly represented in the training data as other languages and frameworks, but it was supposedly consuming the documentation and other reference code I’d linked in, and I’d connected the Tidewave MCP.
Regardless, in the morning with fresh eyes and a fresh cup of coffee, I reverted all the cursor changes and implemented the code myself in a couple hours.
Yes, you have to be very careful when querying LLMs; you have to assume they're giving you roughly the average answer to a question. I find them very good at telling me how people commonly solve a problem. I'm lucky, in that the space I've been working in has had a lot of good forum training data, and the average solution tends to be on the more correct side. But you still have to validate nearly everything it tells you.

It's also funny to watch the tokenization "fails". When you ask about things like register names, you can see it choose nonexistent tokens. Atmel libraries have a lot of names like this in them.
And the output will be almost-correct code, but instead of the answer being:
PORT_PA17A_EIC_EXTINT1
you'll get:
PORT_PA17A_EIC_EXTINT_NUM
and you can tell it diverged by reaching for similar tokens: since "_" sometimes follows "EXTINT", it's a "valid" token to try, and once the sequence reads "EXTINT_", "NUM" becomes the most likely thing to follow.
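You can actually see those splits yourself. Here's a minimal sketch using OpenAI's tiktoken library (my choice purely for illustration; the exact boundaries depend on each model's vocabulary, and cl100k_base is just one example encoding):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # one example vocabulary

    for name in ("PORT_PA17A_EIC_EXTINT1", "PORT_PA17A_EIC_EXTINT_NUM"):
        # encode the identifier, then show the subword pieces it splits into
        pieces = [enc.decode_single_token_bytes(t).decode() for t in enc.encode(name)]
        print(name, "->", pieces)

Both identifiers come out as several pieces rather than one atomic symbol, which is why the model can wander off partway through a name: it's predicting one plausible piece at a time, not looking the full identifier up anywhere.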
That said, it's massively sped up the project I'm working on, especially since Microchip effectively shut down the forums that ChatGPT was trained on.
>The one thing they've consistently nailed has been tip-of-my-tongue style "reverse search" where I can describe a concept in sufficient detail that they can tell me the search term to look it up with.
This is basically the only thing I use it for. It's great at it, especially given that Google is so terrible these days that a search describing what you're trying to recall turns up nothing, especially if it involves a phrase heavily associated with other things.
For example, "What episode of <X show> did <Y thing> happen in?" In the past, Google would usually pull it up (often from a Reddit discussion), but now it just shows me tons of generic results about the show.
This. I was skeptical at first, but it is indeed good at searching and answering questions. That said, I still have to double-check results for niche queries or for stuff that's relatively new. Sometimes, the "sources" for the answers are just someone's opinions — unsubstantiated by any facts — on an old Reddit post that's only tangentially related to the topic. And sometimes, you simply know that manual search and digging through SO answers yourself will yield better results. At this point I've developed a gut feeling that helps me decide whether to prompt Perplexity or just g**gle it.
> the "sources" for the answers are just someone's opinions — unsubstantiated by any facts
Isn't that the nature of the web?
I mean, that's exactly what I expect from web searches. As long as you don't consider the fancy-looking [1] citations "scientific", it just digs up the same information, summarized.
I don't expect miracles: Perplexity does the same things I've been doing, just faster. It's like a bicycle, I guess.
I agree. Use with caution. One of my personal pet peeves with LLM answers is their propensity to sound authoritative and definite when they are in fact best guesses, and sometimes pure fantasy.