
> the mere suggestion that hallucinations can be fixed by tweaking some variable or fixing some bug

That "suggestion" is fictional: they haven't suggested this. What they offer is a way to measure the confidence a particular model might have in the product of the model. Further, they point out that there is no universal function to obtain this metric: different models encode it in differently.

Not exactly a "cures cancer" level claim.



If you could measure hallucinations, then you would include that measure as an eval parameter of the search function.

And even if you could "detect it" without being able to cure it, it's still a braindead take. Sorry


This method uses "critical tokens", which I don't think you can detect until after you've generated an entire response. Using them as part of the search function seems infeasible for long outputs, though technically possible. Using it on single-token outputs seems eminently feasible and like a cool research direction.
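
To make the single-token case concrete, here's a rough sketch of the kind of thing I mean. It's not anything from the paper: gpt2 is just a stand-in model and the probe weights are placeholders you'd have to fit on labeled data. The idea is to fold a probe's confidence into the score used to pick the answer token.

    # Sketch: rerank single-token answer candidates by combining the model's
    # log-prob with a (hypothetical) linear probe on the hidden state.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # stand-in model
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True).eval()

    # Placeholder probe parameters; in practice you'd fit these on labeled data.
    probe_w = torch.zeros(model.config.hidden_size)
    probe_b = torch.tensor(0.0)

    prompt = "Q: Is the sky blue? Answer yes or no.\nA:"
    candidates = [" yes", " no"]  # each tokenizes to a single token for gpt2

    scores = {}
    with torch.no_grad():
        for cand in candidates:
            ids = tok(prompt + cand, return_tensors="pt")
            out = model(**ids)
            # log-prob the model assigns to the candidate token
            cand_id = ids["input_ids"][0, -1]
            logprob = out.logits[0, -2].log_softmax(-1)[cand_id]
            # probe "confidence" read off the hidden state at the candidate token
            h = out.hidden_states[-1][0, -1]
            conf = torch.sigmoid(h @ probe_w + probe_b)
            scores[cand] = (logprob + conf.log()).item()  # combined search score

    print(max(scores, key=scores.get))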

I think the paper itself demonstrates that the model has something going on internally which is statistically related to whether its answer is correct on a given benchmark. Obviously, the LLM will not always be perfectly accurate about these things. However, say you are using an LLM to summarize sources. There's no real software system right now that signals whether or not a summary is correct. You could use this technique to train probes that predict whether a human would agree the summary is correct, and then flag the outputs the probes expect a human to disagree with for review. This is a lot less expensive a way to detect issues with your LLM than having a human review every single output.
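
Something like this is what I have in mind for the flag-for-review loop. It's a sketch, not the paper's method: gpt2 stands in for whatever model you'd actually probe, the labeled summaries are made up, and mean-pooled hidden states are just a cheap feature choice.

    # Sketch: train a probe on hidden-state features of human-labeled summaries,
    # then route low-confidence summaries to human review.
    import numpy as np
    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    enc = AutoModel.from_pretrained("gpt2").eval()

    def hidden_features(text: str) -> np.ndarray:
        # Mean-pool the final layer's hidden states as a cheap fixed-size feature.
        with torch.no_grad():
            out = enc(**tok(text, return_tensors="pt", truncation=True))
            return out.last_hidden_state.mean(dim=1).squeeze(0).numpy()

    # Hypothetical labeled set: (summary, 1 = human said correct, 0 = incorrect)
    labeled = [("The report covers Q3 revenue growth.", 1),
               ("The report says revenue fell 90%.", 0)]
    X = np.stack([hidden_features(s) for s, _ in labeled])
    y = np.array([lab for _, lab in labeled])
    probe = LogisticRegression(max_iter=1000).fit(X, y)

    def needs_human_review(summary: str, threshold: float = 0.7) -> bool:
        # Flag the summary whenever the probe isn't confident it's correct.
        return probe.predict_proba([hidden_features(summary)])[0, 1] < threshold

    print(needs_human_review("The report covers Q3 revenue growth."))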

While we don't have great methods for "curing it", we do have some. As I mentioned in a sibling post, contextual calibration and adding/adjusting training data are both options. If you figure out the bug was due to RAG doing something weird, you could adjust your RAG sources/chunking. Regardless, you can't put any human thought into curing bugs that you haven't detected.
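
For anyone unfamiliar, contextual calibration roughly means measuring the label bias the model shows on a content-free input and dividing it out. A minimal sketch, with gpt2 and the sentiment labels as placeholders rather than any particular setup:

    # Sketch of contextual calibration: estimate the model's label bias on a
    # content-free input ("N/A") and divide it out of the real prediction.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    label_ids = [tok(" positive", add_special_tokens=False)["input_ids"][0],
                 tok(" negative", add_special_tokens=False)["input_ids"][0]]

    def label_probs(prompt: str) -> torch.Tensor:
        # Probability mass over the two label tokens at the next position.
        with torch.no_grad():
            logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
        p = logits.softmax(-1)[label_ids]
        return p / p.sum()

    template = "Review: {}\nSentiment:"
    bias = label_probs(template.format("N/A"))         # content-free estimate of the prior
    raw = label_probs(template.format("A great film"))
    calibrated = (raw / bias) / (raw / bias).sum()      # divide out the bias, renormalize
    print(raw, calibrated)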



