
I found the talk very interesting because it shows both the issues and potential solutions.

One of the demos (extracting text from a PDF turned PNG) makes me wonder how you're ever going to fact check whether something in there is a hallucination. Innocent doctors won't always turn out to be Michael Jackson's sister after all :)

But then in one of the last demos you're showing how the fact checking can be "engineered" right into the prompt: "What were the themes of this meeting and for each theme give me an illustrative quote". Now you can search for the quote.

This is kind of eye opening for me, because you could build this sort of deterministic provability into all kinds of prompts. It certainly doesn't work for all applications but where it does work it basically allows you to swap false positives for false negatives, which is extremely valuable in many cases.



I think of AI as a “hint generator” that will give you some good guesses, but you still have to verify the guesses yourself. One thing it can help with is coming up with search terms that you might not have thought of.


What would be the equivalent of searching for quotes in your first (PNG) example?

Switching to a text source, what would you do if, say, 30% of the quotes do not match with Ctrl-F?


>What would be the equivalent of searching for quotes in your first (PNG) example?

I don't have a general answer to that. It depends on the specifics of the application. In many cases the documents I'm interested in will have some overlap with structured data I have stored in a database. In the concrete example there could be a register of practicing physicians that could be used for cross referencing. But in other cases I think it's an unsolved problem that may never be solved completely.
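A minimal sketch of that cross-referencing idea in Python (the register contents and function names are mine, purely illustrative):

```python
def verify_names(extracted_names, register):
    """Split names extracted by the model into those found in a trusted
    register (verified) and those that need manual review. Matching is
    case-insensitive; real systems would need fuzzier matching."""
    def norm(s):
        return s.strip().casefold()
    verified = [n for n in extracted_names if norm(n) in register]
    suspect = [n for n in extracted_names if norm(n) not in register]
    return verified, suspect

# Hypothetical register of practicing physicians
register = {"dr. alice meier", "dr. jan novak"}
names = ["Dr. Alice Meier", "Dr. John Doe"]
print(verify_names(names, register))
```

A name landing in the suspect list doesn't prove a hallucination, of course; it just flags where human checking should be spent.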

>Switching to a text source, what would you do if, say, 30% of the quotes do not match with Ctrl-F?

That's what I meant by swapping false positives for false negatives. You could simply throw out all the items for which you can't find the quote (which can obviously be done automatically). The remaining items are now "fact checked" to some degree. But the number of false negatives will probably have increased because not all the quotes without matches will be hallucinations.
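The automatic filter could look something like this (a sketch; the item structure is my own assumption, not from the talk):

```python
def filter_verifiable(items, source_text):
    """Keep only (theme, quote) pairs whose quote appears verbatim in
    the source text; items with unverifiable quotes are dropped.
    This trades false positives for false negatives: a slightly
    paraphrased but genuine quote gets discarded too."""
    return [(theme, quote) for theme, quote in items
            if quote in source_text]

transcript = "Alice: We must cut cloud spend. Bob: No hiring until Q3."
items = [
    ("Costs", "We must cut cloud spend."),      # verbatim -> kept
    ("Hiring", "No new hires before autumn."),  # not found -> dropped
]
print(filter_verifiable(items, transcript))
```

In practice you might relax the exact match (normalize whitespace, allow small edit distances) to claw back some of those false negatives.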

Another approach would be to send the query separately to multiple different models or to ask one model to check another model's claims.
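For the multi-model variant, the agreement check itself is simple once you have each model's answers (the model names and claim sets below are hypothetical):

```python
from collections import Counter

def cross_check(answers_by_model, min_agreement=2):
    """Keep only claims that at least `min_agreement` independently
    queried models produced. `answers_by_model` maps a model name to
    the claims it extracted; matching here is naive case-folding."""
    counts = Counter()
    for claims in answers_by_model.values():
        for claim in {c.casefold() for c in claims}:
            counts[claim] += 1
    return {claim for claim, n in counts.items() if n >= min_agreement}

answers = {
    "model_a": ["Cut cloud spend", "Hiring freeze"],
    "model_b": ["cut cloud spend"],
}
print(cross_check(answers))
```

The same caveat applies: independent models can share training data and hallucinate the same thing, so agreement raises confidence without guaranteeing truth.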

I think what works and what is good enough is highly application specific.


There are two issues to address:

1. The price of validation.

2. The quality.

The baseline is to do the work yourself and compare - the equivalent of a "brute force" solution. This of course defeats the purpose of the entire exercise. You propose reducing the validation price by crafting the prompt in such a way that the validation can be partially automated. This may reduce the quality because of false negatives and whatnot.

The underlying assumption is that this process is cheaper than "brute force" and the quality is "good enough". It would be interesting to see a writeup of some specific examples.



