It's not a solved problem, but it's not impossible to keep it at bay either. I created this tool for my own project, and it does a pretty darn good job of keeping the AI accountable. I have a harness that runs it in a loop and helps refactor as we go, like humans do anyway:
This review was essentially pointless: they tested the card on a ton of workloads nobody in their right mind would pick it for, and left out the only use case where it makes sense. Great job?
Jensen hallucinates more than any LLM. He just speaks without thinking all that much about what he says, and he generalizes a lot. Trying to hold him accountable for imprecisions and gross simplifications is just going to frustrate whoever tries, without changing his behavior one bit.