Hacker News

Using user generated data to train an AI is no different than scanning it for spam or any other administrative function, and using public data to train your AI model is fair use and everyone should get over it already.


>Using user generated data to train an AI is no different than scanning it for spam

That's definitely not true.

Under some circumstances LLMs can spit out large chunks of their training data verbatim. That means a model can actively leak the contents of a confidential discussion into a completely different context, a risk that does not exist with spam scanning.



