Hacker News

Small flaw: where does the "naughty" start? We aren't really doing ourselves any favours with an "I'll know it when I see it" approach to restricting free-form generative models that can be steered by users.

The mistake I see in a lot of the arguments being made online about all this is treating AI/ML models as "whole systems". The very-hard-to-analyse "black box" heart of the model, the bit where we're turning statistical weights into data, should not be assumed to do "everything". If you want a naughty-word filter, use a rejection filter built from a dictionary, or bolt on a second AI/ML system tuned to catch output too similar to what you don't want. Assuming all functionality can somehow be shoved into the one model is just foolish. We should have some unfettered models out there to use as red-team tools, to generate content we can test safety and sanitising systems against, or otherwise to use as desired…
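The bolt-on approach described above can be sketched in a few lines. This is a minimal illustration, not a real moderation API: the blocklist, the `score_fn` hook (standing in for a second classifier model), and the threshold are all invented for the example.

```python
# Sketch of post-generation filtering outside the model itself:
# stage 1 is a simple dictionary rejection filter, stage 2 delegates
# to a separate scoring model (here just a callable placeholder).

BLOCKLIST = {"badword1", "badword2"}  # would come from a maintained dictionary


def dictionary_filter(text: str) -> bool:
    """Return True if no blocklisted word appears in the text."""
    tokens = {t.strip(".,!?\"'").lower() for t in text.split()}
    return BLOCKLIST.isdisjoint(tokens)


def classifier_filter(text: str, score_fn, threshold: float = 0.8) -> bool:
    """Second stage: a separate model scores similarity to disallowed
    content; reject if the score reaches the threshold."""
    return score_fn(text) < threshold


def is_allowed(text: str, score_fn) -> bool:
    """Run both stages; the generator itself stays untouched."""
    return dictionary_filter(text) and classifier_filter(text, score_fn)
```

Either stage can be swapped out or retuned independently of the generative model, which is the whole point of not shoving the filtering into the model's weights.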



Well, it was just my train of thought released into the wild. I have no specialized AI/ML background, so I can't lend much weight to anything I write; I'm just making educated guesses.

Every society has rules that at least address the question of what is appropriate and when, so those rules could serve as a boundary. But I agree, there will always be an area of uncertainty; maybe we should just stop trying to eliminate it. These rules also differ from society to society. Perhaps the greater mistake is to assume there is one generative model that fits the needs of every society.


The difference between societal standards and expectations is one of the reasons I think building such content filtering/restrictions into the models themselves is a bad idea. Given the uncertainty and the variation around the world, we would be much better off embracing modular pipelines and tooling that makes them easier: plug my American culture value filter in after my global human text generation tool and serve that up to my American customers; plug my French culture value filter into the same global model and serve it up to my French customers. That is much better than pretending there are global standards of morality for written content. There are obviously some standards people agree on in the real world of physical interaction between humans and other humans, animals, private property and so on, but what constitutes offensive text is nowhere near as universal.
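The modular-pipeline idea above could look roughly like this. Everything here is hypothetical: the `Pipeline` class, the per-locale filter callables, and the toy generator are invented solely to show one global generator composed with interchangeable locale-specific filter stacks.

```python
# Sketch: one shared generator, different locale filter stacks plugged
# in after it. A filter returns True if the text is acceptable.
from typing import Callable, List, Optional

Filter = Callable[[str], bool]


class Pipeline:
    def __init__(self, generate: Callable[[str], str], filters: List[Filter]):
        self.generate = generate
        self.filters = filters

    def run(self, prompt: str) -> Optional[str]:
        text = self.generate(prompt)
        if all(f(text) for f in self.filters):
            return text
        return None  # rejected by a locale-specific filter


# Toy generator shared by both pipelines; only the filters differ.
toy_generate = lambda p: p.upper()
us_pipeline = Pipeline(toy_generate, filters=[lambda t: "X" not in t])
fr_pipeline = Pipeline(toy_generate, filters=[lambda t: len(t) < 100])
```

The same underlying model serves every market; only the cheap, auditable filter stage changes per locale.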




