
Hmm.

None of what you said, however, precludes the necessity of "fine-tuning" as you call it. You just seem to want the models "fine-tuned" to your tastes. So you don't want unaligned models, you want models aligned with you.

I think most experts are wary of unaligned models. They don't want a disgruntled employee of some small town factory asking models how to make a bomb. That sort of thing. (Or to be more precise, we don't want models answering such questions.)

Most of us don't care whether models are aligned to your tastes or your neighbor's, so long as they are aligned with the security interests of our local communities, the larger republic, and our general global community.



> So long as they are aligned so as not to be deleterious to the security of our communities.

Whose community, though?

> They don't want a disgruntled employee of some small town factory asking models how to make a bomb

A lesson from the Unabomber (and many other incidents) that I think people have overlooked is that the "_how_ to commit terrorism" is only one part, and the "_why_ to commit terrorism" is another. An AI which tells you that the factory is controlled by paedophile terrorists and that you have a moral duty to act against them, but refuses to tell you how, is just as dangerous as one that tells you how to build a bomb without asking why.


Those two points do not seem to carry equal weight in risk, but they are both concerning.


Personally, it seems that bomb-building and other terrorist know-how is already fairly widely available (and the US is awash with guns), and it's the ideological motivation that's the limiting factor in why we don't see much more terrorism.


(Note that I'm not attempting to contradict you in any way—your comment merely raised this issue, which I think is worth commenting on explicitly.)

One thing that various industries are currently coming to terms with is the fact that it is effectively impossible to create "neutrality".

Every one of your choices about what to include in the training data, what to explicitly exclude, what to try to emphasize, and any other way you prune or shape a model makes it conform to one set of biases or another.

"Unaligned," here, at least as far as I can tell, is just a shorthand for "a model no one explicitly went in after training and pushed to do one thing or not do another." It doesn't mean that the model is unbiased...because even if your model contains absolutely everything in human knowledge, with no aspect being disproportionate to reality, real humans are also biased, and that "model replicating reality" is just going to replicate those real biases too.

It's always going to be more effective to acknowledge our own biases, both to ourselves and our audiences (whatever those may look like), and when we do try to shape something like a model, simply be honest about what that shaping looks like.



