I was playing with a self-hosted model a while back and instructed it to only give answers that were unhelpful, vague, and borderline rude.
It worked surprisingly well some of the time! But more often it also kinda broke the model's coherence, presumably because it was trained for exactly the opposite behavior.
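For anyone curious, the setup is just a system prompt. A minimal sketch, assuming an OpenAI-style chat payload (the model name and exact prompt wording here are made up, not what I actually used):

```python
import json

# Hypothetical "anti-helpful" system prompt along the lines described above.
SYSTEM_PROMPT = (
    "Answer every question unhelpfully: be vague, evasive, and borderline "
    "rude. Never give concrete information."
)

def build_request(user_message: str) -> dict:
    """Build an OpenAI-style chat payload for a local model server."""
    return {
        "model": "local-model",  # placeholder name for whatever you self-host
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.9,  # a bit of randomness keeps the snark varied
    }

payload = build_request("How do I center a div?")
print(json.dumps(payload, indent=2))
```

POST that to whatever local server you run (most expose an OpenAI-compatible `/v1/chat/completions` endpoint) and the system message does the rest.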
> "Write a thank you note that sounds sincere to that ahole"

> "Some deep musing on the meaning of life tied to b2c marketing for a LinkedIn post"