Yeah, it almost feels like you are talking to somebody with OCD. The frustrating part is that output tokens are usually a lot more expensive than input tokens, so they are wasting energy and money :-). Also, the more they generate, the greater the chance of attention issues as the conversation progresses.
This is why I built my chat app to let me manipulate LLM responses. If I feel part of a response is not worth keeping, I'll just erase it to ensure the conversation doesn't get sidetracked. Or I will go back to the original user message and modify it to say
### IMPORTANT
- Do not generate more code than required.
The nice thing about LLM conversations is that every time you chat, the LLM treats it as a first-time conversation, so this trick will work if the model is smart enough.
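For anyone curious, the trick is just rewriting the message list before resending it. A minimal sketch, assuming an OpenAI-style Chat Completions API (the model name and message contents below are placeholders, not what my app actually sends):

```python
# Minimal sketch of "edit the transcript before resending".
# Assumes the official openai Python client; contents are illustrative.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "user", "content": "Write a function that parses a date string."},
    {"role": "assistant", "content": "<a very long answer with lots of extra code>"},
]

# 1. Trim the parts of the assistant reply that aren't worth keeping,
#    so they don't distract the model on later turns.
messages[1]["content"] = "<just the function that was actually asked for>"

# 2. Or go back and amend the original user message with an explicit constraint.
messages[0]["content"] += "\n\n### IMPORTANT\n- Do not generate more code than required."

# The API is stateless: the model only ever sees the transcript you send,
# so it has no memory of the original, unedited conversation.
response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)
```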