I've tried this without much luck. In my experience they get too bogged down in surface-level things and don't have the necessary business requirements/context to understand and find actual bugs.
How have you set yours up that works well for you?
So create a context document that explains the business context, and add that to the agent.
Take the bad result that you're getting, and pretend it's coming from an enthusiastic junior. What would you tell them to make them do this task better? Add that explanation to the agent (or explain it to the LLM and get it to add that to the agent; I have found this works as well).
When you create a task for the LLM, get it to create a requirements document that lists all the requirements. Feed that into the review agent so it understands what the code agent was trying to do.
The LLM will do what you tell it to do. It doesn't magically understand what you want it to do. You have to tell it what to do.
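To make that concrete, here is a minimal sketch of how the three inputs (business context, the "what I'd tell a junior" guidance, and the requirements list) could be assembled into one prompt for a review agent. Everything here is an assumption for illustration: `ReviewInputs`, `build_review_prompt`, and the section headings are hypothetical names, not part of any particular agent framework, and the step that actually sends the prompt to a model is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewInputs:
    """Hypothetical container for what the review agent needs to know."""
    business_context: str                                        # the context document
    reviewer_guidance: list[str] = field(default_factory=list)   # advice you'd give a junior
    requirements: list[str] = field(default_factory=list)        # what the code agent was asked to do


def build_review_prompt(inputs: ReviewInputs, diff: str) -> str:
    """Join context, guidance, requirements and the diff into one prompt string."""
    guidance = "\n".join(f"- {g}" for g in inputs.reviewer_guidance)
    requirements = "\n".join(
        f"{i}. {r}" for i, r in enumerate(inputs.requirements, start=1)
    )
    sections = [
        "## Business context\n" + inputs.business_context.strip(),
        "## Guidance for the reviewer\n" + guidance,
        "## Requirements the change must satisfy\n" + requirements,
        "## Change under review\n" + diff.strip(),
        # Illustrative wording only; tune this to the bad results you are seeing.
        "## Task\nCheck the change against each numbered requirement and "
        "report only behaviour that violates one.",
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    inputs = ReviewInputs(
        business_context="Invoices are billed in GBP.",
        reviewer_guidance=["Ignore formatting nits."],
        requirements=["Reject negative totals."],
    )
    print(build_review_prompt(inputs, "+ total = max(total, 0)"))
```

The point of keeping the requirements numbered is that the reviewer can refer back to them ("violates requirement 2") instead of commenting on surface things.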
I haven't tracked it recently, but when I've pulled up the metrics I've generally seen a token saving of about 25% (sometimes higher, circa 40%, depending on the type of use). I've bought a Claude Max plan recently, as I couldn't make enough savings to stay on the Pro plan. Perhaps most importantly, I've not seen any conflict or degradation after installing the plugins, which I've done on both Windows and Linux. Recommended, so far.
I find something similar happening as I transition to spec-driven development - whilst the agents do the work I used to do, I spend a hell of a lot more time thinking about what I want the outcome to be, rather than hacking around the limitations of frameworks I know and avoiding tech I don’t. It’s freeing, actually.
Include clear payment terms and penalty interest. For me, this looks like a 30-day payment window, fortnightly reminders thereafter, and a money claim lodged in the small claims court at 90 days. You’ll almost certainly get your money ahead of the court appearance (assuming they haven’t gone bankrupt).