That is a good approach, bottom up, manage complexity. But the general picture is - you set the direction and hold the model responsible, it does the actual work. Think of it as your work is the negative of the AI work, it writes the code, you ensure it tests that code. The better test harness you create, the better the AI works. The real task is to constrain the AI into a narrow channel of valid work.