Some teams run Claude or similar models in GitHub Actions to review PRs automatically. The rules are basically natural-language constraints in a YAML file committed to the codebase. Pretty lightweight to get started.
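As a sketch of what I mean (file name and schema are purely illustrative here, not any specific tool's format), such a rules file is just plain-language constraints the model checks on each PR:

```yaml
# .github/review-rules.yml -- hypothetical natural-language review rules
rules:
  - id: auth-on-billing
    description: "Every route under /billing/* must call requireAuth before handling the request."
  - id: no-raw-sql-in-handlers
    description: "Data access goes through the repository layer; no raw SQL in route handlers."
```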
Other teams upgrade to dedicated tools like cubic. You can encode your rules in our UI, and we're releasing a feature that lets you write them directly in your codebase. We'll check them on every PR and leave comments when something violates a constraint.
The in-codebase approach is nice because the rules live next to the code they're protecting, so they evolve naturally as your system changes.
The "in-codebase" approach is the right one, but a YAML file with plain text is a half-measure. The most reliable rule that "lives next to the code" is an architectural test. An ArchUnit test verifying that "all routes in /billing/* call requireAuth" is also code: it's versioned with the project, and it breaks the build deterministically.
That is a more robust engineering solution than semantic text interpretation, which can fail.
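ArchUnit itself is Java, but the idea ports to any stack. Here's a minimal Python sketch of the same kind of deterministic architectural test (the directory layout and the `requireAuth` convention are assumptions taken from the example above, not a real project):

```python
from pathlib import Path


def unprotected_billing_routes(src_root: str) -> list[str]:
    """Return billing route files that never reference requireAuth.

    Illustrative only: assumes Express-style TypeScript routes living
    under <src_root>/routes/billing/ and a requireAuth middleware.
    """
    offenders = []
    for f in Path(src_root, "routes", "billing").rglob("*.ts"):
        if "requireAuth" not in f.read_text():
            offenders.append(str(f))
    return sorted(offenders)


def test_billing_routes_require_auth():
    # Fails the build deterministically if any billing route skips auth.
    assert unprotected_billing_routes("src") == []
```

A grep-based check like this is crude compared to a real AST-level rule, but it already has the key property: it either passes or it doesn't, with no interpretation step in between.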
One thing that actually works is getting AI to review the basic stuff first so you can focus on architecture and design decisions. The irony of using AI to review AI-generated code isn't lost on me, but it does help.
That said, even with automated review, a 9,000-line PR is still a hard reject. The real issue is that the submitter probably doesn't understand the code either. Ask them to walk you through it or break it down into smaller pieces. If they can't, that tells you everything.
The asymmetry is brutal though. It takes an hour to generate 9,000 lines and days to review them properly. We need better tooling to handle this imbalance.
(Biased take: I'm building cubic.dev to help with this exact problem. Teams like n8n and Resend use it to catch issues automatically so reviewers can focus on what matters. But the human review is still essential.)
That's a really fair point. Architecture-first is definitely the ideal, and teams that can invest that time upfront tend to avoid a lot of downstream pain.
n8n is a good example of a tool that Airweave can enhance. n8n lets (no-code) developers set up predetermined automations, but as soon as you want to turn non-deterministic text into action in an app, you still need a way to search that app. Example: you have an n8n workflow that keeps you on track with Linear tickets. You hook it into a text-based human interface where the user says: "I just created a task about database migration on Linear, can you start doing the preparations for it?" Airweave can 1. find that damn ticket, and 2. give additional context on database migrations based on what else it finds across the integrated systems.
When it comes to SQL writing we are more relevant. When it comes to speed, it's hard to benchmark exactly against Cursor and Windsurf, but we are obviously a bit slower (around ~600ms on average), and we know what we have to improve to speed it up.
Next on the list is next-edit suggestions dedicated to data work, especially with dbt (or SQL transformations), where changing one query means you also have to update the queries downstream of it.
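To make the downstream-update problem concrete, here's a hedged Python sketch (the model names are made up, and parsing `{{ ref('...') }}` out of raw SQL is a simplification; a real implementation would read dbt's manifest instead) that finds every model transitively affected by a change:

```python
import re


def downstream_models(sql_by_model: dict[str, str], changed: str) -> set[str]:
    """Return all models that transitively depend on `changed`.

    Simplified sketch: extracts dbt-style {{ ref('model') }} calls
    from raw SQL instead of using dbt's manifest.json lineage.
    """
    deps = {
        name: set(re.findall(r"\{\{\s*ref\('([^']+)'\)\s*\}\}", sql))
        for name, sql in sql_by_model.items()
    }
    affected: set[str] = set()
    frontier = {changed}
    while frontier:
        # Models that ref something in the frontier and aren't seen yet.
        nxt = {m for m, refs in deps.items() if refs & frontier and m not in affected}
        affected |= nxt
        frontier = nxt
    return affected


# Changing stg_orders forces updates in orders, and then in revenue:
models = {
    "stg_orders": "select * from raw.orders",
    "orders": "select * from {{ ref('stg_orders') }}",
    "revenue": "select sum(amount) from {{ ref('orders') }}",
}
assert downstream_models(models, "stg_orders") == {"orders", "revenue"}
```

That transitive set is exactly what a next-edit suggestion would need to surface: change one model, and the tool proposes edits in everything the traversal returns.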