I know you were simplifying, but "does foo then bar" is so far away from what an actual specification is that it defeats the point.
A more complete spec will capture performance requirements, input preconditions and output postconditions, error handling and recovery behaviors, threading behaviours, hardware assumptions, etc. It's hard to do these things without leaning at least somewhat on the specific language runtime you are using, otherwise you'd end up regurgitating the C standard each time you design a software system.
It's this sort of stuff that is meant when people say "sufficiently detailed".
If you're actually testing all these things, then I might agree with you that you can do it in the tests, but almost no one actually is. I'd struggle to write a test suite that tests all the specification-level assumptions I draw from my language and target platforms.
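To make "sufficiently detailed" concrete, here is a minimal sketch of what even a tiny operation looks like once preconditions, postconditions and error behaviour are written down. Everything in it (`transfer`, `Transfer`, `SpecViolation`, the cents-as-int convention) is hypothetical and invented for illustration. Note that Python's unbounded ints quietly settle the overflow question, which is exactly the kind of assumption you end up borrowing from the runtime.

```python
from dataclasses import dataclass


class SpecViolation(Exception):
    """Raised when a caller breaks a documented precondition (hypothetical)."""


@dataclass(frozen=True)
class Transfer:
    balance_from: int
    balance_to: int


def transfer(balance_from: int, balance_to: int, amount: int) -> Transfer:
    """Move `amount` cents from one balance to another.

    Preconditions:  all arguments are non-negative ints; amount <= balance_from.
    Postconditions: the total is conserved; neither balance is negative.
    Errors:         a violated precondition raises SpecViolation; nothing is changed.
    Unspecified:    latency budget, thread safety, integer width on the target.
    """
    args = (("balance_from", balance_from), ("balance_to", balance_to), ("amount", amount))
    for name, value in args:
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            raise SpecViolation(f"{name} must be a non-negative int")
    if amount > balance_from:
        raise SpecViolation("amount exceeds balance_from")

    result = Transfer(balance_from - amount, balance_to + amount)

    # Postconditions, checked at runtime in this sketch.
    assert result.balance_from + result.balance_to == balance_from + balance_to
    assert result.balance_from >= 0 and result.balance_to >= 0
    return result
```

Even here the "Unspecified" line is doing a lot of work: none of those items can be pinned down without talking about a concrete runtime and platform.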
If you're not already applying static analysis and linters to your codebase (and I know many of you aren't), ask yourself why you would bother to apply an expensive LLM tool.
That's not to say these things won't catch vulnerabilities static tools cannot; I think they can. It's just that we already have the capability to automatically catch a large surface area of common vulns, and have chosen not to, often for expense reasons.
If you're a team that does already apply several layers of analysis and linting, and wants to add this on top, all power to you.
If you run a static analysis tool across a repo that didn’t previously do that, you’ll see that while what you say might be true, there’s going to be an absolute treasure-trove of issues caught by the static analyser.
False positives are noise, but if the tool is filtering out its own noise via AI, it might work. Or you could take a high false positive/low false negative tool and instead of bothering humans with its noisy output, have AI investigate and evaluate if found issues are false positives or not.
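Roughly what I mean by the second option, as a sketch only. `Finding`, `triage` and `toy_evaluator` are made-up names, and the evaluator is a trivial stand-in for whatever AI step you would actually plug in. No real analyzer or model API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    rule: str
    path: str
    line: int
    message: str


# Hypothetical second stage: returns True if the finding looks real.
Evaluator = Callable[[Finding], bool]


def triage(
    findings: Iterable[Finding], looks_real: Evaluator
) -> tuple[list[Finding], list[Finding]]:
    """Split noisy analyzer output into (escalate, suppressed)."""
    escalate: list[Finding] = []
    suppressed: list[Finding] = []
    for finding in findings:
        (escalate if looks_real(finding) else suppressed).append(finding)
    return escalate, suppressed


def toy_evaluator(finding: Finding) -> bool:
    # Stand-in for the AI step: treat anything under tests/ as noise.
    return not finding.path.startswith("tests/")
```

The suppressed list is returned rather than dropped so that someone can still audit what the second stage filtered out, since a wrong "false positive" verdict there turns into a false negative for the whole pipeline.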
> Ideally you use both. An AI model that has static analysis as part of the harness, so it can evaluate each potential finding.
Ideally the static analysis tools are improved so that we don't need to piss away yet more tokens like we're competing on Mark's leaderboard just to find vulnerabilities.
When you reach that ideal world, let me know. My company has thrown a decade+ and multiple teams at the idea you've described. We still aren't there yet.
Your proposal of relying purely on static analysis is over-idealistic and just not feasible for large, diverse codebases in the real world.
> Your proposal of relying purely on static analysis is over-idealistic and just not feasible for large, diverse codebases in the real world.
"Just not feasible" is thought terminating, but regardless, I thought we were talking about ideals? Ideally you want the static analysis to work, not to rely on the non-deterministic bullshitter.
You're so ideologically opposed to AI that you bury your head in the sand in cases where it genuinely does a fantastic job today, right now, in the real world (like developing end to end exploits using noisy signals like static analysis results, fuzzer results, etc).
Instead you assert that we should go a route no company has successfully proven out despite throwing billions of dollars and some of the best cybersecurity talent in the world at it.
Anyways, if you develop a static analysis solution that works across large, diverse production codebases and develops end to end working exploits without AI, I will literally buy it off you for millions of dollars. Or you could start your own company. You'd be an overnight decabillionaire.
I actually do use AI, I wouldn't say I'm ideologically opposed lol. Maybe I'm ideologically opposed to thought terminating clichés, or how FAANGers see it as a cudgel to cram in wherever we find an open gap just to shit infinite tokens into?
You just haven't suggested a single solution that achieves the same level of risk reduction as AI driven end-to-end exploit generation.
You claim static analysis does the job, but you haven't backed it up with any proof that it works across large diverse codebases. Meanwhile, we have proof that AI works at least somewhat, here and now.
This is a distinction without a difference. We can conceptualize ourselves as fully deterministic individuals, or transcendentally connected nodes in a greater consciousness, or innumerable interpretations in-between. It's completely possible to be a deterministic materialist without being a nihilist.
In fact, I'd argue it's inevitable. A deterministic metaphysic dictates that you must come to the conclusion that it simply doesn't matter how you interpret things, and therefore you will eventually, accidentally, trivially choose to interpret yourself in a non-nihilistic way, thus breaking the trap and allowing yourself a compatible sense of self-determination, despite being capable of understanding the untruth of it.
I continue not to understand much of the point of this. I don't recognize the git workflow the author is talking about, and neither do I see the point of stacked PRs. Commits are fine as a unit of isolating work, and rebasing to keep that neat is not difficult.
How many PRs do y'all tend to have in flight at once? I sometimes think being a native (C++) developer makes me have a different take on some of this. Maybe if I were a JS dev making quick changes with 5 PRs a day I'd care more about this.
Very often when writing a new feature, I have to fix a bug in another subsystem (and write tests for it), add a small capability to another subsystem, fix documentation elsewhere, and so on.
Those could be separate commits in one PR, and that’s the way I’ve historically worked. But in jj it’s trivial to make them separate branches and continue working on the merge of all of them, even if they haven’t been merged upstream yet.
The benefit is that my coworkers have absolutely trivial PRs to review, rather than one omnibus. If there’s some debate about one of them, not only can we still make progress by merging the others, but I can continue plugging away on my main feature branch while we decide on the best path forward. If one of those PRs needs a bugfix or changes, fixing it typically requires quite literally zero VCS shuffling to incorporate into the work downstream.
My pace of and ability to work on features is no longer bottlenecked by reviews. My coworkers no longer have to review giant PRs that touch seven subsystems. I don’t have to wait until an entire feature is finished before coworkers can start merging smaller preparatory bits for it.
You can do some of this stuff in git. But it’s excruciating and so nobody does it except in rare circumstances.
Each individual PR gets tested and merged the same way they would if you’d authored them one by one in the first place.
The combination is merged in your tree well in advance of a PR ever being made. When the ancestor PRs are merged, you just pull and your descendant merge commit is rebased automatically.
At no point are you pushing untested code that you wouldn’t have pushed in a similar git workflow.
I totally agree. I've never understood the pushback against clean "railroad tracks" (i.e. rebasing instead of merge commits). It's simple, scales nicely and gives you a lot of options. Once you start allowing merge commits in the tree, things can get messy, but with a bit of discipline, rebasing elegantly solves every version control use-case I can think of, or have encountered, including at scale.
If you have a large PR with multiple nontrivial commits I think you should merge it so the intermediate states are the same ones that were originally tested. Otherwise you could break bisects, among other issues.
Stacking PRs is mostly a way to avoid thousand- or tens-of-thousands-LOC branches, so that each PR stays meaningfully reviewable. You need a good code review process for it to become useful. It’s very pleasant once you are accustomed to it.
I’ve seen successful teams that regularly do reviews of massive PRs and feel this serves them well enough. I suspect it just places a lot more trust on the developers to get the details right so reviewers only look at larger design issues.
The language of choice is not relevant. Even before AI, one could easily accumulate thousands of lines of C++.
I've also seen teams struggling to review massive PRs as a single diff, but when I've seen that it's always because they aren't structuring commits to be individually reviewable.
I work primarily in c++ and fully agree with the author. I think it makes more sense in the context of larger teams, possibly also monorepos. Review speed (both latency of feedback and how thorough your reviewers are), presubmit test runtime, flake rate, etc can lead to it taking a few days to get work submitted. In some cases you don't want to land work until you've finished a milestone of some sort. If others work on similar files to you, you will end up with several rounds of rebase conflicts to deal with too.
It's useful for low velocity teams that spend too much time in review.
For everyone else it's a net loss.
I used Gerrit years ago and thought it was great being able to shape my diffs into stacked chunks that reviewers could effortlessly navigate as a coherent story.
Then I joined a company that wanted you to build and ship it in the same day. Now I dread the idea of going back to week long review cycles.
If you like a thing, it feels natural to tell others about it so that they can also benefit from it. There is no ulterior motive here. Sure everyone in the community will benefit if jj becomes more popular, but it's not like they are trying to sell you something for money. If you don't like it, that's fine, but if someone is facing similar problems to the OP then they will benefit. Almost everyone I've shown jj to at work has converted because it solves the same problems for them as it did for myself. Of course, not everyone has those problems.
I just want to point out that it is known that one of the biggest jj proponents on HN does have financial incentive to do so.
Steve Klabnik (the person that submitted this post) comments and posts about jj here often and works for https://ersc.io (startup mentioned in the post).
So don't be so sure that all of the PR here comes from a pure selfless act. Some of them have income tied to the solution they are preaching.
This reads more like the liberal case for AI, and not the left wing case so much. The code switching section in particular is not a left-wing position at all, unless the idea is that the code-switching is used as a temporary measure to dismantle the idea of class itself.
This is exactly my problem with the piece. All of the proposed benefits are to do with how individuals can use AI to work the existing system for their own benefit, iff they have access to cutting-edge AI models provided by megacorporations. That's libertine, not left-wing.
I don't disagree with the examples of benefits, and the author has chosen examples that do align with culturally and socially left-wing concerns (disability, class). But saying "code switching to appear PMC works" is the problem, not the solution. If you don't think institutions can adapt to that reality to protect themselves from peasants wielding LLMs, I think your analysis of power is lacking.
The examples in the piece are just more examples of how AI can be used to paper over societal issues instead of addressing any of the root causes.
(And I should say that on the flipside, AI is not the cause of existing social problems, but a symptom of them.)
I agree. I would almost say the best argument to make is "liberal humanism is compatible with machine intelligence and regulated capitalism, but we must still remain wary that authoritarians will abuse the system".
The only thing that has changed is that there used to be a loose correlation between capability to effect change and inherent desire for quality. This correlation barely exists anymore, so the counter-cultural acts that happened to manifest quality inside our perverse systems will occur much more rarely now.