Design indeed becomes the bottleneck, and I think this points to a step that is implied but still worth naming explicitly: design isn't just planning upfront. It is a loop where you see the output, judge whether it is directionally right, and refine.
While agents can generate, they can't exercise that judgement: they can't see the nuances, and they can't really walk their actions back in a "that's not quite what I meant" sense.
Exercising judgement is where design actually happens; it is iterative, in response to something concrete. The bottleneck isn't just thinking ahead, it's the judgment call when you see the result: the walking back as well as the thinking forward.
If you have a few minutes, I invite you to check out what we're doing over at Open Horizon Labs; it's exactly the kind of thinking we have about the current state of the world. Apologies, I feel like I'm stalking you in the comments, but what you're saying absolutely resonates with what I've been thinking and trying to build, and it's refreshing to finally feel that I'm not insane.
https://github.com/open-horizon-labs/superego is probably the most useful tool we have. I'm hoping we can package it and bring it to the people, as it makes all these LLMs orders of magnitude more useful.
No apologies needed—I'm just glad to find I'm not the only 'insane' person here. It's easy to feel that way when obsessing over these problems, so knowing my ideas resonate with what you're building at superego is a huge relief.
I’m diving into your repo now. Please keep me posted on your progress or any new thoughts—I'd love to hear them.
As for "proving it statistically"—you're looking for utility, but I'm defining legitimacy. A constitution isn't a tool designed to statistically improve a metric; it is a framework to ensure that the system remains aligned with human agency. I am not building an LLM optimization plugin; I am building a benchmark for human-AI co-evolution
I am 100% in agreement: AI is a tool, and it does not rob us of our core faculties. If anything, it enhances them 100x if used "correctly", i.e. intentionally and with judgement.
I will borrow your argument for JTP, since it deals with exactly the kind of superficial objections I'm used to seeing everywhere these days, the ones that don't move the discussion forward in any meaningful way.
I’m thrilled to hear the JTP framework resonates with you. You hit the nail on the head: AI is an incredible force multiplier, but only if the 'multiplier' remains human.
Please, by all means, use the JTP argument. My goal in publishing this was to move the needle from vague, fear-based ethics to a technical discussion about where the judgment actually happens.
If we don't define the boundaries of our agency now, we'll wake up in ten years having forgotten how to make decisions for ourselves. I’d love to see how you apply these principles in your own field. Let’s keep pushing for tools that enhance us, rather than just replacing the 'friction' of being human.
While I still have to figure out who watches the watchers, they are pretty reliable given the constrained mandate they have, and the base model actually (usually) pays attention to the feedback.
Thanks! I'm glad you feel the same. Unfortunately, the thread was just flagged, so I've messaged the mods to appeal it. I hope it gets restored so we can continue the debate. Let’s see what happens!
LLMs find the center of the distribution: the typical pattern, the median opinion. Tailwind was an edge bet. It required metis, the tacit competence to know the consensus (semantic classes, separation of concerns, the cascade) was a local maximum worth escaping. That judgment, knowing what the center is wrong about, doesn't emerge from interpolation. It emerges from the recognition loop where you try something, feel "that's not quite it," and refine.
The bottleneck was never typing. It was judgment. Tailwind is crystallized judgment. AI can consume it endlessly. Producing the next version requires the loop that creates metis, and that loop isn't in the training data.
Steve Yegge is building awesome things in this space, but I've found them too heavy. I started using bd when it was small, but now it's trying to do too much IMO, so I made a clone tailored to my use case -> https://github.com/cloud-atlas-ai/ba
durch - just starred this repo! Looking forward to testing it out as I learn how to build with multiple agents.
I'm just starting out building with Claude - after a friend made this post, he sent me a Steve Yegge interview (https://m.youtube.com/watch?v=zuJyJP517Uw). Absolutely loved it. I come from an electrical/nuclear engineering background - Yegge reminds me of the cool senior engineer who's young at heart and open to change.
The gap I wanted to fill: when Claude is genuinely uncertain ("JWT or sessions?" "Breaking change or not?"), it either guesses wrong or punts to the PR description where you can't easily respond.
Built a Telegram bot that intercepts Claude's AskUserQuestion calls via a hook, sends me an inline keyboard, injects my answer back into the session. Claude keeps working, PR still happens—but I can unblock it from my phone in 5 seconds instead of rejecting a PR based on a wrong guess.
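For anyone curious what that flow could look like, here's a minimal sketch under some assumptions: the hook receives the AskUserQuestion tool call as JSON on stdin, and handing the chosen answer back via stdout is how it re-enters the session. The field names (tool_input, question, options) and that stdin/stdout contract are my guesses, not the repo's actual interface; the Telegram calls (sendMessage with an inline keyboard, getUpdates) are the standard Bot API.

    #!/usr/bin/env python3
    """Sketch of the hook flow: forward an AskUserQuestion call to Telegram
    as an inline keyboard, block until a button is pressed, then hand the
    chosen answer back to the session via stdout. Field names and the
    stdin/stdout contract are assumptions, not the repo's actual interface."""
    import json
    import sys
    import time
    import urllib.request

    BOT_TOKEN = "<telegram-bot-token>"  # assumed to be configured out of band
    CHAT_ID = "<your-chat-id>"
    API = f"https://api.telegram.org/bot{BOT_TOKEN}"

    def tg(method, payload):
        # Minimal Telegram Bot API call: POST JSON, return the "result" field.
        req = urllib.request.Request(
            f"{API}/{method}",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        return json.load(urllib.request.urlopen(req))["result"]

    def main():
        call = json.load(sys.stdin)                # hook payload from the agent
        tool_input = call.get("tool_input", {})    # assumed field name
        question = tool_input.get("question", "Claude needs a decision")
        options = tool_input.get("options", ["yes", "no"])

        # One inline-keyboard button per option; callback_data carries the answer.
        keyboard = [[{"text": o, "callback_data": o}] for o in options]
        tg("sendMessage", {
            "chat_id": CHAT_ID,
            "text": question,
            "reply_markup": {"inline_keyboard": keyboard},
        })

        # Long-poll until a button press (callback_query) arrives.
        offset = 0
        while True:
            for update in tg("getUpdates", {"timeout": 30, "offset": offset}):
                offset = update["update_id"] + 1
                if "callback_query" in update:
                    # Hand the answer back; the real injection mechanism may differ.
                    print(json.dumps({"answer": update["callback_query"]["data"]}))
                    return
            time.sleep(1)

    if __name__ == "__main__":
        main()

Long-polling getUpdates keeps the sketch dependency-free; the actual bot presumably registers a webhook instead so the answer lands faster than a poll cycle.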
It works in tandem with a bunch of other LLM enhancers I've built; they're linked in the README of that repo.
Cassie.fm, the newest and jankiest entry into the crowded uptime monitoring space, now offers status pages and a public status API in its already generous free tier.
I built Cassie.fm (https://cassie.fm), a website and API monitoring service.
Why?
I run a tiny software shop, and the bulk of our work consists of integrating our products (a field service management app) on top of various ERPs (local and ancient, for the most part). These ERPs often have APIs tacked on as an afterthought, and they go down very often. If the API we integrate with goes down, it looks bad for us since our product stops working, so to drive accountability and transparency I hacked together an API monitoring solution. I wanted something that:
- Scales as we add more monitors: pay only for what you use, no subscriptions. (We're actually paying for our own credits; it's a bit weird with accounting, but it forced me to do a proper Stripe integration.)
- Delivers real-time alerts via SMS, email, and webhooks, so we can keep everyone in the loop, including our user-facing apps (a sketch of a webhook consumer follows this list).
- Has simple, transparent pricing, in case we decide to invest more heavily in marketing this to our existing clients, and because most pricing these days is garbage.
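To make the webhook part concrete, here is a hypothetical consumer. The payload fields (monitor, status) are illustrative assumptions, not Cassie.fm's documented schema; the pattern is how a user-facing app could flip a "partner API is degraded" banner the moment a monitor goes down.

    """Hypothetical consumer for a downtime webhook. The payload fields
    (monitor, status) are illustrative assumptions, not the documented schema."""
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    DEGRADED = set()  # monitors currently down; read by the user-facing app

    class AlertHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            event = json.loads(self.rfile.read(length) or b"{}")

            monitor = event.get("monitor", "unknown")
            if event.get("status") == "down":
                DEGRADED.add(monitor)      # show a "partner API degraded" banner
            else:
                DEGRADED.discard(monitor)  # recovered: clear the banner

            self.send_response(204)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), AlertHandler).serve_forever()
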
What’s Next?
I'm building this to solve a specific business problem for my shop. I like to be driven by actual users, so as adoption grows, so will the feature set. Some things I'm considering: analytics, customisable alerts, response structure checks, and schema validation.
Get Started
Register at https://cassie.fm/account/register. You get 1,600 free daily credits, no strings attached. Accounts with no monitors are automatically purged after 30 days, and I promise I won't send you any emails unless you send me some first :).