Hacker News | new | past | comments | ask | show | jobs | submit | lukebuehler's comments | login

Agree. It's code all the way down. The key is to give agents a substrate where they can code up new capabilities and then compose them meaningfully and safely.

Larger composition, though, starts to run into typical software design problems, like dependency graphs, shared state, how to upgrade, etc.

I've been working on this front for over two years now too: https://github.com/smartcomputer-ai/agent-os/


> Larger composition, though, starts to run into typical software design problems

I've been seeing the same thing: agents are great at solving the immediate task, but as changes compound they run into software and architectural design problems. I created https://github.com/andonimichael/arxitect to at least have coding agents self-reflect on their software design. But I really like your approach of self-modification and improving the agent itself instead of just teaching it another skill in its context.


I’ve been working with Claude Code to create copies of itself using git worktrees, run an iteration, and then update its instructions. It can reverse engineer every website I tested it on. I kept updating the instructions, then started asking Claude to update itself. Then I asked if it could figure out how to iterate unsupervised. https://github.com/adam-s/intercept?tab=readme-ov-file#the-s...

So what are software packages nowadays other than precomputed subsets of capabilities? Like a mesh that data gets pushed through to produce what? What is the optimal subset of prebuilt programs to accomplish any task?

Exactly. Tools like grep and ls are in the same category. Even in algorithms, we have techniques like memoization and dynamic programming that speed things up. Why should an LLM fill up its context by "manually" doing what wc or ls does for you deterministically?
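To make the analogy concrete, here's a toy Python sketch of memoization: compute each subresult once and reuse it deterministically, instead of redoing the same work over and over (the way an LLM redoes what wc or ls would give it for free):

```python
from functools import lru_cache

# Naive recursion recomputes the same subproblems exponentially often.
def fib_naive(n: int) -> int:
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

# Memoization caches each subresult, so every n is computed exactly once.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, instantly; fib_naive(50) would churn for ages
```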

It's a tradeoff. Technically you need very few programs: you can let an agent do everything and coordinate everything. But that is also inefficient; it's slow and uses a lot of tokens. So you let the agent build tools and coordinate those tools, just like we humans do. With agents, however, the threshold of pain is much higher: we can let agents do things "manually" where humans would have built automations much sooner.

Oh wow, what do you think of Karpathy's autoresearch? Feels like this is just that? Gotta openclawify it?

Wall-mounted dashboards are a huge life-hack, especially if you have a family. We got a 37-inch touchscreen one, running DAKBoard.

We have several kids and have been organizing our daily todos and calendars on it for several years. We used to drop the ball quite a bit due to a hectic schedule and the dashboard has helped us tremendously. Since it is mounted in the kitchen, being able to pull up recipes is a plus.


> 37-inch touchscreen [..] in the kitchen

I think I need a bigger kitchen, haha.

That sounds really cool, though. I'm currently trying to "train" our kids to manage their own schedules, e.g. reminding me that they have somewhere to be instead of vice versa.

Maybe a wall-mounted solution would help put it front and center for them.


I think sandboxes are useful, but not sufficient. The whole agent runtime has to be designed to carefully manage I/O effects and capability-gate them. I'm working on this here [0]. There are some similarities between my project, what IronClaw is doing, and what many other sandboxes are doing, but I think we really gotta think bigger and broader to make this work.

[0] https://github.com/smartcomputer-ai/agent-os/


The spec is pretty good! Within a day, Codex has written a good chunk of the attractor stack for me: https://github.com/smartcomputer-ai/forge


I started a full implementation of the attractor spec here: https://github.com/smartcomputer-ai/forge


I’m building AgentOS [1], experimenting with where agent substrates/sandboxes will head next. It's a deterministic, event-sourced runtime where an “agent world” is replayable from its log, heavy logic runs in sandboxed WASM modules, and every real-world side effect (HTTP, LLM calls, code compilation, etc.) is explicitly capability-gated and recorded as a signed receipt. This ensures that upgrades and automations are auditable, reversible, and composable. The fun bit is a small typed control-plane intermediate representation (AIR) that lets the system treat its own schemas/modules/plans/policies as data and evolve via a governed loop (propose > shadow-run > approve > apply). Kind of “Lisp machine vibes”, but aimed at agents that need reliable self-modification rather than ambient scripts.

[1] https://github.com/smartcomputer-ai/agent-os
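Not the actual AgentOS API, but the shape of the idea can be sketched in a few lines of Python (all names here are hypothetical): a side effect only goes through if a matching capability is presented, and every performed effect is appended to a replayable log with a receipt digest (a real system would sign it, not just hash it):

```python
import hashlib
import json

class Capability:
    """A grant to perform one kind of side effect within one scope."""
    def __init__(self, effect: str, scope: str):
        self.effect = effect
        self.scope = scope

class World:
    """An agent world: effects need a capability and append to a replayable log."""
    def __init__(self):
        self.log = []  # append-only event log

    def perform(self, cap: Capability, effect: str, payload: dict) -> dict:
        if cap.effect != effect:
            raise PermissionError(f"no capability for effect {effect!r}")
        event = {"effect": effect, "scope": cap.scope, "payload": payload}
        # "Receipt": a digest binding the event contents; replaying the log
        # re-derives the same receipts, making tampering detectable.
        event["receipt"] = hashlib.sha256(
            json.dumps(event, sort_keys=True).encode()
        ).hexdigest()
        self.log.append(event)
        return event

world = World()
http_cap = Capability("http", "api.example.com")
world.perform(http_cap, "http", {"method": "GET", "path": "/v1/data"})
# world.perform(http_cap, "llm", {"prompt": "hi"}) would raise PermissionError
```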


Ha. Nice name. Looks like we both are: https://github.com/saadnvd1/agent-os


We should talk! I have been doing pretty much the same, but leaning heavier on context parsing and schema sharing between apps.


This is definitely interesting, seems like the sort of thing Nix would do well.


Thanks! NixOS is great at building and configuring systems, while AgentOS is about running and governing long-lived, deterministic agent worlds. They share ideas like immutability and declarative state, but they operate at different layers. I would say if NixOS is about reproducibly constructing a system, AgentOS is about reproducibly operating one: tracking decisions, effects, and evolution over time.


Excellent article, and I fully agree.

I came to the same realization a while ago and started building an agent runtime designed to ensure all (I/O) effects are capability-bound and validated by policies, while also allowing the agent to modify itself.

https://github.com/smartcomputer-ai/agent-os/


Thanks! Just looked at Agent OS. Love the 'Signed Receipts' concept in your AIR spec.

We reached the same conclusion on the 'Ambient Authority' problem, but I attacked it from the other end of the stack.

Tenuo is just the authorization primitive (attenuating warrants + verification), not the full runtime. The idea is you plug it into whatever runtime you're already using (LangChain, LangGraph, your own).
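For readers unfamiliar with the pattern, attenuating warrants can be sketched macaroon-style (a generic illustration, not Tenuo's actual design): each caveat narrows the grant, and the signature is chained with HMAC, so any holder can attenuate further offline while only a verifier with the root key can check the chain:

```python
import hashlib
import hmac

# Chain the signature through each caveat: attenuation narrows the warrant
# and invalidates any attempt to remove or reorder caveats.
def attenuate(sig: bytes, caveat: str) -> bytes:
    return hmac.new(sig, caveat.encode(), hashlib.sha256).digest()

root_key = b"issuer-secret"
warrant_id = b"warrant-1"

# Holder attenuates: one tool -> one host -> one method.
sig = hmac.new(root_key, warrant_id, hashlib.sha256).digest()
caveats = ["tool=http", "host=api.example.com", "method=GET"]
for caveat in caveats:
    sig = attenuate(sig, caveat)

# Verifier re-derives the chain from the root key and compares.
expected = hmac.new(root_key, warrant_id, hashlib.sha256).digest()
for caveat in caveats:
    expected = attenuate(expected, caveat)

assert hmac.compare_digest(sig, expected)  # warrant checks out
```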

I'm currently in stealth-ish/private alpha, but the architecture is designed to be 'userspace' agnostic. I’d love to see if Tenuo’s warrant logic could eventually serve as a primitive inside an Agent OS process.

I'll reach out. I would love to swap notes on the 'Capabilities vs. Guardrails' implementation details.


Arguably this is already happening, with much human-to-human interaction moving to private groups on Signal, WhatsApp, Telegram, etc.


high hanging fruit!


Eternal Vault is interesting. I would for sure use something like this, but only if there is a strong story about how the vault will survive 20+ years, even if your company is defunct. I do see the pieces scattered around the website (backup to Dropbox, etc.), but this story needs to be front and center.


Hi Luke, thanks for the feedback. I will be working on improving the marketing site to tell that story in a better way; any other feedback is appreciated as well. Lastly, I would love for you to give the platform a try at https://dash.eternalvault.app/register

