Here’s a copy of a post I made on Farcaster where I’m unconvinced it’s actually being used at all:
I've used OpenClaw for 2 full days and 3 evenings now. I simply don't believe people are using this for anything majorly productive.
I really, really want to like it. I see glimpses of the future in it. I generally try to be a positive guy. But after spending $200 on Claude Max, running with Opus 4.5 most of the time, I'm just so irritated and agitated... IT'S JUST SO BAD IN SO MANY WAYS.
1. It goes off on these huge 10min tangents that are the equivalent of climbing out of your window and flying around the world just to get out of your bed. The /abort command works maybe 1 time out of 100, so I end up having to REBOOT THE SERVER so as not to waste tokens!
2. No matter how many times I tell it not to do things with side effects without checking in with me first, it insists on doing bizarre things, like trying to sign up for new accounts when it hits an inconvenient snag with the account we're using, or emailing and chatting with support agents because it can't figure out something it could easily have asked ME for help with, etc.
3. Which reminds me that its memory is awful. I have to remind it to remind itself. It doesn't understand what it's doing half the time (e.g. it forgets the password it generated for something). It forgets things regularly; this could be because I keep having to reboot the server.
4. It forgets critical things after compaction because the algorithm is awful. There I am, typing away, and suddenly it's like the Men in Black paid a visit and the last 30min didn't happen. Surely just throwing away the oldest 75% of tokens would be more effective than whatever it's doing? Because it completely loses track of what we're doing and what I asked it NOT to do, I end up with problem (1) again.
5. When it does remember things, it spreads those memories all over the place in different locations and forgets to keep them consistent. So after a reboot it gets confused about what is the truth.
I've never had situations where I prompt and then go out for a coffee, a walk, or a drive. One-shotting on your first prompt? Perhaps.
But, as with a person, when the possibility of going off in the wrong direction is so high, I've always found 1-2 line prompts and small iterations much more appealing. The only times I've had to roll back were when I ran out of credits and a new model couldn't deal with the half-baked context, errors, and refactoring.
There's an entire cohort on HN who still claim AI is utterly and completely useless, despite in-your-face evidence. Literally people making the same claim almost word for word: they don't understand the hype, they've used AI themselves, and it's shit.
Meanwhile my entire company uses AI, and the on-the-ground reality for me is so much at odds with that cohort that we're each claiming the other side is insane.
I haven't used these bots yet, but I want to see the full story, not just one guy's take and one guy's personal experience. The hype exists because there are success stories. I want to hear those as well.
I don’t know how you came to that conclusion from my comment. I’m talking about a particular product named OpenClaw, representing a new style of doing work; not AI in general.
I dropped $200 on Claude Max in my personal capacity to test OpenClaw because I use Opus 4.5 all day in Cursor on an enterprise subscription… because it works for those problems.
>I don’t know how you came to that conclusion from my comment. I’m talking about a particular product named OpenClaw, representing a new style of doing work; not AI in general.
Right, and I'm saying AI in general is an example of how unreliable people's accounts of their experiences are, OpenClaw included. If people are so unreliable about the AI narrative, I don't trust the narrative around OpenClaw, which on this thread in particular is very negative and in stark contrast to the hype.
>I dropped $200 on Claude Max in my personal capacity to test OpenClaw because I use Opus 4.5 all day in Cursor on an enterprise subscription… because it works for those problems.
The comment wasn't directed at you personally. I'm just saying I want to see counterexamples of OpenClaw succeeding, not just examples of it failing. Frankly, on this thread there are zero success stories, which I find sort of strange.
Yep. But on HN, there's a huge cohort of people saying AI is useless.
Everyone sees the downsides, but the upside is the thing everyone is in denial about. It's like: yes, there are downsides, but then why is literally everyone using it?
As a rule of thumb, most people who say things like "X is useless and a waste" or "Y is revolutionary and is going to change everything by tomorrow" when the dust hasn't even begun to settle are stupid, overly-excitable, too biased towards negative outlooks, and/or trying to sell you something.
Sometimes they have some good points so you should listen to what they have to say. But that doesn't mean you have to get absorbed into their world view. Just integrate what you see as useful from your current POV and move on.
You’re correct. Any statement by HN users that something is useless has no value because they say that about useful things too.
Moltbot has the shape of the future but doesn’t feel like it to me. Sort of like Langchain once was. Demonstrated some new paradigm shift but is itself flawed so may not be the implementation that lasts. Time will tell.
The only thing here to say is “put it in a VM and try it”. It’s easy to try.
>There's people saying AI isn't living up its hype / valuation, I don't see many saying "utterly useless".
There are more people saying AI doesn't live up to the hype, sure. But the group saying it's utterly useless is still quite large on HN. It's just that most of them are midway through changing their story because reality is smashing them in the face.
>And there's plenty who worship at the altar of Claude.
I mean who doesn't use it? No one claims it's perfect or a god of code. But if you're not using it you're behind.
Disclaimer: Haven't used any of these (was going to try OpenClaw but found too many issues). I think the biggest value-add is agency. Chat interfaces like Claude/ChatGPT are reactive, but agents can be proactive. They don't need to wait for you to initiate a conversation.
What I've always wanted: a morning briefing that pulls in my calendar (CalDAV), open Todoist items, weather, and relevant news. The first three are trivial API work. The news part is where it gets interesting and more difficult - RSS feeds and news APIs are firehoses. But an LLM that knows your interests could actually filter effectively. E.g., I want tech news but don't care about Android (iPhone user) or MacOS (Linux user). That kind of nuanced filtering is hard to express as traditional rules but trivial for an LLM.
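The news-filtering idea above can be sketched in a few lines. This is a hypothetical illustration, not any product's actual code: the interest profile, feed URL, and the split between deterministic fetching/parsing and an LLM call (whose reply is assumed to be a JSON array of indices) are all assumptions for the sake of the example.

```python
# Sketch of LLM-assisted news filtering. Only build_filter_prompt's output
# would be sent to a model; fetching and reply-parsing stay deterministic.
import json
import urllib.request
from xml.etree import ElementTree

# Placeholder interest profile, per the example in the text.
INTERESTS = "tech news in general, but not Android (iPhone user) or macOS (Linux user)"

def fetch_headlines(feed_url: str) -> list[str]:
    """Deterministic part: pull item titles out of an RSS feed."""
    with urllib.request.urlopen(feed_url) as resp:
        tree = ElementTree.parse(resp)
    return [item.findtext("title") for item in tree.iter("item")]

def build_filter_prompt(headlines: list[str]) -> str:
    """Ask the model to return the indices of relevant headlines as JSON."""
    numbered = "\n".join(f"{i}: {h}" for i, h in enumerate(headlines))
    return (
        f"My interests: {INTERESTS}\n"
        f"Headlines:\n{numbered}\n"
        "Reply with a JSON array of the indices of headlines I would care about."
    )

def parse_filter_reply(reply: str, headlines: list[str]) -> list[str]:
    """Deterministic part: map the model's JSON reply back to headlines."""
    indices = json.loads(reply)
    return [headlines[i] for i in indices if 0 <= i < len(headlines)]
```

The nuance ("tech but not Android") lives entirely in the prompt rather than in brittle keyword rules, which is the point being made above.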
But can't you do the same using appropriate MCP servers with any of the LLM providers? Even just a generic browser MCP is probably enough to do most of these things. And ChatGPT has Tasks that are also proactive/scheduled. Not sure if Claude has something similar.
If all you want to do is schedule a task, there are much easier solutions (like a few lines of Python) than installing something so heavy in a VM, which comes with a whole bunch of security nightmares.
> But can't you do the same just using appropriate MCP servers with any of the LLM providers?
Yeah, absolutely. And that was going to be my approach for a personal AI assistant side project. No need to reinvent the wheel writing a Todoist integration when MCPs exist.
The difference is where it runs. ChatGPT Tasks and MCP through the Claude/OpenAI web interfaces run on their infrastructure, which means no access to your local network — your Home Assistant instance, your NAS, your printer. A self-hosted agent on a mac mini or your old laptop can talk to all of that.
But I think the big value-add here might be "disposable automation". You could set up a Home Assistant automation to check the weather and notify you when rain is coming because you're drying clothes on the clothesline outside. That's 5 minutes of config for something you might need once. Telling your AI assistant "hey, I've got laundry on the line. Let me know if rain's coming and remind me to grab the clothes before it gets dark" takes 10 seconds and you never think about it again. The agent has access to weather forecasts, maybe even your smart home weather station in Home Assistant, and it can create a sub-agent, which polls those once every x minutes and pings your phone when it needs to.
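For comparison, here is roughly what that "disposable automation" would look like written by hand (or by the agent as a sub-agent script). Assumptions: the free Open-Meteo API for the forecast, and made-up coordinates, threshold, and poll interval; the notification step is just a print placeholder.

```python
# Sketch of the "laundry on the line" watcher described above.
import json
import time
import urllib.request

LAT, LON = 52.52, 13.41   # placeholder coordinates
THRESHOLD = 40            # percent chance of rain worth a warning

def next_hours_rain_chance(lat: float, lon: float) -> list[int]:
    """Deterministic fetch of hourly precipitation probabilities."""
    url = (
        "https://api.open-meteo.com/v1/forecast"
        f"?latitude={lat}&longitude={lon}&hourly=precipitation_probability"
    )
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["hourly"]["precipitation_probability"]

def rain_expected(probabilities: list[int], threshold: int = THRESHOLD) -> bool:
    """The trivial decision rule the sub-agent applies on each poll."""
    return any(p >= threshold for p in probabilities[:6])  # next ~6 hours

def watch(poll_minutes: int = 30) -> None:
    """Poll until rain is coming, then notify once and exit."""
    while True:
        if rain_expected(next_hours_rain_chance(LAT, LON)):
            print("Rain coming - grab the laundry!")  # swap in a phone push
            return
        time.sleep(poll_minutes * 60)
```

Which illustrates the trade-off in the text: this is the "5 minutes of config" path, versus a 10-second sentence to an agent that writes and schedules the equivalent itself.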
I have a few cron jobs that basically are `opencode run` with a context file and it works very well.
At some point OpenClaw will take over in terms of its benefits, but it doesn't feel close yet compared to the simplicity of just running the job every so often and letting OpenCode decide what it needs to do.
Currently it shoots me a notification if my trip to work is likely to be delayed. Could I do it manually? Well, sure.
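The pattern described above is just a crontab plus `opencode run`. A sketch, with illustrative paths and prompt files (the exact `opencode run` invocation may differ by version):

```
# Weekday mornings at 7:30: check the commute, notify if delayed.
30 7 * * 1-5  opencode run "$(cat ~/agent/commute-check.md)" >> ~/agent/log 2>&1
# Every evening at 18:00: summarize the day's inbox.
0 18 * * *    opencode run "$(cat ~/agent/inbox-summary.md)" >> ~/agent/log 2>&1
```

The context file carries the standing instructions; cron supplies the "proactive" part.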
But this could be done for 1/100 the cost by only delegating the news-filtering part to an LLM API. No reason not to have an LLM write you the code, too! But putting it in front of task scheduling and API fetching — turning those from simple, consistent tasks to expensive, nondeterministic ones — just makes no sense.
Like I said, the first examples are fairly trivial, and you absolutely don't need an LLM for those.
A good agent architecture lets the LLM orchestrate but the actual API calls are deterministic (through tool use / MCPs).
My point was specifically about the news filtering part, which was something I had tried in the past but never managed to solve to my satisfaction.
The agent's job in the end for a morning briefing would be:
- grab weather, calendar, Todoist data using APIs or MCP
- grab news from select sources via RSS or similar, then filter relevant news based on my interests and things it has learned about me
- synthesize the information above
The steps that explicitly require an LLM are the last two. The value is in the personalization through memory and my feedback but also the ability for the LLM to synthesize the information - not just regurgitate it. Here's what I mean: I have a task to mow the lawn on my Todoist scheduled for today, but the weather forecast says it's going to be a bit windy and rain all day. At the end of the briefing, the assistant can proactively offer to move the Todoist task to tomorrow when it will be nicer outside because it knows the forecast. Or it might offer to move it to the day after tomorrow, because it also knows I have to attend my nephew's birthday party tomorrow.
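The pipeline above can be sketched as a single prompt-assembly step: the first bullet is deterministic API work, and only the assembled prompt goes to a model. Everything here (data shapes, the lawn-mowing scenario, the rescheduling instruction) is illustrative, taken from the example in the text.

```python
# Sketch of the morning-briefing synthesis step described above.
def build_briefing_prompt(weather: dict, events: list[str], tasks: list[str],
                          memory_notes: str) -> str:
    """Steps 1-2 are deterministic fetches; this just assembles their output."""
    return "\n".join([
        "Write my morning briefing. Synthesize, don't just list.",
        "If the weather conflicts with an outdoor task, or events clash,",
        "proactively suggest rescheduling.",
        f"Weather: {weather}",
        f"Calendar: {events}",
        f"Todoist: {tasks}",
        f"What you know about me: {memory_notes}",
    ])

# The lawn-mowing case from the text:
prompt = build_briefing_prompt(
    weather={"today": "windy, rain all day", "tomorrow": "sunny"},
    events=["tomorrow 14:00 nephew's birthday party"],
    tasks=["mow the lawn (due today)"],
    memory_notes="prefers outdoor chores in dry weather",
)
```

Given that prompt, offering to move the task past both the rain and the birthday party is exactly the kind of cross-source synthesis a rule-based briefing can't do.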
I spun up a Debian stable EC2 VM (using an agent + aws cli + aws-vault, of course) to host OpenClaw, gave it full root access, and I talk to it on Discord.
It's a little slow sometimes, but it's the first time I've felt like I have an independent agent that can handle things, kind of.
The only things I did were: 1. Ask it to create a Monero address so I could send it money, and have it notify me whenever money is sent to that address. It spun up its own monerod daemon, which was really heavy, and it ran out of space. So I had to get it to use the Monero wallet instead, and I had to manually intervene to shut down the monerod daemon, kill the process, and restart OpenClaw. In the end it worked and still works.
2. I simply asked it "@ me the silver price every day around 8am ET" and it just figured out how to do it and scheduled it. To my understanding it has its own cron functionality backed by a JSON file.
3. Write and host some Python scripts I can ping externally to send me a notification.
I've had it do other misc stuff, but ChatGPT is almost always better for queries, and coding agents + Zed are much better for coding. But with a cheap enough VM, using OpenRouter plus glm 4.7 or flash, it can do some quirky fun stuff. I see the advantage mainly as having control of a system where it can keep long-term state (files, processes, etc.) and manage context itself. It's more like glue, and its full mastery and control of a Linux system gives it a lot of flexibility.
Think of it more as agent+os which you aren't getting with raw Claude or ChatGPT.
I've done nothing that interesting with it, it's absolutely a security nightmare, but it's really fun!
One significant advantage over Claude/ChatGPT is that your own agent will be able to access many websites that block cloud-hosted agents via robots.txt and/or IP filters. This is unfortunately getting more common.
Another is that you have access to and control over its memory much more directly, since it's entirely based on text files on your machine. Much less vendor lock-in.
I couldn't really use OpenClaw (it was too slow and buggy), but having an agent that can autonomously do things for you and have the whole context of your life would be massively helpful. It would be like having a personal assistant, and I can see the draw there.
I have no idea. The single thing I can think of is that it can have a memory... but you can do that with even less code.
Just get a VPS, create a folder, and run CC in it; tell it to save things into MD files.
You can access it via your phone using termux.
You could, but Claude Code's memory system works well for specialized tasks like coding - not so much for a general-purpose assistant. It stores everything in flat markdown files, which means you're pulling in the full file regardless of relevance. That costs tokens and dilutes the context the model actually needs.
An embedding-based memory system (letta, mem0, or a self-built PostgreSQL + pgvector setup) lets you retrieve selectively and only grab what's relevant to the current query. Much better fit for anything beyond a narrow use case. Your assistant doesn't need to know your location and address when you're asking it to look up whether sharks are indeed older than trees, but it probably should know where you live when you ask it about the weather, or good Thai restaurants near you.
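The core idea of selective retrieval can be shown in a few lines. A toy illustration only: real systems (letta, mem0, pgvector) use learned embeddings from a model and a proper vector index, while the 3-dimensional vectors and memory strings here are hand-made stand-ins.

```python
# Toy sketch of embedding-based memory retrieval: rank stored memories by
# cosine similarity to the query and return only the top k, instead of
# pulling in a whole flat markdown file.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# (embedding, memory) pairs - in practice the embedding comes from a model.
MEMORIES = [
    ([0.9, 0.1, 0.0], "User lives at Example St 1, Berlin"),
    ([0.8, 0.2, 0.1], "User likes Thai food"),
    ([0.0, 0.1, 0.9], "User asked whether sharks predate trees"),
]

def retrieve(query_embedding: list[float], k: int = 1) -> list[str]:
    """Return only the k most relevant memories for the current query."""
    ranked = sorted(MEMORIES, key=lambda m: cosine(query_embedding, m[0]),
                    reverse=True)
    return [text for _, text in ranked[:k]]
```

A weather question would embed near the first two entries and never drag the shark trivia into context, and vice versa; that is the token saving being described.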
Yeah, I don't get it either. Deploy a VM that runs an LLM so that I can talk to it via Telegram... I could just talk to it through an app or a web interface. I'm not even trying to be snarky, like what the hell even is the use case?
Okay, but most of the time you can't prompt your AI to successfully debug you out of problems if you don't understand code. Or when you do the AI will solve the problem in a way that creates a dozen more cascading problems an hour later. I've also been coding for 20 years now and I feel like my coding skills are just as important now as they were 10 years ago. Without them I'd never be able to use AI effectively.
The only exception really are greenfield apps like "create a toy todo app demo" or "scaffold this react project" but that's like 0.001% of real world engineering work.
True, but it very much depends on the domain and the complexity of the stack you're working in. For a lot of CRUD-type dev work the problems are common to many, and AI will have no trouble.
Captcha is a completely useless system trivially solved by many agents and services. The only thing captcha does is annoy humans. I do agree with the problem, but I don't know what a solution would look like outside of government identification.
In the six years you are using your computer, do you ever expect to run into versioning issues and conflicts? Homebrew packages conflicting with local packages, something you compile needing a different python/ruby/node/rust/whatever version than the one you have locally installed, wanting to quickly try out a new package or upgrade without changing your system but with the option of rolling back safely, needing to quickly install a database, wanting to try out a new shell and shell config without bricking your system and with the option to roll back, etc. Nix gives you all of that and more for a one-time setup cost. Your argument is correct only if you expect to never change anything on your computer for the six years. But if I think about how often I have fought with Homebrew or some kind of versioning/path/binary conflict in the past, then the investment in Nix has paid off exponentially.
It's also about peace of mind like you said. Before nix I sometimes felt anxiety installing or upgrading certain things on my computer. "Will this upgrade break stuff?" - and often it did and I'd have to spend the next few hours debugging. With nix I don't worry about any of that anymore.
> Homebrew packages conflicting with local packages, something you compile needing a different python/ruby/node/rust/whatever version than the one you have locally installed, wanting to quickly try out a new package or upgrade without changing your system but with the option of rolling back safely, needing to quickly install a database, wanting to try out a new shell and shell config without bricking your system and with the option to roll back, etc.
Couldn't pretty much all of that be addressed using containers? Keeping your base system clean does sound wonderful, but e.g. distrobox containers seem more approachable: you use all the same commands you normally would, and apps run in an environment much closer to what they probably expect. You can still roll back using snapshots, which you can configure to be created automatically on system updates. If you want an atomic rollback guarantee, and a strong reminder not to mess with the base system, you can use an immutable distro (talking about Linux, not macOS, here).

The one big advantage I see from Nix is reproducibility. But it's not clear how desirable that is for a desktop use case; you may actually want different software on different machines. Having an overview of all the changes you made to your system sounds cool, but I'm not sure it's worth the effort that comes with Nix. I'm worried that after 8 months I'll decide it's too much hassle, like many commenters seem to do, and end up switching to a simpler setup with dotfiles and containers, wishing I'd done that from the start.
That's mostly solved with env managers for python/ruby/node/...; they take at most a few minutes to fully set up and learn, and don't get constantly broken by macOS updates.
Even for things like trying out a new shell, you can temporarily move the dotfiles somewhere and restore them later, and it still takes less time than converting everything to Nix.
But now you’re stuck with Python. Nix enables trivially simple dev environments that are completely heterogeneous. This gives you a powerful form of freedom, because it literally opens up the entire software universe to your dev environment in a confidence-inspiring way. Not to mention things like reliably parameterising anything you use, and setting up environment variables, shell scripts, database services, whatever you want. It also integrates really well with tools such as uv. Yes, the language is terse and difficult, but once you know it, it’s liberating, and in my opinion it makes you a better software developer, because you now have a high-end full workshop rather than a small toolbox.
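To make the "heterogeneous dev environment" claim concrete, here is a minimal flake sketch. Package attribute names (python312, nodejs_22) vary by nixpkgs revision, and the system is pinned to x86_64-linux for brevity:

```
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
  outputs = { self, nixpkgs }:
    let pkgs = nixpkgs.legacyPackages.x86_64-linux; in {
      devShells.x86_64-linux.default = pkgs.mkShell {
        # Python, Node, and a database in one shell, gone when you exit.
        packages = [ pkgs.python312 pkgs.nodejs_22 pkgs.postgresql ];
        shellHook = ''export DATABASE_URL=postgres://localhost/dev'';
      };
    };
}
```

`nix develop` drops you into that environment without touching the host system, which is the freedom being described.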
This is my feeling too. Nix is a relatively high time investment for a tool that tries to do everything, when you might not need or want everything and using the specific language’s tooling is more than sufficient and quicker. It takes a few minutes to install and do `uv sync`, or `nvm install`, or whatever, on a repository on a new computer, and it just works. Until Nix gets there, and I’m skeptical it will because of the “purist” mindset a lot of people in the community have, it’s hard to justify it.
I think the comparison is "X-as-code", like with Terraform and other tools.
If you just want a throwaway VM, it's straightforward to create one through the cloud console UI. Terraform is nevertheless still a useful tool for managing VMs.
For stuff like installing development dependencies, it's maybe not difficult to copy and paste instructions from a README, but solutions like devcontainers or Nix's development shells can be useful even at the cost of some overhead.
Of course. I wouldn’t say that Nix is a tool without much use or merit, because setting up development environments can be a huge pain and I understand why some people would use it and prefer it.
My biggest complaint is what I mentioned above: it’s trying to be everything for package management, and adds a lot of complexity (and I disagree that it’s always necessary/inherent) compared to just installing a tool and sometimes upgrading it. That complexity often means I have to debug it rather than the tool that I want to - I might have to debug Nix instead of Node, which is not always straightforward. In my limited experience Nix got in my way more than I’d like, and in ways I didn’t expect or want to deal with, and until it’s as seamless as something like Homebrew or apt, it’ll be a hard sell.
> it’s trying to be everything for package management, and adds a lot of complexity (and I disagree that it’s always necessary/inherent) compared to just installing a tool and sometimes upgrading it. That complexity often means I have to debug it rather than the tool that I want to
Although you're right about nix's DX being quite rough, the problem isn't exactly that it "tries to be everything for package management".
Consider the assumption Nix wants to make about its packages: it should be possible to package software by putting it in some arbitrary directory (i.e. not just /usr/bin), where its dependencies are also put in (or symlinked to) some arbitrary location.
I think with well-written software, this should be a reasonable assumption, but you're going to run into friction with that. (Friction which will require you to have a good breadth/depth of understanding).
In my experience, a lot of the complexity when dealing with Nix is with the large and organic complexity of nixpkgs.
The "trying to be everything" is more incidental to the expressive package management. -- NixOS is 'just' an OS built upon the package manager; dev shells are 'just' shells which expose the build environment of a package; etc.
Fully spot on. I don't get what is so hard about setting a couple of environment variables, and maybe symbolic links, depending on the OS and language being used.
A simple UNIX script or PowerShell utility takes care of it.
None of the ones I have used during the last decades have ever grown to more than about 20 lines of code, minus comments.
Until you need to start combining things. Docker is conceptually a VM that encapsulates everything nicely, but it ironically doesn't "compose" nearly as well as Nix flakes or shells. With Nix you start out with a base env and can trivially extend it hierarchically, jumping up and down the tree super easily, without having to roll your own microservice architecture each time just to get stuff to work together.
Docker OTOH composes whole services nicely: if my project needs a redis cache and postgres instance, I don't have to faff about with local ports, traefik can pick up my web server, and so on. I use a flake to create and lock a local development toolchain, but it's no help in managing the services.
One thing I haven't tried yet is building a container from a flake, which would have obvious benefits for reproducibility. Still don't think it would help with service orchestration though.
I think what it comes down to, and where many people get confused, is separating the technology itself from how we use it. The technology is incredible for learning new skills, but at the same time it incentivizes people not to learn. Just because you have an LLM doesn't mean you can skip the hard parts of doing textbook exercises and thinking hard about what you are learning. It's a bit like passively watching YouTube videos. You'd think that having all these amazing university lectures available on YouTube would make people learn much faster, but in reality it makes people lazy, because they believe they can passively sit there, watch a video, do nothing else, and expect that to replace a classroom education. That's not how humans learn. But it's not because YouTube videos or LLMs are bad learning tools; it's because people use them as a mental shortcut where they shouldn't.
I fully agree, but to be fair, these chatbots hack our reward systems. They present a cost/benefit ratio where, for much less effort, we get a much better result than doing it ourselves (assuming the skill isn't yet learned). I think the calculator analogy is a good one if you're careful about what you're comparing: calculators did indeed make people worse at mental math, yet mental math can indeed be replaced by calculators for most people with no great loss. Chatbots are making people worse at mental... well, everything. Thinking in general. I do not believe that thinking can be replaced with AI for most people with no great loss.
You absolutely need to spend money in PoE to buy stash tabs. It's basically mandatory if you play regularly. The difference to most dark patterns is that the spending has a very low cap. Once you've spent $50 or so on stash tabs you are set forever and never need to spend again. So it's not so different from buying a $50 game, just that you get to try it out for free first.
$50 is exaggerating it nowadays. With async trade you could buy a single merchant tab to gain access to trade (stuff sells pretty quick with async trade!), and maybe a currency and scarab tab for the bare minimum convenience. Around $20 and you've got yourself a meaty beast of a game.
It doesn't feel off to me because that's the exact experience I've had as well. So it's unsurprising to me that many other people share that experience. I'm sure there is a bunch of paid promotion going on for all kinds of stuff on HN (especially what gets onto the front page), but I don't think this is one of those cases.
Oh cool, can you share concrete examples of times Codex outperformed Claude Code? In my experience, both tools need to be carefully massaged with context to fulfill complex tasks.
I don't really see how examples are useful because you're not going to understand the context. My prompt may be something like "We recently added a new transcription backend api (see recent git commits), integrate it into the service worker. Before implementing, create a detailed plan, ask clarifying questions, and ask for approval before writing code"
Nobody has to give you examples. People can express opinions. If you disagree, that’s fine but requesting entire prompt and response sets is quite demanding. Who are you to be that demanding?
Let's call it the skeptical public? We've been listening to a group of people rave about how revolutionary these tools are, how they're able to perform senior level developer work, how good their code is, and how they're able to work autonomously through the use of sub-agents (i.e. vibe coding), without ever providing evidence that would support any of those grandiose claims.
But then I use these tools myself[1] and I speak to real developers who have used them, and our evaluations center on the lukewarm: good at straightforward, junior-level tasks, good for prototyping, good for initially generating tests, good for answering certain types of questions, good for one-off scripts. But approximately none of us would trust these LLMs to implement a more complex feature the way a mid-level or senior developer would, without very extensive guidance and hand-holding that takes longer than just doing it ourselves.
Given the overwhelming absence of evidence, the most charitable conclusion I can come to is that the vast majority of people making these claims have simply gone from being 0.2X developers to being 0.3X developers who happen to generate 5X more code per unit of time.
Context engineering is a critical part of being able to use the tool. And it's ok to not understand how to use a new tool. The different models combined with different stacks require different ways of grappling with the technology. And it all changes! It sucks that you've tried it for your stack (Elixir, whatever that is) in your way and it was disappointing.
To me, the tool inherently makes sense and vibes with my own personality. It allows me to write code that I would otherwise procrastinate on. It allows me to turn ideas into reality, so much faster.
Maybe you're just hyper-focused on metrics? Productivity, especially when dealing with code, is hard to quantify. This is a new paradigm, so any comparison is apples to oranges. Does this help?
So your take is that every real software developer I know is simply bad at using this magical tool that performs on the level of mid-senior level software engineer in the hands of a few chosen ones? But the chosen ones never build anything in public where it can be observed, evaluated, and critiqued. How unfortunate is that?
The people I talked to use a wide variety of environments and their experience is similar across the board, whether they're working in Nodejs, React, Vue, Ruby, PHP, Java, Elixir, or Python.
> Productivity, especially when dealing with code, is hard to quantify.
Indeed, that's why I think most people claiming these obscene benefits are really bad at evaluating their own performance and/or started from a really low baseline.
I always think back to a study I read a while ago where people without ADHD were given stimulant medication and reported massive improvements in productivity but objective measurements showed that their real-world performance was equal to, or slightly lower than their baseline.
I think it's very relevant to the psychology behind this AI worship. Some people are being elevated from a low baseline whilst others are imagining the benefits.
People absolutely do build in public from vibe-coding. This tells me that you haven't done your research and have gone off general guesses, or pessimism/frustration from not knowing how to use the tool. The easiest way to find this on GitHub is to look for repos where Claude is a contributor; Claude will tag itself in the PR or pushes. Another easy way I've seen is the "BuildInPublic" tag on the Threads app, which has been inundated with vibe coding. While these might not be in your algorithm, they do exist. You'll see that, while there is a lot of crud, there are also products being made that are actually versatile, complex, and completely vibe-coded. Most people are not making up these stories. It's very real.
Of course people vibe-code in public - I was clear that I wanted to see evidence of these amazing productivity improvements. If people are building something decent but it takes them 3 or 4 times as long as it would take me, I don't care. That's great for them but it's worthless to me because it's not evidence of a productivity increase.
> there are also products being made that are actually versatile, complex, and completely vibe-coded.
Which ones? I'm looking for repositories that are at least partially video-documented to see the author's process in action.
I'm not saying it is, but if ANYTHING was the exact combination of prerequisites to be considered paid promotion on HN, this is the type of comment it would be.
So, let’s see if I get this straight. A highly identifiable person whose company sells a security product is the ideal shill? That doesn’t make any sense whatsoever. On the other hand, someone with a different opinion makes complete sense.
Lebron James endorses KIA. Multi-billion dollar companies can afford and benefit from highly identifiable people so I don't really think that argument makes it any less likely to be an endorsement.