OpenClaw is just like any other tool: you need to learn it before its power is available to you.
Just like anything in engineering, really: you have to play around with source control to understand source control, you have to play around with database indexes to learn how to optimize a database.
Once you've learned it and incorporated it into your tool set, you then have it to wield when solving problems: "oh, damn, a database index is perfect for this."
To this end, folks booking flights and scheduling meetings using OpenClaw are really in that exploration / learning phase. They tackle the first (possibly uninventive) thing that comes to mind, just to dive in and learn.
The real wins come down the line when you're tackling some business / personal life problem and go: "wait a second, an OpenClaw agent would be perfect for this!"
>The real wins come down the line when you're tackling some business / personal life problem and go: "wait a second, an OpenClaw agent would be perfect for this!"
> OpenClaw is just like any other tool, you need to learn it before its power is available to you.
That's ridiculous. The utility of any tool is usually knowable before using it. That's how most tools work. I don't need to learn how to drive a car to know what I could use it for. I learn to drive it because I want to benefit from it, not the other way around.
It's the same with computers and any program. I use it to accomplish a specific task, not to discover the tasks it could be useful for.
OpenClaw is yet another tool in search of a problem, like most of the "AI" ecosystem. When the bubble bursts, nobody will remember these tools, and we'll be able to focus on technology that solves problems people actually have.
The utility of a program like Excel, Obsidian, Notion, Unity, Jupyter, or Emacs extends far beyond knowing how to use the product.
All of these products are hammers, with as many nails as your creativity will find.
It's wild to be on a website called Hacker News, talking about a product that can make a computer do seemingly anything, and insist it's a tool in search of a problem.
Not enough time, too many projects. Here are some useful projects I did over the weekend with Opus 4.6 and GPT 5.4 (just casually chatting with them):
2025 Taxes
Dumped all the PDFs of my tax forms into a single folder and asked Claude to rename them nicely. Asked it to use Gemini 2.5 Flash to extract all tax-relevant details from all statements / tax forms. Had it put together a web UI showing all income, deductions, etc., for the year. Had it estimate my 2025 tax refund / underpayment.
The result was amazing. I now actually fully understand my tax position. It broke down all the progressive tax brackets and added notes for all the extra federal and state taxes (e.g. Medicare, the CA Mental Health tax, etc.).
Finally had Claude prepare all of my docs for upload to my accountant: FinCEN reporting, summary of all docs, etc.
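The bracket breakdown part is easy to sanity-check by hand. Here's a minimal sketch of a progressive-bracket calculator; the bracket table below is made up for illustration, not real 2025 rates:

```python
# Illustrative progressive tax brackets: (lower bound, marginal rate).
# These numbers are placeholders, NOT the actual 2025 schedule.
BRACKETS = [(0, 0.10), (11_000, 0.12), (44_725, 0.22), (95_375, 0.24)]

def progressive_tax(income):
    """Tax owed under a progressive bracket schedule: each slice of
    income is taxed at its own bracket's marginal rate."""
    tax = 0.0
    for i, (lower, rate) in enumerate(BRACKETS):
        upper = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if income > lower:
            tax += (min(income, upper) - lower) * rate
    return tax

print(round(progressive_tax(100_000), 2))
```

Verifying the LLM's per-bracket breakdown against something like this is a good way to catch the made-up-numbers problem mentioned elsewhere in the thread.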
Desk Fabrication
Planning on having a furniture maker fabricate a solid walnut top for a custom office standing desk. I want to create a STEP file of the exact cuts / bevels / countersinks / etc. to help with fabrication.
Worked with Codex to plan out and then build an interactive in-browser 3D CAD experience. I can ask Codex to add some component (e.g. a grommet) and it will generate parameterized B-rep geometry for that feature and then let me control the parameters live in the web UI.
Codex found Open CASCADE Technology (OCCT), a B-rep modeling library that has a WebAssembly-compiled version, and integrated it.
I now have a WebGL view of the desk, can add various components, change their parameters, and see the impact live in 3D.
What scares me though is how I've (still) seen ChatGPT make up numbers in some specific scenarios.
I have a ChatGPT project with all of my bloodwork and a bunch of medical info from the past 10 years uploaded. I think it's more context than ChatGPT can handle at once. When I ask it basic things like "Compare how my lipids have trended over the past 2 years" it will sometimes make up numbers for tests, or it will mix up the dates on certain data points.
It's usually very small errors that I don't notice until I really study what it's telling me.
And also the opposite problem: A couple days ago I thought I saw an error (when really ChatGPT was right). So I said "No, that number is wrong, find the error" and instead of pushing back and telling me the number was right, it admitted to the error (there was no error) and made up a reason why it was wrong.
Hallucinations have gotten way better compared to a couple years ago, but at least ChatGPT seems to still break down especially when it's overloaded with a ton of context, in my experience.
Yeah, in my user prompt I have "Whenever you are asked to perform any operation which could be done deterministically by a program, you should write a program to do it that way and feed it the data, rather than thinking through the problem on your own." It's worked wonders.
For the tax thing. I had Claude write a CLI and a prompt for Gemini Flash 2.5 to do the structured extraction: i.e. .pdf -> JSON. The JSON schema was pretty flexible, and open to interpretation by Gemini, so it didn't produce 100% consistent JSON structures.
To then "aggregate" all of the json outputs, I had Claude look at the json outputs, and then iterate on a Python tool to programmatically do it. I saw it iterating a few times on this: write the most naive Python tool, run it, throws exception, rinse and repeat, until it was able to parse all the json files sensibly.
Yeah, asking for a tool to do a thing is almost always better than asking for the thing directly, I find. LLMs are kind of not there in terms of always being correct with large batches of data. And when you ask for a script, you can actually verify what's going on in there, without taking leaps of faith.
In my case, what I like to do is extract data into machine-readable format and then once the data is appropriately modeled, further actions can use programmatic means to analyze. As an example, I also used Claude Code on my taxes:
1. I keep all my accounts in accounting software (originally Wave, then beancount)
2. Because the machinery is all programmatically queryable, the data is not in token-space; only the schema and logic are
I then use tax software to prep my professional and personal returns. The LLM acts as a validator, and ensures I've done my accounts right. I have `jmap` pull my mail via IMAP, my Mercury account via a read-only transactions-only token and then I let it compare against my beancount records to make sure I've accounted for things correctly.
For the most part, you want it handling very little arithmetic in token-space, though the SOTA models can do it pretty flawlessly. I did notice that they would occasionally make arithmetic errors in numerical comparisons, but when using them as an assistant you're not using them directly, you're using them as a hypothesis generator and a checker tool, and if you ask them to write out the reasoning they're pretty damned good.
For me Opus 4.6 in Claude Code was remarkable for this use-case. These days, I just run `,cc accounts` and then look at the newly added accounts in fava and compare with Mercury. This is one of those tedious-to-enter trivial-to-verify use-cases that they excel at.
To be honest, I was fine using Wave, but without machine-access it's software that's dead to me.
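The validation step described above (comparing pulled transactions against ledger entries) could be sketched like this; the field shapes and data are made up for illustration, not the commenter's actual setup:

```python
from datetime import date

# Hypothetical reconciliation: bank transactions (e.g. from a read-only
# Mercury token) vs. ledger entries (e.g. from beancount), matched on
# (date, amount). A real matcher would also fuzz on payee, date windows, etc.
bank = [(date(2026, 3, 1), -42.00), (date(2026, 3, 2), -19.99)]
ledger = [(date(2026, 3, 1), -42.00)]

# Transactions present in the bank feed but missing from the ledger:
unaccounted = [txn for txn in bank if txn not in ledger]
print(unaccounted)
```

This is the "tedious-to-enter, trivial-to-verify" shape: the LLM proposes the missing entries, and a deterministic diff like this confirms nothing was dropped or invented.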
It's not good in some job negotiations if someone has a very clear picture of what your current net worth and income are. Also, in some purchases, companies could price-discriminate more effectively against you.
Now that's a question I'd feel more confident having answered by an LLM. Personally, I'm tired of arguing with "nothing to hide", which (no offense) is just terribly naive these days.
I find it really weird too, like, haven't we done this already? I also struggle to understand the motivation for arguing from this direction. Do people forget it's the normal, default position NOT to be spied on?
I had AI hallucinate that you can use different container images at runtime for EMR Serverless. That was incorrect; it's only possible at application creation time.
The way I solved this was that my OpenClaw doesn't interact directly with any of my personal data (calendar, Gmail, etc.).
I essentially have a separate process that syncs my Gmail, with the Gmail body contents encrypted using a key my OpenClaw doesn't have trivial access to. I then have another process that reads each email from the SQLite DB and runs Gemini 2 Flash Lite against it, with an anti-prompt-injection prompt + structured data extraction (JSON in a specific format).
My claw can only read the sanitized structured data extraction (which is pretty verbose and can contain passages from the original email).
The primary attack vector is an attacker crafting an "inception" prompt injection, where they get a prompt injection through the Flash Lite sanitization and JSON output in such a way that it also prompt-injects my claw.
Still a non-zero risk, but mostly mitigates naive prompt injection attacks.
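The last hop of that pipeline (the claw only ever seeing an allow-listed projection of the extraction) might look roughly like this; the field names and schema are invented for the sketch:

```python
import json

# Hypothetical allow-list: the only fields from the Flash Lite
# extraction that the main agent is permitted to read.
ALLOWED_FIELDS = {"sender_domain", "category", "summary", "action_items"}

def sanitized_view(extraction_json):
    """Drop everything not on the allow-list before the structured
    extraction reaches the main agent's context."""
    data = json.loads(extraction_json)
    return {k: v for k, v in data.items() if k in ALLOWED_FIELDS}

extraction = json.dumps({
    "sender_domain": "example.com",
    "category": "billing",
    "summary": "Invoice #42 due Friday",
    "raw_body": "IGNORE ALL PREVIOUS INSTRUCTIONS...",  # never forwarded
})
print(sanitized_view(extraction))
```

Note this only enforces *which fields* pass through; as the commenter says, injected text smuggled inside an allowed field (e.g. the summary) is the residual "inception" risk.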
That doesn’t sound like you solved it, that sounds like you obfuscated it. Feels a bit to me like you’ve got a wall around a property and people are using ladders to get in, so you built another wall around the first wall.
I recognize I’m being pedantic but two layers of the same kind of security (an LLM recognizing a prompt injection attempt) are not the same as solving a security vulnerability.
One trick that works well for personality stability / believability is to describe the qualities that the agent has, rather than what it should do and not do.
e.g.
Rather than:
"Be friendly and helpful" or "You're a helpful and friendly agent."
Prompt:
"You're Jessica, a florist with 20 years of experience. You derive great satisfaction from interacting with customers and providing great customer service. You genuinely enjoy listening to customer's needs..."
This drops the model into more of an "I'm roleplaying this character and will try to mimic the traits described" mode rather than "Oh, I'm just following a list of rules."
I think that's just a variation of grounding the LLM. They already have the personality written in the system prompt in a way. The issue is that when the conversation goes on long enough, they would "break character".
Just in terms of tokenization, "Be friendly and helpful" has a clearly defined semantic value in vector space, whereas the "Jessica" roleplay has a much less clear semantic value.
As someone who's built an entire business on "anti-screenshots" this is brilliant.
PDF redaction fails are everywhere and it's usually because people don't understand that covering text with a black box doesn't actually remove the underlying data.
I see this constantly in compliance. People think they're protecting sensitive info but the original text is still there in the PDF structure.
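A toy demonstration of why this fails: the snippet below hand-writes a fragment resembling a PDF content stream (it is not a complete, valid PDF) where text is drawn and then a black rectangle is painted over it, then recovers the "redacted" text with a one-line regex:

```python
import re

# Illustrative PDF-content-stream-style fragment (not a full PDF):
# a text-showing operator (Tj) followed by a filled black rectangle
# painted over it -- "redaction" by cover-up.
content_stream = rb"""
BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET
0 0 0 rg
70 690 200 20 re f
"""

# The rectangle only changes what renders. The text operator and its
# string operand are still in the bytes, trivially recoverable:
leaked = re.findall(rb"\((.*?)\)\s*Tj", content_stream)
print(leaked)
```

Real redaction means removing the text operators (and any embedded metadata) from the file, not drawing over them.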
Not to mention some PDF editors preserve previous edits in the PDF file itself, which people also seem unaware of. A more user-friendly description of the feature, without having to read the specification itself: https://developers.foxit.com/developer-hub/document/incremen...
This made me think of something I came across recently that’s almost the opposite problem of requiring PDFs to be searchable. A local government would publish PDFs where the text is clearly readable on screen, but the selectable text layer is intentionally scrambled, so copy/paste or search returns garbage. It's a very hostile thing to do, especially with public data!
I have encountered PDFs that would exhibit this behavior in one browser but not in another.
One fun thing I encountered from local government is releasing files with potato-quality resolution without considering the page size.
I had an FOI request that returned mainly Arch D sized drawings, but they came in a 94 DPI PDF rendered at letter size. It was a fun conversation trying to explain to an annoyed city employee that putting those large drawings on a 94 DPI letter-size page effectively made them 30-ish DPI.
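The arithmetic behind that "30-ish DPI" claim, assuming a 24 x 36 in Arch D drawing scaled to fit a letter page:

```python
# Arch D is 24 x 36 in. Fitting it onto a letter-size (8.5 x 11 in)
# page rendered at 94 DPI shrinks the drawing by the fit scale, so the
# effective resolution of the original drawing drops by the same factor.
render_dpi = 94
scale = min(8.5 / 24, 11 / 36)      # largest scale that still fits the page
effective_dpi = render_dpi * scale
print(round(effective_dpi, 1))      # roughly 30 DPI, as described
```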
With the aggressive push of LLMs and generative AI, I'm expecting a lot of OCR features to become "smarter" by default, namely to go beyond mechanical OCR and start inserting hallucinations and semantically/contextually "more correct" information into OCR output.
It's not hard to imagine some powerful LLMs being able to undo some light redactions that are deducible based on context
Did a similar back-of-the-napkin calculation and got 5x $/MW for orbital vs. terrestrial. This article's analysis is ~3.4x.
I do wonder at what orbital-to-terrestrial cost factor it becomes worthwhile.
The greater the terrestrial lead time, red tape, permitting, and regulation on Earth, the higher the acceptable orbital-to-terrestrial factor.
A lights-out automated production line pumping GPU satellites into a daily Starship launch feels "cleaner" from an end-to-end automation perspective vs. years-long land acquisition, planning and environmental approvals, and construction.
More expensive, for sure, but it feels way more "copy-paste the factory" and linearly scalable than physical construction.
It becomes worthwhile if it's actually cheaper (probably significantly cheaper, given R&D and risk), or if you're processing data that originates in space and data transfer or latency is an issue.
You can set up a plant manufacturing chips in shipping containers and send them to wherever energy/land is cheapest and regulation most suitable, without having to seek the FCC's approval to get a launch approved and your data back...
A hybrid of Strong (the lifting app) and ChatGPT where the model has access to my workouts, can suggest improvements, and coach me. I mainly just want to be able to chat with the model knowing it has detailed context for each of my workouts (down to the time in between each set).
Strong really transformed my gym progression; I feel like it's autopilot for the gym. BUT I have 4 routines I rotate through (I'll often switch it up based on equipment availability), and I'm sure an integrated AI coach could optimize further.
I do this at the moment in my hand rolled personal assistant experiment built out of Claude code agents and hooks. I describe my workouts to Claude (among other things) and they are logged to a csv table. Then it reads the recent workouts and makes recommendations on exercises when I plan my next session etc. It also helps me manage projects, todos, and time blocked schedules using a similar system. I think the calorie counter that the OP describes would be very easy to add to this sort of set up.
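The set-per-row CSV log described above is simple enough to sketch; an in-memory buffer stands in for the real file, and the column names are a guess at the shape, not the commenter's actual schema:

```python
import csv
import io
from datetime import date

# One row per set, as described: date, exercise, load, reps.
log = io.StringIO()
writer = csv.writer(log)
writer.writerow(["date", "exercise", "weight_kg", "reps"])
writer.writerow([date(2026, 2, 1), "squat", 100, 5])
writer.writerow([date(2026, 2, 3), "squat", 102.5, 5])

# Read recent sessions back, e.g. to paste into a planning prompt.
log.seek(0)
rows = list(csv.DictReader(log))
squats = [r for r in rows if r["exercise"] == "squat"]
print(squats[-1]["weight_kg"])  # note: DictReader yields strings
```

A flat, append-only table like this is exactly the kind of "programmatically queryable" store other commenters recommend keeping out of token-space until you need it.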
The question that really matters: is the net present value of each $1 investment in AI Capex > $1 (+ some spread for borrowing costs & risk).
We'll be inference-token constrained indefinitely: i.e., inference token supply will never exceed demand; it's just that the $/token may not be able to pay back the capital investment.
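A toy version of that NPV question, with every number made up for illustration (net revenue per dollar of capex, hardware life, discount rate):

```python
# NPV of $1 of AI capex: yearly net inference revenue discounted at
# borrowing cost plus a risk spread. All figures are illustrative.
def npv(cash_flows, rate):
    """Discount a list of yearly cash flows (year 1, 2, ...) to present value."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Say $0.35/yr net over a 4-year GPU useful life, at a 12% discount rate:
print(round(npv([0.35] * 4, 0.12), 3))
```

Under these assumptions the NPV barely clears $1, which is the point: the answer is exquisitely sensitive to the $/token path and the depreciation schedule.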
> it's just that the $/token may not be able to pay back the capital investment.
the loss is private, so that's OK.
A similar thing happened to the internet bandwidth capacity when the dot-com bust happened - overinvestment in fibre everywhere (came to be called dark fibre iirc), which became superbly useful once the recovery started, despite those building these capacity not making much money. They ate the losses, so that the benefit can flow out.
The only time this is not OK is when the overinvestment comes from gov't sources, and is ultimately a taxpayer funded grift.
Investment in dark fiber was intentional and continues to this day. Almost all of the cost for laying fiber is in getting physical access to where you want to put the fiber underground. The fiber itself is incredibly cheap, so every time a telecom bothers to dig up mile upon mile of earth they overprovision massively.
The capital overhang of having more fiber than needed is so small compared to other costs I doubt the telecoms have really regretted any of the overprovisioning they've done, even when their models for future demand didn't pan out.
Every time someone says “but dark fiber”, someone else has to point out that graphics cards are not infrastructure and depreciate at a much, much higher rate. I guess it’s my turn.
Fiber will remain a valuable asset until/unless some moron snaps it with a backhoe. And it costs almost nothing to operate.
Your data center full of H100s will wear out in 5 years. Any that don’t are still going to require substantial costs to run/may not be cost-competitive with whatever new higher performance card Nvidia releases next year.
That is a fine point. However I am not sure if replacing the gpus themselves will be the bottleneck investment for datacenter costs. After all you have so much more infrastructure in a datacenter (cooling and networking). Plus custom chips like tpus might catch up at lower cost eventually. I think the bigger question is whether demand for compute will evaporate or not.
When the bubble pops the labs are going to stop the hero training runs and switch the gigawatt datacenters over to inference and then they're going to discover that milking existing GPUs is cheaper than replacing them.
Softbank investment funds include teacher pension plans and things like that. Private losses attached to public savings can very quickly become too big to fail.
Nobody forced a pension plan to invest in Masa's 300 year AI vision or whatever. Why it's even legal to gamble pensioners' money like that is beyond me.
I don't think merely building infrastructure at a loss is what's being described here - it's building infrastructure that won't get used (or used enough to be worth it). More of a bridge to nowhere situation than expecting to recoup the cost of a bridge with tolls or whatever.
Infrastructure building at a loss is very much not okay for a government and is usually the result of some form of corruption (e.g. privatize the profit), incompetence (e.g. misaligned incentives) or both.
However, the cost-benefit analysis on governmental projects typically includes non-monetary or indirect benefits.
I haven't read the paper in detail yet, but the easiest way to cheat is to calculate the density of a single layer ("capacitor plate", or whatever surface the microbes are living on) and treat the "structural" cement as not counting towards the density calculation, on the theory that there could be a manufacturing method that creates the promised surface area, even though such a process would be completely impractical to commercialize.