Last week I got together with a friend from my math program. We cracked some beers, chatted with ChatGPT's voice mode, toyed around with the Collatz conjecture, and sent prompts to a coding agent to build visualizations and simulations. It was a lot of fun directing these agents while we bounced ideas around and the models explored them.
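To give a flavor of what we had the agent build, here's a minimal sketch of a Collatz trajectory simulation (my own illustration, not the agent's actual output):

```python
def collatz_steps(n: int) -> int:
    """Count the steps for n to reach 1 under the 3n+1 rule."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

# Trajectory lengths for the first few integers; 27 is the classic
# outlier, taking 111 steps despite being small.
lengths = {n: collatz_steps(n) for n in range(1, 30)}
```

From here it's a short hop to plotting step counts or peak values, which is the kind of visualization we were asking for.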
I think that with the right problem and the right agentic loop, improvements will clearly speed up.
The bigger problem for me is that the realtime voice modes lack tool use, so they can't look anything up or do anything. Model strength definitely matters too, but even dumb models can be helpful when they can look things up and try things out, and smart models that can't do those things kinda suck.
I've been using Codex to build a repo that pulls down astronomical datasets and runs simulations to try to find an explanation for the Hubble tension. Having an agent do the tedious bits, and an LLM to bounce ideas off, has taught me so much about astronomy. I don't have serious hopes of finding anything new and novel, but it's still a lot of fun.
I've noticed that LLMs can effortlessly read minified JS. How do they do with obfuscated binary code? I wonder if the days of obfuscation are numbered now that the tedious job of de-obfuscation can be automated.
I get all my groceries delivered to my doorstep via Walmart's delivery pass. The thing I'm really missing is AI-curated meal planning tuned to my family's preferences. I already feed ChatGPT those preferences (e.g. kid A doesn't eat X, Y, or Z and liked meals A, B, and C; kid B likes ...), and it helps me build meal plans. With my preferences on hand, we can quickly nail down a meal plan for the week.
The slowest part of my meal planning is clicking through Walmart's sluggish site, where each page load takes 2-3 seconds and each item needs several page loads. Once an agent can translate my meal plan into a grocery checkout on Walmart, I'm all set.
I'd love to see the results of that. I think calling a single prompt iteration lifeless misses the point. It's like looking at a game that has had a few hours of development and declaring it bad. Games need iteration. That your results are only the first iteration is impressive; I can see follow-up prompts and custom tweaking getting really good results!
Last summer I built a Factorio-like automation game with older models, and over time the game really started to come to life.
It's very useful to understand what you're suffering from even if it's not curable. It explains your symptoms and your experience, and helps you understand what you're going through. Knowing that your condition is incurable also keeps you from chasing ineffective cures for a mysterious illness.
> SpaceX has deorbiting assets on top of depreciating ones
The deorbiting part is redundant. Their satellites are just that, depreciating assets, with lifetimes of about 5 to 7 years. The important question is whether the total cost, including the launch, can be recouped over that lifetime.
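The break-even question reduces to one comparison. Here's a back-of-the-envelope sketch with purely hypothetical numbers (none of these figures come from SpaceX; they only illustrate the shape of the claim):

```python
def breaks_even(build_and_launch_cost: float,
                annual_revenue_per_sat: float,
                lifetime_years: float) -> bool:
    """True if lifetime revenue covers the up-front cost of one satellite."""
    return annual_revenue_per_sat * lifetime_years >= build_and_launch_cost

# Hypothetical: a $1M all-in satellite earning $250k/yr over a 5-year life
# clears its cost; at $100k/yr it doesn't.
breaks_even(1_000_000, 250_000, 5)
```

Real accounting would also discount future revenue and fold in constellation-level costs, but the depreciating-asset framing is exactly this inequality.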
You can make one; the Balatro bench is open source. But I'm quite sure it'd be crazily expensive for a hobby project. At the end of the day, LLMs can't actually 'practice and learn.'
I've gotten pretty good results by prompting "What did you struggle on? Please update the instructions in <PROMPT/SKILL>" and "Here's your conversation <PASTE>, please see what you struggled with and update <PROMPT/SKILL>".
It's hit or miss, but I've been able to have it self-improve its prompts. It can spot mistakes and record what didn't work, similar to how I learned games like Balatro. Playing Balatro blind, you wouldn't know which jokers are coming and which have synergy together, or that strategy X is hard to pull off, or that you can retain a card to block it from appearing in shops.
If the LLM can discover that on its own, and build prompt files that gradually allow it to win at the highest stake, that's an interesting result. And I'd love to know which models do best at it.
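The loop I'm describing is simple enough to sketch. This is a hedged outline, not a real SDK: `run_model` is a placeholder for whatever chat API you use, and the skill-file path is an assumption.

```python
from pathlib import Path

# Mirrors the reflection prompts quoted above.
REFLECT_PROMPT = (
    "Here's your conversation:\n{transcript}\n\n"
    "What did you struggle with? Please update these instructions:\n{skill}"
)

def build_reflection_prompt(transcript: str, skill_file: Path) -> str:
    """Combine the transcript and the current skill file into one prompt."""
    return REFLECT_PROMPT.format(transcript=transcript,
                                 skill=skill_file.read_text())

def self_improve(transcript: str, skill_file: Path, run_model) -> None:
    """Ask the model to revise its own skill file, then persist the result."""
    updated = run_model(build_reflection_prompt(transcript, skill_file))
    skill_file.write_text(updated)
```

Run it after each game session and the skill file accumulates the "which jokers synergize" lessons the model would otherwise forget.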
I also continue to work because I enjoy it. And that will let me pass on this gift to my children.