Last week I got together with a friend from my math program. We cracked some beers, chatted with ChatGPT's voice mode, toyed around with the Collatz conjecture, and sent prompts to a coding agent to build visualizations and simulations. It was a lot of fun directing these agents while we bounced ideas around and the models explored them.
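To give a flavor of what we had the agent build, here's a minimal sketch of a Collatz trajectory simulation (my own illustration, not the agent's actual output):

```python
def collatz_steps(n: int) -> int:
    """Count the steps for n to reach 1 under the 3n+1 rule."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

# Trajectory lengths for the first few integers; 27 is the classic
# outlier, taking 111 steps despite being small.
lengths = {n: collatz_steps(n) for n in range(1, 30)}
```

From here it's a short hop to plotting step counts or peak values, which is the kind of visualization we were asking for.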
I think that with the right problem and the right agentic loop, improvements will clearly speed up.
The bigger problem for me is that the realtime voice modes lack tool use, so they can't look anything up or do anything. Model strength definitely matters too, but even dumb models can be helpful when they can look things up and try things out, and smart models that can't do those things kinda suck.
I've been using Codex to build a repo that pulls down astronomical datasets and runs simulations to try to find an explanation for the Hubble tension. Having an agent do the tedious bits, and an LLM to bounce ideas off, has taught me so much about astronomy. I don't have serious hopes of finding anything new and novel, but it's still a lot of fun.
I've noticed that LLMs can effortlessly read minified JS. How do they do with obfuscated binary code? I wonder if the days of obfuscation are numbered now that the tedious job of de-obfuscation can be automated.
I get all my groceries delivered to my doorstep via Walmart's delivery pass. The thing I'm really missing is AI-curated meal planning tuned to my family's preferences. I already feed ChatGPT those preferences (e.g. kid A doesn't eat X, Y, or Z and liked meals A, B, and C; kid B likes ...), and it helps me build meal plans. With my preferences on hand, we can quickly nail down a meal plan for the week.
The slowest part of my meal planning is clicking through Walmart's sluggish site, where each page load takes 2-3 seconds and each item needs several page loads. Once an agent can translate my meal plan into a grocery checkout on Walmart, I'm all set.
I'd love to see the results of that. I think calling a single prompt iteration lifeless misses the point. It's like looking at a game that has had a few hours of development and declaring it bad. Games need iteration. That your results are only the first iteration is impressive; I can see follow-up prompts and custom tweaking getting really good results!
Last summer I built a Factorio-like automation game with older models, and over time the game really started to come to life.
It's very useful to understand what you're suffering from even if it's not curable. It explains your symptoms and your experience, and helps you understand what you're going through. Knowing that your condition is incurable also keeps you from chasing ineffective cures for a mysterious illness.
> SpaceX has deorbiting assets on top of depreciating ones
The deorbiting part is redundant. Their satellites are just that, depreciating assets, with lifetimes of about 5 to 7 years. The important question is whether the total cost, including the launch, can be recouped over that lifetime.
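The break-even question reduces to one comparison. Here's a back-of-the-envelope sketch with purely hypothetical numbers (none of these figures come from SpaceX; they only illustrate the shape of the claim):

```python
def breaks_even(build_and_launch_cost: float,
                annual_revenue_per_sat: float,
                lifetime_years: float) -> bool:
    """True if lifetime revenue covers the up-front cost of one satellite."""
    return annual_revenue_per_sat * lifetime_years >= build_and_launch_cost

# Hypothetical: a $1M all-in satellite earning $250k/yr over a 5-year life
# clears its cost; at $100k/yr it doesn't.
breaks_even(1_000_000, 250_000, 5)
```

Real accounting would also discount future revenue and fold in constellation-level costs, but the depreciating-asset framing is exactly this inequality.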
You can make one; the Balatro bench is open source. But I'm quite sure it'd be crazily expensive for a hobby project. At the end of the day, LLMs can't actually 'practice and learn.'
I've gotten pretty good results by prompting "What did you struggle on? Please update the instructions in <PROMPT/SKILL>" and "Here's your conversation <PASTE>, please see what you struggled with and update <PROMPT/SKILL>".
It's hit or miss, but I've been able to have it self-improve its prompts. It can spot mistakes and record what didn't work, similar to how I learned games like Balatro. Playing Balatro blind, you wouldn't know which jokers are coming and which have synergy together, or that strategy X is hard to pull off, or that you can retain a card to block it from appearing in shops.
If the LLM can discover that on its own, and build prompt files that gradually allow it to win at the highest stake, that's an interesting result. And I'd love to know which models do best at it.
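The loop I'm describing is simple enough to sketch. This is a hedged outline, not a real SDK: `run_model` is a placeholder for whatever chat API you use, and the skill-file path is an assumption.

```python
from pathlib import Path

# Mirrors the reflection prompts quoted above.
REFLECT_PROMPT = (
    "Here's your conversation:\n{transcript}\n\n"
    "What did you struggle with? Please update these instructions:\n{skill}"
)

def build_reflection_prompt(transcript: str, skill_file: Path) -> str:
    """Combine the transcript and the current skill file into one prompt."""
    return REFLECT_PROMPT.format(transcript=transcript,
                                 skill=skill_file.read_text())

def self_improve(transcript: str, skill_file: Path, run_model) -> None:
    """Ask the model to revise its own skill file, then persist the result."""
    updated = run_model(build_reflection_prompt(transcript, skill_file))
    skill_file.write_text(updated)
```

Run it after each game session and the skill file accumulates the "which jokers synergize" lessons the model would otherwise forget.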
I also continue to work because I enjoy it. And that will let me pass on this gift to my children.