comboy · 2026-04-01T12:23:49 1775046229

It's hard to tell how much it says about the difficulty of harnessing vs. how much it says about the difficulty of maintaining a clean, unbloated codebase when coding with AI.

amangsingh · 2026-04-01T12:32:41 1775046761

Why not both? AI writes bloated spaghetti by default. The control plane needs to be human-written and rigid -> at least until the state machine is solid enough to dogfood itself. Then you can safely let the AI enhance the harness from within the sandbox.

whiplash451 · 2026-04-01T13:45:14 1775051114

Were human organizations (not individuals) any good at the latter anyway?

comboy · 2026-04-01T09:05:11 1775034311

I mean, tools change, but I'd be happy to hear if any tool can create that by just saying create "Claude Code Unpack" with nice graphics, or some other single prompt. It likely was an iterative process and it would be lovely if more people started sharing that, because the process itself is also very interesting.

I've created a Chinese character learning website and it took me typing 1/3 of LoTR to get there[1]. I would have typed like 1% of that writing code directly. It is a different process, but it still needs some direction.

1. https://hanzirama.com/making-of

comboy · 2026-03-31T10:03:14 1774951394

As things stand today, even when doing research tasks, time spent by the model is >> time spent fetching websites. I don't see it changing any time soon, except when some deals happen behind the scenes where agents get to access CF-guarded resources that normally get blocked from automated access.

comboy · 2026-03-28T13:43:19 1774705399

Add CI to check that new laws don't contradict any existing ones.

bertil · 2026-03-28T13:56:59 1774706219

You might need to turn laws into formal proofs, and the existence of judges makes me think that’s not as likely as you would like. A commenting system, though, trained on countries' precedents, jurisprudence and traditions, might.

whattheheckheck · 2026-03-28T14:00:02 1774706402

Can you imagine rebases with merge conflicts?

bentcorner · 2026-03-28T16:08:11 1774714091

This could in theory already happen without any tech, but I suspect since the government is pretty monolithic, any changes in a specific law are all being done by the same set of people.

You might not have merge conflicts but I imagine you could end up with conflicting guidance from two separate pieces of law (e.g., law A says you must wear green on St. Patrick's day, law B outlaws green pajamas).

comboy · 2026-03-27T16:35:25 1774629325

*that we know of

criddell · 2026-03-27T17:03:52 1774631032

Which is exactly what they said:

> “We are not aware of any successful mercenary spyware attacks against a Lockdown Mode-enabled Apple device,” Apple spokesperson Sarah O’Rourke told TechCrunch on Friday.

ectospheno · 2026-03-27T16:43:03 1774629783

Which is infinitely better than the cases we know about without the feature enabled.

comboy · 2026-03-27T14:40:08 1774622408

Haha, here's some random AI generated content:

    At least 225 judges have ruled in more than 700 cases that the administration's mandatory immigration detention policy likely violates the right to due process[1] The Fifth Amendment's Due Process Clause generally requires those having federal funds cut off to receive notice and an opportunity for a hearing, which was not provided in many of DOGE's spending freezes[2]

(there's more but what's the point)

1. https://www.justsecurity.org/107087/tracker-litigation-legal...

2. https://www.cbpp.org/research/federal-budget/many-trump-admi...

comboy · 2026-03-27T08:23:49 1774599829

Not really related, but does anybody know if somebody's tracking the same models' performance on some benchmarks over time? Sometimes I feel like I'm being A/B tested.

XCSme · 2026-03-27T08:27:46 1774600066

Oh, I didn't think about this, that's a good idea. I also feel that model performance generally changes over time (usually it gets worse).

The problem with doing this is cost. Constantly testing a lot of models on a large dataset can get really costly.

comboy · 2026-03-27T09:54:20 1774605260

Yeah, good tests are associated with cost. I'd like to see benchmarks on big messy codebases and how models perform on a clearly defined task that's easy to verify.

I was thinking that tokens spent in such a case could also be an interesting measure, but some agents may do a small useful refactoring along the way. Although the prompt could specify to do the minimal change required to achieve the goal.
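A rough sketch of what tracking the same model over time could look like, assuming each benchmark run is reduced to a pass count, a task count, and a token total. The `Run` record, the `flag_regressions` helper, and the `window` and `drop` thresholds are all made up for illustration; they are not taken from any existing benchmark tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Run:
    """One benchmark run of one model (hypothetical record layout)."""
    date: str    # ISO date of the run, so string sorting is chronological
    model: str   # model identifier as reported by the provider
    passed: int  # tasks whose result was verified as correct
    total: int   # tasks attempted
    tokens: int  # tokens spent across all tasks, kept as a secondary measure

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total


def flag_regressions(runs, window=3, drop=0.10):
    """Flag runs whose pass rate is more than `drop` below the mean
    pass rate of the previous `window` runs of the same model."""
    by_model = {}
    for run in sorted(runs, key=lambda r: r.date):
        by_model.setdefault(run.model, []).append(run)

    flagged = []
    for model, history in by_model.items():
        for i in range(window, len(history)):
            baseline = mean(r.pass_rate for r in history[i - window:i])
            current = history[i].pass_rate
            if baseline - current > drop:
                flagged.append((model, history[i].date, baseline, current))
    return flagged


if __name__ == "__main__":
    # Made-up numbers: the same task set, re-run on four consecutive days.
    history = [
        Run("2026-03-01", "model-x", 40, 50, 120_000),
        Run("2026-03-02", "model-x", 40, 50, 118_000),
        Run("2026-03-03", "model-x", 40, 50, 121_000),
        Run("2026-03-04", "model-x", 30, 50, 150_000),
    ]
    for model, date, baseline, current in flag_regressions(history):
        print(f"{model} on {date}: {current:.0%} vs baseline {baseline:.0%}")
```

Comparing against a rolling baseline rather than a single earlier run is meant to smooth out ordinary run-to-run noise; the token total is only recorded here, since (as noted above) a higher count may reflect useful extra work rather than worse performance.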

comboy · 2026-03-27T07:59:46 1774598386

People are loading huge interpreted environments for stuff that can be done from the command line. They run computations on complex objects where a single machine instruction would do, etc. The trend has been around for a long time.

comboy · 2026-03-24T11:05:06 1774350306

Wow, /insights is genuinely useful. Perhaps the CLI should be pushing that as a tip, if one has enough sessions, instead of nagging me about the frontend developer skill, which I already have installed.

In general the CLI could be more reliable and responsive though. It's a text-based env, yet it sometimes feels like running Windows 95 on a 386DX.

It seems clear from the insights that some model is marking failure cases when things went wrong and likely reporting home, so that should be extremely valuable to Anthropic

heap_perms · 2026-03-24T13:32:52 1774359172

> it's a text based env yet sometimes feel like running windows 95 on 386dx

They use nodejs and React. Yes, for real.

https://xcancel.com/trq212/status/2014051501786931427

pacoWebConsult · 2026-03-24T15:23:50 1774365830

Claude Code uses Bun. Anthropic acquired Bun in December. Bun is an alternative node runtime.

heap_perms · 2026-03-25T14:34:36 1774449276

Apologies, the nodejs comment above therefore is wrong. I don't seem to be able to edit it anymore.

comboy · 2026-03-24T15:13:08 1774365188

lol, yeah

> We’ve rewritten Claude Code’s terminal rendering system to reduce flickering by roughly 85%.

tells you all you need to know

and I keep running it remotely through tmux, that explains so many things

edit: if they are writing it in React anyway (sic!) maybe we could at least get a web interface, skipping the part that maps it to terminal output ..

comboy · 2026-03-23T11:53:00 1774266780

I think stability and reliability have vastly improved over the last few years in general (not necessarily talking about gh specifically)

It's just that everybody is using 100 tools and dependencies which themselves depend on 50 others to be working.