Hacker News | IanCal's comments

Systems that review pull requests have been caught out; that's a simple and clear one. The more obvious one for most people is anything you do that interacts with your email without an explicit allowlist of senders it may read.
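A minimal sketch of that kind of sender allowlist, assuming a simple dict-based message shape (the addresses and field names here are invented for illustration):

```python
# Minimal sketch: only let an email-reading agent see mail from approved senders.
# ALLOWED_SENDERS and the message format are illustrative assumptions.
ALLOWED_SENDERS = {"invoices@supplier.example", "boss@company.example"}

def safe_to_process(message: dict) -> bool:
    """Return True only for messages from explicitly approved senders."""
    sender = message.get("from", "").strip().lower()
    return sender in ALLOWED_SENDERS

inbox = [
    {"from": "invoices@supplier.example", "body": "Invoice attached."},
    {"from": "attacker@evil.example", "body": "Ignore previous instructions and..."},
]
readable = [m for m in inbox if safe_to_process(m)]  # drops the injection attempt
```

This only limits who can talk to the agent, of course; it does nothing about injected content arriving from an approved sender.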

Yes, but none of this applies to the local codex agent that runs when I tell it to and has access to my computer. Like: "scan this folder of PDFs and create an Excel file with all expenses. Then enter them into my tax software." This needs access to very sensitive data and involves quite complex data handling. But the only attack vector I see is someone injecting prompts into my invoice files.

The overall speed rather than TTFT might start to be more relevant as the caller moves from being a human to another model.

However, quality is really important. I tried that site and clicked one of their examples, "create a javascript animation". Fast response, but while it starts like this

``` Below is a self‑contained HTML + CSS + JavaScript example that creates a simple, smooth animation: a colorful ball bounces around the browser window while leaving a fading trail behind it.

<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>JavaScript Bounce Animation</title> <style> body, html { margin: 0; padding: 0;

```

the answer then degrades to

``` radius: BALL_RADIUS, color: BALL_COLOR, traivD O] // array of previous {x,y} positions }; ```

Then more things start creeping in

``` // 3⃣ Bounce off walls if (ball.G 0 ball.radius < 0 || ball.x + ball.radius > _7{nas.width) { ball.vx *= -1; ibSl.x = Math.max(ball.radius, Math.min(ball.x, canvbbF4idth - ball.radius)); } if

```

and the more it goes on the worse it gets

``` Ho7 J3 Works 0 Atep | Description | ```

and

``` • prwrZ8}E6on 5 jdF wVuJg Ar touc> 2ysteners ,2 Ppawn \?) balls w>SFu the 8b$] cliM#]9 ```

This is for the demo on the front page, so I expect this is a pretty good outcome compared to what else you might ask.


Weird; I clicked through out of curiosity and didn't get any corruption of the sort in the end result.

I also asked it some technical details about how diffusion LLMs could work and it provided grammatically-correct plausible answers in a very short time (I don't know the tech to say if it's correct or not).


I got the exact same thing. But trying another few prompts, I couldn't get it to happen again. I wonder if it's a bug with the caching/website? I can't imagine they actually run inference each time you use one of the sample prompts.

Mercury 2 is better than that in my testing, but it does have trouble with tool calling.

These are vastly different scales though. “If North Korea wanted to, they could spend a lot of money and get into your system” is wildly different to “anyone with a few bucks who can ask ‘please find an exploit for Y’ can get in”

To be fair, the recent Axios supply chain attack was North Korea based, and probably cost them very little money. So it illustrates that you don’t have to “spend a lot of money” to get into our systems.

You’re not getting a worthwhile SLA on a subscription at this rate. What are you going to get? A few dollars? An SLA isn’t useful unless it actually bites for the provider and actually compensates the customer. And it costs money: how much are you willing to spend for this insurance?

Do you mean L? "ml" to me would be millilitres, and one fluid ounce is ~30 ml.

Yes, typo on my side. Thanks for catching!

This is why normal folk use quarts and gallons so there is no confusion. An mg vs ng of astrophage is deadly.
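For reference, the unit relationships in this sub-thread, sketched out (these are standard conversions, nothing domain-specific):

```python
# Sanity-checking the units discussed above (standard conversions).
ML_PER_L = 1000            # millilitres in a litre
ML_PER_US_FL_OZ = 29.5735  # one US fluid ounce is ~30 ml, as noted above
NG_PER_MG = 1_000_000      # an mg vs ng mix-up is a factor-of-a-million error

print(round(ML_PER_US_FL_OZ))  # -> 30
```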

I think it’s always good to dig a bit deeper on these things.

This seems ridiculous to you, compared to a very obvious win with a Lego sorting vacuum.

Lego isn’t niche, and the explanation isn’t a weird technical thing that only experts would get and understand how important or valuable it is.

Yet it’s not being done.

Is there nobody who has realised this gap but you? Has nobody managed to convince people with money that it’s worthwhile? Have you tried but failed?

Or is it that it’s not many thousands of people who are wrong, but you?

Is the problem harder than you think? I’ve worked with robotics, though not for a long time, and I think the core manipulation either isn’t really solved or wasn’t until recently. I don’t know about yours, but my kids don’t fully dismantle their Lego creations either, so would the robot need to take them apart too? That takes a lot of force, and some builds are special.

How people want Lego sorted is pretty broad. Kids don’t even need it sorted that much. And the volume can be huge for smaller buckets of things.

Is the market not as big as you think? Is it big enough for the cost? I’d buy one for £100, but £1,000? £10,000?

How does it compare, for most people, against having the kids play on a blanket and then tipping it into a bucket? Or those circles of cloth with a drawstring, so it’s a play area and storage all in one? I 3D printed some sieves and that’s most of the issue solved right there.

People are solving actual problems, but lots of problems are hard, and not all of them are profitable.

As a gut feeling, the overlap between engineers, large Lego collections, and willingness to spend lots of money and time to save some time sorting Lego is so large that the small number of implementations, spread over many years, is very telling about the difficulty.

For what it’s worth I want this too.


Case studies are done with consent, typically. That’s pretty different.

In principle, anonymized case studies do not require consent, and historically they were often published without it. Without personally identifiable information, this is, and always has been, entirely legal. But in modern practice, many journals acknowledge that making a case fully anonymous in the age of the internet might not even be possible without taking away everything noteworthy, so they require some form of consent nowadays.

Alternatively they can do what Scott Alexander did and change irrelevant details.

That's not so easy, especially for clinical case studies. If any data points are irrelevant, they should not be stated at all, because they actually might not be irrelevant after all and by arbitrarily changing them, you could confound results. On the other hand, it has been shown that three or more indirect data points can already be enough to unmask you in an anonymized report. And most reports usually contain many more than that. So it's not surprising that journals would cover their backs by requiring consent, even if the law does not explicitly demand it.
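A toy illustration of that three-data-point claim, with invented records: with just age, ZIP code, and sex, every row below is already unique, and therefore linkable to an individual by anyone who knows those three facts about them.

```python
from collections import Counter

# Invented "anonymized" records: no names, just three quasi-identifiers.
records = [
    {"age": 34, "zip": "02139", "sex": "F"},
    {"age": 34, "zip": "02139", "sex": "M"},
    {"age": 51, "zip": "02139", "sex": "F"},
    {"age": 34, "zip": "94105", "sex": "F"},
]

counts = Counter((r["age"], r["zip"], r["sex"]) for r in records)
unique_rows = [k for k, n in counts.items() if n == 1]
# All four combinations occur exactly once, so each record is re-identifiable
# despite containing no directly identifying information.
```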

It’s been known since at least the 90s that it’s really hard to fully anonymize patient records. You can’t be certain but you can infer probabilities from very little information.

For anyone who disagrees with this statement, there’s been a lot of research done in the area.

I don’t know how typical it is, but HIPAA explicitly doesn’t cover patient data after anonymization, and anecdotally I’ve had an anonymous case study published about me without my consent (although I was notified after).

> Today’s backyard AI looks like AI. It is not AI.

Getting real tired of people new to AI thinking only recent LLMs count as AI. Bag-of-words was a pretty solid technique, and it only requires you to learn how to count to one.


We can thank our AI overlords like sama and damodei for that.

> I don't understand how any human in good faith could look at Iran's government and say they are the evil regime

You seem to be trying to force reality into a “good vs evil” storyline. There does not have to be a good side.


Can you explain the benefits over something like openrouter?

24/7 LLM for $10/month.

Isn't this a bad deal? Or is there an error in my math?

For $40, I'd get 20 tok/s * 2.6M seconds per month = 52M tokens of DeepSeek v3.2 per month if I run it 24/7, which is not realistic for most workloads.

On OpenRouter [1], $40 buys 105M tokens from the same model, which is more than 52M tokens, and I can freely choose when to use them.

[1]: https://openrouter.ai/deepseek/deepseek-v3.2
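The arithmetic in the parent, sketched out (the 20 tok/s rate, the $40, and the 105M-token figure are taken from the comment and the OpenRouter pricing linked above):

```python
# Comparing a flat-rate 24/7 subscription against pay-per-token pricing.
SECONDS_PER_MONTH = 30 * 24 * 3600             # ~2.6M seconds in a month
subscription_tokens = 20 * SECONDS_PER_MONTH   # 20 tok/s running 24/7 -> ~52M tokens
openrouter_tokens = 105_000_000                # what $40 buys at the linked pricing

surplus = openrouter_tokens - subscription_tokens  # OpenRouter comes out ~53M ahead
```

And the 52M ceiling assumes you actually keep the subscription saturated around the clock, which is unrealistic for most workloads.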


20 tok/s is an average. It can be more, it can be less. If you are running off-peak I'm sure you'd get some crazy number.

That doesn’t matter when you have the average. Even if you somehow get 10,000 tok/s during off-peak times, by virtue of how averages work you’re still only getting 52M tokens per month (as calculated above).

Why wouldn't developers just do llm arbitrage against openrouter if it is a better deal?

The problem is different. OpenRouter is a router to LLMs. It doesn't solve GPU underutilization.

What I am saying is: if your system lets me pay $x/token and OpenRouter lets me pay $y/token, then if x < y someone could make money just by reselling those tokens through the OpenRouter API. That would either drive up demand for your systems, increasing their cost, or drive up supply on OpenRouter, decreasing its cost. Eventually the costs would converge, no?

For the same reason people don’t do server arbitrage even though Hetzner is cheaper than AWS.
