Bombthecat · 2026-05-07T09:07:16 1778144836

Who cares about Memphis! We need more AI bro! /s

Bombthecat · 2026-05-07T09:03:32 1778144612

You already realised that you aren't paid to review code manually. Why waste the time? And maybe even get the wrath of your management by "wasting" time?

Bombthecat · 2026-05-06T19:32:29 1778095949

BYD has to me become an icon of German decline vs Chinese expansion.

My view.

I was looking at a new car. Went into several car shops, VW, Skoda, Toyota and BYD.

And all of them were basically empty and BYD was FULL! Like really really full.

And the sales guy confirmed it, they are selling cars like crazy.

electriclove · 2026-05-06T20:03:48 1778097828

Which country was this in?

Bombthecat · 2026-05-06T20:52:15 1778100735

Well.. Germany :)

titanomachy · 2026-05-06T20:29:09 1778099349

Must be EU somewhere, never seen a Skoda anywhere else.

michaelhoney · 2026-05-07T02:43:48 1778121828

they're reasonably popular in Australia. They have done a good job of marketing their station wagons/estates to the road-biking community

Bombthecat · 2026-05-07T09:00:52 1778144452

The enyaq / elroq is a crazy good car for your money

Bombthecat · 2026-05-06T12:26:55 1778070415

Agree!

I play Destiny in a clan; most of them are from the UK. I don't understand a single word from some of them...

Bombthecat · 2026-05-03T07:36:28 1777793788

Those open-weight providers were found nerfing models too.

Bombthecat · 2026-05-02T14:44:51 1777733091

It's cheap. That's all.

Bombthecat · 2026-04-26T18:41:13 1777228873

Both of them look pretty old?

cjsaltlake · 2026-04-26T18:42:21 1777228941

code clash I think would be quite hard to game or contaminate unintentionally; considering that models need to compete against one another

gertlabs · 2026-04-26T19:05:24 1777230324

https://gertlabs.com already does this at scale.

An industry-standard benchmark shouldn't be hosted or designed by a lab producing the models, regardless.

Bombthecat · 2026-04-26T18:53:42 1777229622

I mean the data / benchmarks

Bombthecat · 2026-04-26T17:54:42 1777226082

It's more like no one cares about UX. People keep using the product and they keep printing. Why invest in a UX researcher or designer?

Bombthecat · 2026-04-24T07:04:24 1777014264

Google stated a while back that with TPUs they are able to sell at cost / with profit.

Aka: everyone who uses Nvidia isn't selling at cost, because Nvidia is so expensive.

Bombthecat · 2026-04-24T07:02:31 1777014151

In six months DeepSeek won't be SOTA anymore and usage will be wayyyy down.

randomgermanguy · 2026-04-24T09:47:10 1777024030

Only comparing on SOTA scores (ignoring price etc.) is like choosing your daily-driver by looking at who makes the fastest sports-car...

LinXitoW · 2026-04-24T10:34:24 1777026864

The constant improvements of SOTA are the main thing keeping the investment machine running. We can't really remove training costs from inference costs, because a bunch of the funding and loans for the inference hardware only exists because of the promises that continuous training (tries to) provide.

dnnddidiej · 2026-04-24T09:57:17 1777024637

Not really. SOTA vs non SOTA is "can I get my coding work actually done today" vs. "this can do customer support chat"

It is like car vs. kick scooter.

regularfry · 2026-04-24T11:02:11 1777028531

It really isn't. We get coding work actually done today on Opus 4.5. That's not SOTA any more, and anything proximate to that level, even quite loosely, is genuinely useful.

dnnddidiej · 2026-04-24T11:07:31 1777028851

OK, so we are in "Opus 4.5 is not SOTA" territory. Right, by that definition... yes, you are right.

randomgermanguy · 2026-04-24T11:47:54 1777031274

I mean it's almost half a year, I think that counts?

dnnddidiej · 2026-04-24T23:14:18 1777072458

Time wise you are correct.

randomgermanguy · 2026-04-24T11:54:20 1777031660

> "can I get my coding work actually done today" vs. "this can do customer support chat"

I think you need to define "can get coding work done" for this to make sense. I was using GPT-3 back then for basic scripts, does that count? Or only Claude Code?

I also think this is a false dichotomy. If you look at the Project Vend project or Vending-Bench, customer support etc. is by no means trivial. (Old but great story https://www.businessinsider.com/car-dealership-chevrolet-cha...)

UlisesAC4 · 2026-04-24T17:32:42 1777051962

This. I have been doing my side hustle code with open code and 3.2 reasoner and it is way better than what I have at my day job with Copilot and whatever models are there.

wahnfrieden · 2026-04-25T04:50:43 1777092643

Copilot is a bad harness that perverts the productivity of models like GPT 5.5.

dnnddidiej · 2026-04-24T23:15:08 1777072508

Tell me more please!

zrn900 · 2026-04-26T23:46:31 1777247191

Not really. The current SOTAs are already at the point that they can do that. The following models will start to surpass the daily work level. It's a diminishing returns situation just like anything else in tech.

2ndorderthought · 2026-04-24T10:59:46 1777028386

A huge proportion of those scores are gamed anyways. Use whatever works for you at the price and availability you can afford

Palmik · 2026-04-24T09:58:26 1777024706

Or there will be DSv4.1/2/3 ;)

randomgermanguy · 2026-04-24T11:57:07 1777031827

Definitely something in this realm, they call the models "preview" at a bunch of different points in the paper.

What I'm really hoping for is a double-punch like with V3 -> R1

Barbing · 2026-04-24T07:43:03 1777016583

Well, if they distilled once…