More

jameson · 2026-04-23T19:21:24 1776972084

> "In combination with other prompt changes, it hurt coding quality, and was reverted on April 20"

Do researchers know correlation between various aspects of a prompt and the response?

LLM, to me at least, appears to be a wildly random function that it's difficult to rely on. Traditional systems have structured inputs and outputs, and we can know how a system returned the output. This doesn't appear to be the case for LLM where inputs and outputs are any texts.

Anecdotally, I had a difficult time working with open source models at a social media firm, and something as simple as wrapping the example of JSON structure with ```, adding a newline or wording I used wildly changed accuracy.

jameson · 2026-04-22T16:13:51 1776874431

What competitive advantage does OpenAI/Anthropic has when companies like Qwen/Minimax/etc are open sourcing models that shows similar (yet below than OpenAI/Anthropic) benchmark results?

Also, the token prices of these open source models are at a fraction of Anthropic's Opus 4.6[1]

[1]: https://artificialanalysis.ai/models/#pricing

fnordpiglet · 2026-04-22T16:25:21 1776875121

For coding often quality at the margin is crucial even at a premium. It’s not the same as cranking out spam emails or HN posts at scale. This is why the marginal difference between your median engineer and your P99 engineer is comp is substantial, while the marginal comp difference between your median pick and packer vs your P99 pick and packer isn’t.

I’d also say it keeps the frontier shops competitive while costing R&D in the present is beneficial to them in forcing them to make a better and better product especially in value add space.

Finally, particularly for Anthropic, they are going for the more trustworthy shop. Even ali is hosting pay frontier models for service revenue, but if you’re not a Chinese shop, would you really host your production code development workload on a Chinese hosted provider? OpenAI is sketchy enough but even there I have a marginal confidence they aren’t just wholesale mining data for trade secrets - even if they are using it for model training. Anthropic I slightly trust more. Hence the premium. No one really believes at face value a Chinese hosted firm isn’t mass trolling every competitive advantage possible and handing back to the government and other cross competitive firms - even if they aren’t the historical precedent is so well established and known that everyone prices it in.

ozgrakkurt · 2026-04-22T19:33:00 1776886380

I just assume any of those companies would steal my work and wouldn't care about it.

Everything they have done so far indicates this.

Running your own is the only option unless you really trust them or unless you have the option to sue them like some big companies can.

Or if you don't really care then you can use the chineese one since it is cheaper.

What makes you trust Anthropic more than Alibaba?

fnordpiglet · 2026-04-22T21:08:44 1776892124

There’s a difference between stealing for model training and direct monitoring of actionable trade secrets and corporate espionage. Anthropic and OpenAI wouldn’t do this simply because they would be litigated out of existence and criminally investigated if they did. In China it’s an expected part of the corporate and legal structure with virtually no recourse for a foreign firm and when it’s in states interest domestic either. I’m surprised you don’t realize the US has fairly strong civil, criminal, and regulatory protections in place for theft of actionable material and reuse of corporate and trade secrets, let alone copyright materials. I assure you their ToS also do not allow them to do this and that in itself is a contractual obligation you can enforce and win in court.

trvz · 2026-04-22T21:40:29 1776894029

Anthropic already admitted to heavily monitoring user requests to protect against distillation. They have everything in place, turning on learning from user data would literally be just a couple lines of code at this point. Anyone trusting them not to do it is a fool.

anon373839 · 2026-04-22T22:51:48 1776898308

Absolutely. Plus as these companies become hungrier for revenue and to get out of the commodity market they are in, they are only going to get more aggressive in their (ab)use of customer data.

Zetaphor · 2026-04-23T02:18:33 1776910713

How exactly do you propose that a local weights model that I can run without an internet connection is going to exfiltrate my trade secrets to the Chinese government?

fnordpiglet · 2026-04-23T03:48:19 1776916099

If you read I’m talking about their service only models.

dTal · 2026-04-23T09:16:47 1776935807

Why? No one else was. The discussion was about OpenAI / Anthropic's lack of moat when there are open weights models that are almost as good. You can host them anywhere you like. Pay a US company to do so if you want.

bigbadfeline · 2026-04-22T17:11:17 1776877877

> For coding often quality at the margin is crucial even at a premium

That's a cryptic way to say "Only for vibe-coding quality at the margin matters". Obviously, quality is determined first and foremost by the skills of the human operating the LLM.

> No one really believes at face value a Chinese hosted firm isn’t mass trolling every competitive advantage possible

That's much easier to believe than the same but applied to a huge global corp that operates in your own market and has both the power and the desire to eat your market share for breakfast, before the markets open, so "growth" can be reported the same day.

Besides, open models are hosted by many small providers in the US too, you don't have to use foreign providers per se.

fnordpiglet · 2026-04-22T17:27:44 1776878864

1) model provider choices don’t obviate the need to make other good choices

2) I think there is a special case for Chinese providers due to the philosophical differences in what constitutes fair markets and the regulatory and civil legal structure outside China generally makes such things existentially dangerous to do; hence while it might happen it is extraordinarily ill advised, while in China is implicitly the way things work. However my point is Ali has their own hosted version of Qwen models operating on the frontier that are at minimum hosted exclusively before released. Theres no reason to believe they won’t at some point exclusively host some frontier or fine tuned variants for purposes for commercial reasons. This is part of why they had recent turnover.

rohansood15 · 2026-04-22T16:35:48 1776875748

Most code is not P99 though.

Also, have you considered that your trust in Anthropic and distrust in China may not be shared by many outside the US? There's a reason why Huawei is the largest supplier of 5G hardware globally.

runjake · 2026-04-22T16:39:30 1776875970

You're right, but perspective is important, and that's because China and the US are engaged in economic warfare (even before the current US regime), vying for the dubious title of "superpower".

fnordpiglet · 2026-04-22T17:30:17 1776879017

I find it hard to believe anyone who has ever done business inside China doesn’t know that the structure of Chinese business is built around massive IP theft and repurposing on a state wide systematic level. It’s not a nationalism point, it’s an objective and easily verified truth.

Most code is not P99, but companies pay a premium to produce code that is. That’s my point.

Zetaphor · 2026-04-23T02:22:50 1776910970

I'll ask you the same thing I asked the other guy. How is a an open weights model that I can run on my own hardware without an internet connection going to exfiltrate my trade secrets to the Chinese government?

dTal · 2026-04-23T09:13:53 1776935633

It's the same user and they already answered you: "If you read I’m talking about their service only models."

But yes this is a non-sequitor. The original question was "What competitive advantage does OpenAI/Anthropic has when companies like Qwen/Minimax/etc are open sourcing models that shows similar (yet below than OpenAI/Anthropic) benchmark results?"

Even if you don't trust Chinese companies, and you want a hosted model, you can always pay a third party to host a Chinese open weight model. And it'll be a lot cheaper than OpenAI.

rohansood15 · 2026-04-23T05:32:03 1776922323

Chinese companies are built on IP theft, and Anthropic/Open AI are not?

And in world where code generation costs are trending to zero, goodluck commanding a premium to produce any kind of code.

There is a whole bunch of P99 code that is open-source. What makes code P99 is not the model that produces it, but the people who verify/validate/direct it.

piperswe · 2026-04-24T14:40:39 1777041639

Didn’t the major American labs pirate a whole bunch of their training data?

otabdeveloper4 · 2026-04-22T17:03:29 1776877409

> For coding often quality at the margin is crucial even at a premium.

For coding, quality is not measurable and is based entirely on feels (er, sorry, "vibes").

Employers paying for SOTA models is nothing but a lifestyle status perk for employees, like ping-pong tables or fancy lunch snacks.

fnordpiglet · 2026-04-22T18:13:21 1776881601

I’m building my own company and I consider model choice crucial to my marginal ability to produce a higher quality product I don’t regret having built. Every higher end dev shop I’ve worked at over the last few years perceives things the same. There are measurable outcomes from software built well and software not, even if the code itself isn’t easily measurable. I would rather pay a few thousand more per year for a better overall outcome with less developer struggle against bad model decisions than end up with an inferior end product and have expensive developer spin wheels containing a dumb as a brick model. But everyone’s career experiences are different and I’d feel sad to work at a place where SOTA is a lifestyle choice rather than a rational engineering and business choice.

otabdeveloper4 · 2026-04-23T05:22:09 1776921729

"Rational engineering and business choice" and "AI" are two words that do not go together.

Wait five years and come back. Right now AI is 100% FOMO and lifestyle signaling and nothing more.

j-bos · 2026-04-22T19:42:54 1776886974

"based entirely on feels"

Now there's a word I haven't heard in a long, long time.

OtomotO · 2026-04-22T20:14:49 1776888889

> but if you’re not a Chinese shop, would you really host your production code development workload on a Chinese hosted provider?

As opposed to an US-american shop? Yup, sure, why not? It's the same ballpark.

donmcronald · 2026-04-22T16:52:22 1776876742

Given the very limited experience I have where I've been trying out a few different models, the quality of the context I can build seems to be much more of an issue than the model itself.

If I build a super high quality context for something I'm really good at, I can get great results. If I'm trying to learn something new and have it help me, it's very hit and miss. I can see where the frontier models would be useful for the latter, but they don't seem to make as much difference for the former, at least in my experience.

The biggest issue I have is that if I don't know a topic, my inquiries seem to poison the context. For some reason, my questions are treated like fact. I've also seen the same behavior with Claude getting information from the web. Specifically, I had it take a question about a possible workaround from a bug report and present it as a de-facto solution to my problem. I'm talking disconnect a remote site from the internet levels of wrong.

From what I've seen, I think the future value is in context engineering. I think the value is going to come from systems and tools that let experts "train" a context, which is really just a search problem IMO, and a marketplace or standard for sharing that context building knowledge.

The cynic in me thinks that things like cornering the RAM market are more about depriving everyone else than needing the resources. Whoever usurps the most high quality context from those P99 engineers is going to have a better product because they have better inputs. They don't want to let anyone catch up because the whole thing has properties similar to network effects. The "best" model, even if it's really just the best tooling and context engineering, is going to attract the best users which will improve the model.

It makes me wonder of the self reinforced learning is really just context theft.

jameson · 2026-04-23T19:23:42 1776972222

Apologies for my ignorance but how can you know the quality of the context?

swiftcoder · 2026-04-22T20:18:20 1776889100

> For coding often quality at the margin is crucial even at a premium

For some problems, sure, and when you are stuck, throwing tokens at Opus is worthwhile.

On the other hand, a $10/month minimax 2.7 coding subscription that literally never runs out of tokens will happily perform most day-to-day coding tasks

solenoid0937 · 2026-04-23T05:16:30 1776921390

"Literally never runs out of tokens?" lol, no. Tokens are just energy. There is always a way to run out of tokens, and no one will subsidize free tokens forever.

swiftcoder · 2026-04-23T07:48:56 1776930536

"Never runs out of tokens" in the sense that running 8 hours a day 7 days a week is still under the subscription limit

solenoid0937 · 2026-04-23T16:39:59 1776962399

You can also do that on an API without hitting a limit!

swiftcoder · 2026-04-23T17:38:39 1776965919

Not typically at predictable monthly spend, which turns out to be important to some folks

pistoriusp · 2026-04-23T07:21:56 1776928916

if you run it at home then the sun is a pretty good way to get "free energy."

sumedh · 2026-04-22T21:57:51 1776895071

Why pay for two subscriptions though?

Claude also has other models which use less tokens.

swiftcoder · 2026-04-23T17:39:39 1776965979

Redundancy, mostly. And having left over tokens when Opus eats all of those tokens

AJ007 · 2026-04-22T16:33:27 1776875607

Not sure how your last point matters if 27b can run on consumer hardware, besides being hosted by any company which the user could certainly trust more than anthropic.

OpenAI & Anthropic are just lying to everyone right now because if they can't raise enough money they are dead. Intelligence is a commodity, the semiconductor supply chain is not.

datadrivenangel · 2026-04-22T17:51:47 1776880307

The challenge is token speed. I did some local coding yesterday with qwen3.6 35b and getting 10-40 tokens per second means that the wall time is much longer. 20 tokens per second is a bit over a thousand tokens per minute, which is slower than the the experience you get with Claude Code or the opus models.

Slower and worse is still useful, but not as good in two important dimensions.

fnordpiglet · 2026-04-22T18:16:35 1776881795

Also benchmark measures are not empirical experience measures and are well gamed. As other commenters have said the actual observed behavior is inferior, so it’s not just speed.

It’s ludicrous to believe a small parameter count model will out perform a well made high parameter count model. That’s just magical thinking. We’ve not empirically observed any flattening of the scaling laws, and there’s no reason to believe the scrappy and smart qwen team has discovered P=NP, FTL, or the magical non linear parameter count scaling model.

dTal · 2026-04-23T09:35:19 1776936919

Ooh, car analogy time!

It's kinda like saying a car with a 6L engine will always outperform a car with a 2L engine. There are so many different engineering tradeoffs, so many different things to optimize for, so many different metrics for "performance", that while it's broadly true, it doesn't mean you'll always prefer the 6L car. Maybe you care about running costs! Maybe you'd rather own a smaller car than rent a bigger one. Maybe the 2L car is just better engineered. Maybe you work in food delivery in a dense city and what you actually need is a 50cc moped, because agility and latency are more important than performance at the margins.

And if you're the only game in town, and you only sell 6L behemoths, and some upstart comes along and starts selling nippy little 2L utility vehicles (or worse - giving them away!) you should absolutely be worried about your lunch. Note that this literally happened to the US car industry when Japanese imports started becoming popular in the 80s...

anon373839 · 2026-04-22T22:56:53 1776898613

This is just blind belief. The model discussed in this topic already outperforms “well made” frontier LLMs of 12-18 months ago. If what you wrote is true, that wouldn’t have been possible.

datadrivenangel · 2026-04-22T23:26:23 1776900383

It's amazing that we can run models better than state of the art ~36 months ago on local consumer devices!

rmacqueen · 2026-04-22T18:39:16 1776883156

> This is why the marginal difference between your median engineer and your P99 engineer is comp is substantial, while the marginal comp difference between your median pick and packer vs your P99 pick and packer isn’t.

That's an interesting analogy.

ginko · 2026-04-23T09:24:38 1776936278

>but if you’re not a Chinese shop, would you really host your production code development workload on a Chinese hosted provider?

The point of open source models is that you host them locally. I trust neither Chinese nor American providers with this.

andriy_koval · 2026-04-23T16:57:40 1776963460

another point is that there could be multiple inference providers, so market will be healthier, and not dominated by one player who charges NN% margin.

DiogenesKynikos · 2026-04-23T01:05:45 1776906345

Are you claiming that major Chinese cloud providers like Tencent and Alibaba are pilfering trade secrets from their customers' data? To my knowledge, there's no evidence for that whatsoever. If it were true and came out, it would instantly tank their cloud businesses (which is why they don't do it, and why AWS, Azure, etc. also don't do it).

If it were to happen, Chinese law does offer recourse, including to foreign firms. It's not as if China doesn't have IP law. It has actually made a major effort over the last 10+ years to set up specialized courts just to deal with IP disputes, and I think foreign firms have a fairly good track record of winning cases.

> No one really believes at face value

This says a lot more about the prejudices and stereotypes in the West about China than it does about China itself.

Zetaphor · 2026-04-23T02:29:01 1776911341

In every one of these threads for a new Chinese open weights model, it's always the same tired discussion of how this is all actually a psyop by the Chinese government to undermine US interests and it can't answer questions about Tienanmen Square.

Meanwhile I'm over here solving real world business problems with a model that I can securely run on-prem and not pay out the nose for cloud GPU inference. And then after work I use that same model to power my personal experiments and hobby projects.

There are no Chinese labs with different financial and political motivations, there's only "China" the monolith. The last thread for Qwen's new hosted model was full of folks talking about how "China" is no longer releasing open weights models, when the next day Moonshot AI releases Kimi 2.6. A few days later and here's Qwen again with another open release.

For some reason this country gets what I assume are otherwise smart Americans to just completely shut off their brains and start repeating rhetoric.

andriy_koval · 2026-04-23T17:02:31 1776963751

> The last thread for Qwen's new hosted model was full of folks talking about how "China" is no longer releasing open weights models, when the next day Moonshot AI releases Kimi 2.6. A few days later and here's Qwen again with another open release.

looks like you declared win argument, because you now see that 2.6 was released, but at that time your opponents argument stand.

Also, you can't predict if Chinese labs will continue releasing open frontier models. Looks like Kimi is the only one left, Qwen is much smaller model.

Zetaphor · 2026-04-24T16:40:20 1777048820

> looks like you declared win argument, because you now see that 2.6 was released, but at that time your opponents argument stand.

Their argument was based entirely on speculation, but stated as a matter of fact, despite Alibaba making very clear statements that they were going to continue releasing open models.

And the core of my argument is that they were conflating a single company with the motivations of multiple companies in a country. Nobody talks about US companies by saying "The Americans are going to do X", they say "OpenAI/Anthropic/Google is going to do X".

Aurornis · 2026-04-22T16:16:04 1776874564

I use Opus and the Qwen models. The gap between them is much larger than the benchmark charts show.

If you want to compare to a hosted model, look toward the GLM hosted model. It’s closest to the big players right now. They were selling it at very low prices but have started raising the price recently.

mchusma · 2026-04-22T16:45:05 1776876305

I like both GLM and Kimmi 2.6 but honestly for me they didn’t have quite the cost advantage that I would like partly because they use more tokens so they end up being maybe sonnet level intelligence at haiku level cost. Good but not quite as extreme as some people would make them out to be and for my use cases running the much cheaper, Gemma 4 four things where I don’t need Max intelligence and running sonnet or opus for things where I need the intelligence and I can’t really make the trade-off is been generally good and it just doesn’t seem worth it to cost cut a little bit. Plus when you combine prompt, cashing and sub agents using Gemma 4, the cost to run sonnet or even opus, are not that extreme.

For coding $200 month plan is such a good value from anthropic it’s not even worth considering anything else except for up time issues

But competition is great. I hope to see Anthropic put out a competitor in the 1/3 to 1/5 of haiku pricing range and bump haiku’s performance should be closer to sonnet level and close the gap here.

syntaxing · 2026-04-22T16:42:01 1776876121

Yes and no. Are you using open router or local? Are the models are good as Opus? No. But 99% of the time, local models are terrible because of user errors. Especially true for MoE, even though the perplexity only drops minimal for Q4 and q4_0 for the KV cache, the models get noticeably worse.

acidtechno303 · 2026-04-22T16:59:54 1776877194

Sounds like you're accusing a professional of holding their tool incorrectly. Not impossible, but not likely either.

syntaxing · 2026-04-22T17:16:02 1776878162

Inferencing is straight up hard. I’m not accusing them of anything. There’s a crap ton of variables that can go into running a local model. No one runs them at native FP8/FP16 because we cannot afford to. Sometimes llama cpp implementation has a bug (happens all the time). Sometimes the template is wrong. Sometimes the user forgot to expand the context length to above the 4096 default. Sometimes they use quantization that nerfs the model. You get the point. The biggest downside of local LLMs is that it’s hard to get right. It’s such a big problem, Kimi just rolled out a new tool so vendors can be qualified. Even on openrouter, one vendor can be half the “performance” of the other.

Frannky · 2026-04-22T16:23:30 1776875010

If these results are because of vampire attacks, the results will stop being so good when closed ones figure out how to pollute them when they are sucking answers.

Also, they are not exactly as good when you use them in your daily flow; maybe for shallow reasoning but not for coding and more difficult stuff. Or at least I haven't found an open one as good as closed ones; I would love to, if you have some cool settings, please share

mmmore · 2026-04-22T16:56:25 1776876985

The token prices being high for Opus undermines your argument, because it shows people are willing to pay more for the model.

The thing is the new OpenAI/Anthropic models are noticeably better than open source. Open source is not unusable, but the frontier is definitely better and likely will remain so. With SWE time costing over $1/min, if a convo costs me $10 but saves me 10 minutes it's probably worth it. And with code, often the time saved by marginally better quality is significant.

oliveiracwb · 2026-04-22T22:50:21 1776898221

There is no advantage at this moment. But there will be once one of the ecosystems consolidates.

jstummbillig · 2026-04-22T17:45:32 1776879932

> yet below than OpenAI/Anthropic

This is the competitive advantage. Being better.

jameson · 2026-04-16T15:04:19 1776351859

How should one compare benchmark results? For example, SWE-bench Pro improved ~11% compared with Opus 4.6. Should one interpret it as 4.7 is able to solve more difficult problems? or 11% less hallucinations?

HarHarVeryFunny · 2026-04-16T15:33:33 1776353613

Benchmarks are meaningless. Try it on your own problems and see if it has improved for what you want to use it for.

azeirah · 2026-04-16T15:11:19 1776352279

There is no hallucination benchmark currently.

I was researching how to predict hallucinations using the literature (fastowski et al, 2025) (cecere et al, 2025) and the general-ish situation is that there are ways to introspect model certainty levels by probing it from the outside to get the same certainty metric that you _would_ have gotten if the model was trained as a bayesian model, ie, it knows what it knows and it knows what it doesn't know.

This significantly improves claim-level false-positive rates (which is measured with the AUARC metric, ie, abstention rates; ie have the model shut up when it is actually uncertain).

This would be great to include as a metric in benchmarks because right now the benchmark just says "it solves x% of benchmarks", whereas the real question real-world developers care about is "it solves x% of benchmarks *reliably*" AND "It creates false positives on y% of the time".

So the answer to your question, we don't know. It might be a cherry picked result, it might be fewer hallucinations (better metacognition) it might be capability to solve more difficult problems (better intelligence).

The benchmarks don't make this explicit.

zeroonetwothree · 2026-04-16T15:10:49 1776352249

Benchmark results don’t directly translate to actual real world improvement. So we might guess it’s somewhat better but hard to say exactly in what way

theptip · 2026-04-16T15:14:36 1776352476

11% further along the particular bell curve of SWE-bench. Not really easy to extrapolate to real world, especially given that eg the Chinese models tend to heavily train on the benchmarks. But a 10% bump with the same model should equate to “feels noticeably smarter”.

A more quantifiable eval would be METR’s task time - it’s the duration of tasks that the model can complete on average 50% of the time, we’ll have to wait to see where 4.7 lands on this one.

jameson · 2026-04-16T15:03:54 1776351834

How should one compare benchmark results?

For example, SWE-bench Pro improved ~11% compared with Opus 4.6. Should one interpret it as 4.7 is able to solve more difficult problems? or 11% less hallucinations?

jameson · 2026-04-16T06:35:37 1776321337

Curious why the team choose Grafana Mirmir over VM cluster?

esafak · 2026-04-16T13:38:32 1776346712

How are these substitutes? Mimir is a time series database.

edit: I understood virtual machine :)

igor47 · 2026-04-16T14:41:03 1776350463

So is Victoria metrics?

jameson · 2026-04-16T02:50:08 1776307808

Used it for a few days to summarize top 10 hacker news at scheduled time or send me a joke of the day

I liked how easy I can tell it to do something for me but token usage didn't justify the cost. I 'd either had to use smarter model which could cost a lot more or cheaper model which, in one instance, stuck in a loop

Product-wise, it's an awesome tool. Imagine having your own butler for anything except that the reliability with affordable isn't here yet to do anything serious

jameson · 2026-04-13T04:27:37 1776054457

I'm noticing a fair number of degradation of Claude infrastructure recently and makes me wonder why they can't use Claude to identify or fix these issues in advance?

It seems a counter intuitive to Anthropic's message that Claude uncovered bugs in open source project*.

[*] https://www.anthropic.com/news/mozilla-firefox-security

alex_duf · 2026-04-13T13:01:25 1776085285

timing wise this seems to match the Claude Mythos story.

So maybe they're trying to free-up some GPU capacity to run audit of projects in need? I'm assuming Mythos is not cheap to run.

The cache TTL story is also probably link to the RAM price going up like mad so they're trying to save on future expenditure here maybe?

I do understand why people are pissed though

jameson · 2026-04-06T20:19:12 1775506752

Vibe coders' argument* is that quality of code does not matter because LLMs can iterate much much faster then humans do.

Consider this overly simplified process of writing a logic to satisfy a requirement:

1. Write code

2. Verify

3. Fix

We, humans, know the cost of each step is high, so we come up various way to improve code quality and reduce cognitive burden. We make it easier to understand when we have to revisit.

On the other hand, LLMs can understand** a large piece of code quickly***, and in addition, compile and run with agentic tools like Claude Code at the cost of token****. Quality does not matter to vibe coders if LLMs can fill the function logic that satisfies the requirement by iterating the aforementioned steps quickly.

I don't agree with this approach and have seen too many things broken from vibe code, but perhaps they are right as LLMs get better.

* Anecdotal

** I see LLM as just a probabilistic function so it doesn't "reason" like humans do. It's capable of highly advanced problem solving yet it also fails at primitive task.

*** Relative to human

**** Cost of token I believe is relatively cheaper compared to a full-time engineer and it'll get cheaper over time.

parineum · 2026-04-07T01:29:15 1775525355

> ** Cost of token I believe is relatively cheaper compared to a full-time engineer and it'll get cheaper over time.

I don't know how true this is going to be, at least in the short term. The big providers are likely running at a loss and, as models have gotten better, they've also crept up in price as well.

They/you are counting on them hitting a point where it is actually cheap for the value provided (after they take some off the top) but I don't see that as inevitable before these companies go under or pivot into much more specialized tools for big clients.

It's not clear to me that AI code is cheaper than human code (of equal functionality).

jameson · 2026-03-27T23:06:49 1774652809

I wonder how many others are hacked but remain undiscovered

longislandguido · 2026-03-27T23:40:26 1774654826

Considering 95% of spam that hits my inbox originates from compromised Gmail accounts, I'd say it's a few.

Because Google is too big to fail, all Gmail traffic is essentially whitelisted and they can't be bothered to do anything about it.

detourdog · 2026-03-27T23:52:45 1774655565

Almost all phishing attempts at my domain are from google. Many Norton subscription bills for around $350. I report every single one to google. I can’t believe they aren’t using there AI to figure this out.

mcmcmc · 2026-03-28T01:06:14 1774659974

> I can’t believe they aren’t using there AI to figure this out.

Why would they burn compute on it when they have zero incentive to fix the problem?

themafia · 2026-03-28T02:48:19 1774666099

Meanwhile have a complaint volume of more than 0.1% and they'll consider you extremely suspicious and start actively interfering with your deliveries.

Then you get into the forgotten early 2000s era google "postmaster tools" to try to poke through the chicken entrails to divine the nature of your issue.

gzread · 2026-03-28T01:14:46 1774660486

Google was banned from Usenet once, so there's hope. Every single provider was so fed up with spam they just blocked the whole network.

jameson · 2026-03-11T04:50:26 1773204626

Roblox is slot machines for kids

Games are filled with loot boxes that drop exquisite items on chance. It's a repeated cycle of charging robux only to spend on another slot machine.

US regulation is far behind protecting children from such scheme. Japan disallows many forms of such loot boxes due to addictive nature.

russelg · 2026-03-11T14:19:30 1773238770

>Japan disallows many forms of such loot boxes due to addictive nature.

Crazy to say this when they've basically pioneered and perfected gacha games.

rincebrain · 2026-03-11T20:36:55 1773261415

Both things are true, though.

The fact that gacha games are so popular is _why_ they had enough attention to explicitly ban the most toxic patterns at the time. [1]

There's an interesting question of how far to push the bans, though - "in theory" your goals should probably be "don't let people prey on addictive behaviors" and "minimize people impulse buying more than they can afford", but the latter especially is...very hard to make an empirical rule for, and then you get into logistics like people just making additional accounts to get around it...

[1] - http://www.vg247.com/2012/05/18/kompu-gacha-freemium-systems...

rincebrain · 2026-03-11T20:48:33 1773262113

I would, I think, probably argue that the problem is less that they're gambling and more that they involve actual money.

I think exposing people to addictive mechanics with guard rails is probably useful for teaching you how you respond to them, before you go to Vegas and blow far more than you budgeted.

In particular, I don't think you're going to ban addictive things faster than people can build them, and I know you can't rely on parents having conversations with kids, so I feel like all you can do is try to remove the whirring buzzsaw of real money incentives and let people learn that it's sharp, but foam sword sharp, where you can't ruin your life permanently (easily) with it.

wj · 2026-03-11T09:53:48 1773222828

I have called them casinos for kids elsewhere due to the bright colors, flashing text, and money counters going up, up, up on the screen.

And also because my kids can spend tens of dollars in minutes on it.

Aeolun · 2026-03-11T04:54:52 1773204892

It’s not that bad. Sure, all the games allow you to spend money to rapidly get better, but the core gameplay loops work just fine without it.

bluefirebrand · 2026-03-11T05:12:15 1773205935

Anecdotally, I have 3 nieces and 2 nephews and only the oldest has avoided Roblox

Every Christmas and birthday now is "I want Robux" and they are actively annoyed if they get anything else

This thing is bad for children

Aeolun · 2026-03-11T06:19:14 1773209954

Pretty sure I was annoyed when I got anything but computer games. Which, I’m pretty certain, adults told me were bad for me.

pnt12 · 2026-03-11T07:27:44 1773214064

Just because two things are "annoying", doesn't mean they have the same ethical problems.

The fun single player games only need to convince you they are a fun experience and you should buy them once.

Games with loot boxes are trying to convince you every day to spend money on them. Dunno about roblox, but often the items are visible, and "defaults" are often perceived as poor or noobs.

We can't be naive: It's a whole other level and companies are spending millions on manipulating kids to spending more and more money.

Ferret7446 · 2026-03-12T01:50:05 1773280205

Of course a kid is gonna be annoyed if they get something other than what they want, but, to use a hamfisted but scarily apt analogy, are the kids yearning for the drug called sugar or the drug called crack/meth/etc. Both are "bad" but on completely different levels

sayamqazi · 2026-03-11T07:12:30 1773213150

You guys were getting presents?

loloquwowndueo · 2026-03-11T12:10:04 1773231004

Dunno man, when the “core gameplay loop” gets interrupted every 2 minutes with “do you want to pay to win?” Banners and 50% of the screen is covered in ads and trap buttons that pop up a purchase dialog when you press them accidentally (a given on mobile with touch controls), it’s fairly obvious the gameplay loop is the last thing in the developers mind.

jameson · 2026-03-11T06:10:16 1773209416

> games allow you to spend money to rapidly get better

Audience is the problem here. It's obviously not a big deal if the platform is targeted for adults, but majority of users are underage. The platform can certainly implement guardrails for the vulnerable users if they wish to

Aeolun · 2026-03-11T06:16:56 1773209816

Those guardrails exist. They’re called parents. My son doesn’t have a credit card and therefore doesn’t have robux. Having no robux, he can’t spend it on anything.

grvdrm · 2026-03-11T12:24:30 1773231870

Interesting. You don’t think adults have the same problem?

Ferret7446 · 2026-03-12T01:46:40 1773280000

I can't speak for GP, but at least adults are responsible for their own spending problems

dogleash · 2026-03-11T14:54:07 1773240847

> the core gameplay loops work just fine without it.

Of course it's functional, it has to string people along for enough time to get them to start paying.

That doesn't mean grinding a system tuned to get you hooked enough to give up and pay is a 'just fine' as a game. It's openly deliberate malicious design.