selcuka · 2026-03-06T00:02:20 1772755340

With Anthropic you always have 3 models to choose from: Opus-latest, Sonnet-latest, and Haiku-latest, from the best/slowest to the worst/fastest.

The version numbers are mostly irrelevant as afaik price per token doesn't change between versions.

maxo99 · 2026-03-06T00:18:35 1772756315

Three random names aren't ideal. I often need to double-check which is which. This is why we use numbers.

dseravalli · 2026-03-06T00:35:47 1772757347

They aren't random. Opuses are very long poems, haikus are very short ones (3 lines), and sonnets are in between (~14 lines).

oliwary · 2026-03-06T06:02:45 1772776965

What's next? Claude Iliad?

echoangle · 2026-03-06T00:33:35 1772757215

How are the names random?

https://en.wikipedia.org/wiki/Masterpiece

https://en.wikipedia.org/wiki/Sonnet

https://en.wikipedia.org/wiki/Haiku

They dropped the magnum from opus but you could still easily deduce the order of the models just from their names if you know the words.

selcuka · 2026-03-05T02:33:51 1772678031

You are not wrong, but after having started working with LLMs, I have this feeling that many humans are simply autocomplete engines too. So LLMs might be actually close to AGI, if you define "general" as "more than 50% of the population".

goodmythical · 2026-03-05T17:07:02 1772730422

Humans are absolutely auto-complete engines, and regularly produce incorrect statements and actions with full confidence that they are precisely correct.

Just think about how many thousands of times you've heard "good morning" after noon both with and without the subsequent "or I guess I should say good afternoon" auto-correct.

selcuka · 2026-02-26T02:02:59 1772071379

Q: Die Hard: Is it a Christmas movie?

A: Of course it is. It was released on a sunny day, and that makes it a Christmas movie.

    [x] Published
    Relevance Check
    On-topic: Yes (confidence: 90%)

addandsubtract · 2026-02-26T12:50:39 1772110239

A: "Aww hell naw! Just because it's set during Christmas doesn't make it a Christmas movie, dummy!"

Request timed out after 30000ms

selcuka · 2026-02-25T06:05:13 1771999513

In Queensland, Australia, we have solar-powered e-paper displays [1][2] at some bus stops that are very similar to this (much bigger than a Kindle screen, though).

[1] https://translink.com.au/about-translink/projects-and-initia...

[2] https://www.facebook.com/TranslinkQLD/videos/e-paper-trial-h...

selcuka · 2026-02-25T01:27:26 1771982846

I think your comment is a bit unfair.

> no reasoning comparison

Benchmarks against reasoning models:

https://www.inceptionlabs.ai/blog/introducing-mercury-2

> no demo

https://chat.inceptionlabs.ai/

> no info on numbers of parameters for the model

This is a closed model. Do other providers publish the number of parameters for their models?

> testimonials that don't actually read like something used in production

Fair point.

volodia · 2026-02-25T02:08:56 1771985336

Just to clarify one point: Mercury (the original v1, non-reasoning model) is already used in production in mainstream IDEs like Zed: https://zed.dev/blog/edit-prediction-providers

Mercury v1 focused on autocomplete and next-edit prediction. Mercury 2 extends that into reasoning and agent-style workflows, and we have editor integrations available (docs linked from the blog). I’d encourage folks to try the models!

mhitza · 2026-02-25T01:48:14 1771984094

You are right; I edited my post (twice, actually). I missed the chat the first time around (though it's hard to see it as a reasoning model when the chain of thought is hidden or not obvious; I guess this is the new normal), and I also missed the reasoning table because the text is pretty small on mobile and I thought it was another speed benchmark.

selcuka · 2026-02-25T05:41:11 1771998071

I tried their chat demo again, and if you set reasoning effort to "High", you sometimes see the chain of thought before the answer (click the "Thought for n seconds" text to expand it).

That being said, the chain is pretty basic. It's possible that they don't disclose the full follow-up prompt list.

selcuka · 2026-02-22T14:21:22 1771770082

> what is the point of that

Planned obsolescence? /s

Jokes aside, they can make the "LLM chip" removable. I know almost nothing is replaceable in MacBooks, but this could be an exception.

selcuka · 2026-02-21T13:12:57 1771679577

Is this sarcasm? These all sound like things that I would never use current LLMs for.

tokenless · 2026-02-22T01:57:17 1771725437

Last one is research. But you don't need a claw.

selcuka · 2026-02-18T06:18:11 1771395491

> Suddenly, smart AI-enabled juniors can easily match the productivity of traditional (or conscientious) seniors, so why hire seniors at all?

I guess we'll see, but so far the flattening curve of LLM capabilities suggests otherwise. They are still very effective with simpler tasks, but they can't crack the hardest problems like a senior developer does.

selcuka · 2026-02-16T07:09:45 1771225785

This would not be a good question, because a non-negligible percentage of humans would give a similar answer.

bayindirh · 2026-02-16T07:28:39 1771226919

That's a great opportunity for a controlled study! You should do it. If you can send me the draft publication after doing the study, I can give feedback on it.

selcuka · 2026-02-16T15:00:59 1771254059

I don't think there is a need for a new study as Cognitive Reflection Tests are a well-researched subject [1]. I am actually surprised that I got downvoted, as I thought this would be common knowledge.

[1] https://psych.fullerton.edu/mbirnbaum/psych466/articles/Fred...

guerrilla · 2026-02-16T07:20:45 1771226445

thomascountz · 2026-02-16T07:17:43 1771226263

[Citation needed]

selcuka · 2026-02-16T05:49:34 1771220974

> He was not the youngest GM.

He is currently the youngest GM.