
With Anthropic you always have 3 models to choose from: Opus-latest, Sonnet-latest, and Haiku-latest, from the best/slowest to the worst/fastest.

The version numbers are mostly irrelevant, as AFAIK the price per token doesn't change between versions.
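To illustrate the point about picking by tier rather than version, here's a minimal sketch. The tier names and the `-latest`-style alias strings are assumptions for illustration; check Anthropic's docs for the exact aliases.

```python
# Hypothetical mapping from capability tier to a "latest" model alias.
# The alias strings below are assumptions, not verified identifiers.
TIER_ALIASES = {
    "best":    "claude-opus-latest",    # slowest, most capable
    "middle":  "claude-sonnet-latest",
    "fastest": "claude-haiku-latest",   # fastest, least capable
}

def pick_model(tier: str) -> str:
    """Return the latest model alias for a capability tier."""
    return TIER_ALIASES[tier]
```

The idea is that code pinned to a tier alias keeps working across version bumps, since the price per token stays the same within a tier.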


Three random names isn't ideal. I often need to double-check which is which. This is why we use numbers.

They aren't random. Opuses are very long poems, haikus are very short ones (3 lines), and sonnets are in between (14 lines).

What's next? Claude Iliad?

How are the names random?

https://en.wikipedia.org/wiki/Masterpiece

https://en.wikipedia.org/wiki/Sonnet

https://en.wikipedia.org/wiki/Haiku

They dropped the "magnum" from opus, but you could still easily deduce the order of the models just from their names if you know the words.


You are not wrong, but since I started working with LLMs, I have this feeling that many humans are simply autocomplete engines too. So LLMs might actually be close to AGI, if you define "general" as "more than 50% of the population".

Humans are absolutely auto-complete engines, and regularly produce incorrect statements and actions with full confidence that they are precisely correct.

Just think about how many thousands of times you've heard "good morning" after noon, both with and without the subsequent "or I guess I should say good afternoon" auto-correct.


Q: Die Hard: Is it a Christmas movie?

A: Of course it is. It was released on a sunny day, and that makes it a Christmas movie.

    [x] Published
    Relevance Check
    On-topic: Yes (confidence: 90%)

A: "Aww hell naw! Just because it's set during Christmas doesn't make it a Christmas movie, dummy!"

Request timed out after 30000ms


In Queensland, Australia, we have solar-powered e-paper displays [1][2] at some bus stops that are very similar to this (much bigger than a Kindle screen, though).

[1] https://translink.com.au/about-translink/projects-and-initia...

[2] https://www.facebook.com/TranslinkQLD/videos/e-paper-trial-h...


I think your comment is a bit unfair.

> no reasoning comparison

Benchmarks against reasoning models:

https://www.inceptionlabs.ai/blog/introducing-mercury-2

> no demo

https://chat.inceptionlabs.ai/

> no info on numbers of parameters for the model

This is a closed model. Do other providers publish the number of parameters for their models?

> testimonials that don't actually read like something used in production

Fair point.


Just to clarify one point: Mercury (the original v1, non-reasoning model) is already used in production in mainstream IDEs like Zed: https://zed.dev/blog/edit-prediction-providers

Mercury v1 focused on autocomplete and next-edit prediction. Mercury 2 extends that into reasoning and agent-style workflows, and we have editor integrations available (docs linked from the blog). I’d encourage folks to try the models!


You are right; I edited my post (twice, actually). I missed the chat the first time around (though it's hard to see it as a reasoning model when the chain of thought is hidden or not obvious; I guess this is the new normal), and I also missed the reasoning table because the text is pretty small on mobile and I thought it was another speed benchmark.

I tried their chat demo again, and if you set reasoning effort to "High", you sometimes see the chain of thought before the answer (click the "Thought for n seconds" text to expand it).

That being said, the chain is pretty basic. It's possible that they don't disclose the full follow-up prompt list.


> what is the point of that

Planned obsolescence? /s

Jokes aside, they can make the "LLM chip" removable. I know almost nothing is replaceable in MacBooks, but this could be an exception.


Is this sarcasm? These all sound like things that I would never use current LLMs for.

The last one is research. But you don't need a claw.

> Suddenly, smart AI-enabled juniors can easily match the productivity of traditional (or conscientious) seniors, so why hire seniors at all?

I guess we'll see, but so far the flattening curve of LLM capabilities suggests otherwise. They are still very effective with simpler tasks, but they can't crack the hardest problems the way a senior developer can.


This would not be a good question, because a non-negligible percentage of humans would give a similar answer.


That's a great opportunity for a controlled study! You should do it. If you can send me the draft publication after doing the study, I can give feedback on it.


I don't think there is a need for a new study as Cognitive Reflection Tests are a well-researched subject [1]. I am actually surprised that I got downvoted, as I thought this would be common knowledge.

[1] https://psych.fullerton.edu/mbirnbaum/psych466/articles/Fred...


No.


[Citation needed]


> He was not the youngest GM.

He is currently the youngest GM.

