By "reasoning" I meant that o*(-mini) does "chain-of-thought": it prompts itself to "reason" before responding to you, whereas GPT-4o(-mini) responds to your prompt directly. So it is not appropriate to compare o*(-mini) and GPT-4o(-mini) unless you implement "chain-of-thought" for GPT-4o(-mini) and compare that with o*(-mini). See also: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...
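To make that concrete, here is a minimal sketch of what "implement chain-of-thought for GPT-4o(-mini)" could look like: a system prompt telling the model to reason before answering. The prompt wording and `build_cot_messages` helper are just illustrative assumptions, not anything OpenAI ships; the commented-out call shows how it would plug into the official `openai` client.

```python
# Illustrative sketch: wrap GPT-4o(-mini) with an explicit chain-of-thought
# prompt so a comparison against o*(-mini) is closer to apples-to-apples.
# The system prompt text here is a made-up example, not OpenAI's hidden one.

COT_SYSTEM = (
    "Before answering, reason step by step inside <thinking> tags. "
    "Then give your final answer inside <answer> tags."
)

def build_cot_messages(question: str) -> list[dict]:
    """Build a chat-completions message list that elicits chain-of-thought."""
    return [
        {"role": "system", "content": COT_SYSTEM},
        {"role": "user", "content": question},
    ]

# With the official openai client this would be sent roughly like:
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=build_cot_messages("Is 3599 prime?"),
# )
```

The point is only that the "reasoning" step is a prompting convention you can bolt onto any chat model, which is what makes the comparison fair.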
Yes, you use the models for the same things, and one is better than the other at said thing. The reasoning process is an implementation detail that doesn't concern anybody when evaluating the models, especially since "open"ai doesn't expose it. I just want LLMs to do task X, which is usually "write a function in Y language that does W, taking these Z things into account", and for that I have found no reason to switch away from Sonnet yet.
Yes, but only if you say exactly what one is better than the other at. Not because o1 spends a bunch of tokens on "reasoning" you cannot even see.
If you would like to see the CoT process visualized, try the “Improve prompt” feature in Anthropic console. Also check out https://github.com/getAsterisk/deepclaude
The o-whatever models are doing the same thing as any LLM; they've merely been tuned to use a chain of thought to break out of their complexity class (from pattern-matching TC0 to a pseudo-UTM). But any foundation model with a bit of instruction tuning can do this.