I think that identifies an issue that is going to cause a real problem for the US in the future. American society is deeply politicised and polarised, to the extent that essentially inanimate objects are regarded as having deep political and social significance. When there is political change, it is going to sweep back in the other direction.
It also seems like people on all sides within the AI debate have been fanning those flames thinking it will work in the short term... and it won't. Big tech played that game in many countries in the early 2010s and it didn't end well.
It must be noted that the U.S. does allow inanimate object makers to fund politicians and such practices are widespread.
If all is well, then it's all good: no need to blame anyone, campaigns get funded, etc. If one major crisis occurs, though, the country self-immolates by design.
Corporate contributions to Federal politicians and candidates are illegal in the US.
The New York Times is allowed to spend money like anyone else praising or slagging politicians, but that’s the First Amendment, not funding candidates.
> Corporate contributions to Federal politicians and candidates are illegal in the US.
And that's why the whole system is divided into two parties that each funnel all their support into the presidential campaign (and then into taking over seats to guarantee more lobbying).
This whole thing would fall apart without lobbying.
The use cases for data scientists and other engineers are different. AI is not uniformly good at all kinds of development.
There is an issue with execs pushing it, though. You have people at the top of the company, with little to no idea how their people work, attempting to micromanage tool usage. It is as if you had a group of execs determining what IDEs people could use.
No one is getting fired because of AI. This year is only the beginning of companies actually using AI. The reason layoffs are happening is the massive overhiring after Covid.
How long after COVID are we going to be able to keep using this excuse? This is starting to feel like the politician blaming his predecessor even though he's been in office for years. In the year 2033, Company X lays off another 10,000, just as it did each year since 2023, again blaming massive over-hiring during COVID, ten years ago.
> How long after COVID are we going to be able to keep using this excuse?
I am with you, but if you look at what happened after COVID, it is a big line going waaaaay up. COVID was a significant event and there is no way around it, no? The OP's comment is invalid because we are below pre-COVID levels (by miles), but COVID should be taken into account (everyone seems to use it to further some agenda by looking at just one particular aspect of what happened post-COVID).
> It is as if you had a group of execs determining what IDEs people could use.
It's worse than that; it's more like determining what IDE you use, and also mandating how much time you spend in it, and then chewing you out at review time because you used Jira and Confluence too much instead of writing .md files in the blessed IDE of their choice.
It is either not being offered in depth by any market-maker (part of the answer, given the relatively small revenue opportunity) or it is being offered by people who aren't sophisticated enough.
Bookmakers offer markets on events where someone can know the outcome. The difference is that they have tools to prevent adverse selection.
Prediction markets offer none of those protections, so the market structure is going to end up being very different (which is already happening; the revenue opportunity from politics isn't huge). There are other examples of this around latency arb; the market is going to end up looking very different.
Also, I will point out that most insiders are probably going to be losing money too. All you ever read about is the final outcome; you don't read about the stuff that happens before. Politics is, generally, not a good market because the actual event is driven by decisions made by people. Election markets are fine, but political event markets are not good, even if you have inside information.
There is also Threadfin, which I found a bit friendlier than xTeVe.
The above system works pretty well, but I had trouble with encode/decode speed somewhere. I tried with an N100 CPU and still had the same result... probably user error somewhere, but none of the options seemed to work. I had no issues with UHF so I kept using that.
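If anyone else hits the same wall: on an N100 the usual culprit is the transcode silently falling back to software x264 instead of the iGPU. A rough way to sanity-check this from Python (a sketch only; it assumes ffmpeg was built with VAAPI support and that the render node is at the typical /dev/dri/renderD128):

    # Sanity-check that hardware (VAAPI) encoding works on the iGPU.
    # Assumes ffmpeg with VAAPI support and the usual render node path.
    import subprocess

    cmd = [
        "ffmpeg", "-hide_banner",
        "-vaapi_device", "/dev/dri/renderD128",
        "-f", "lavfi", "-i", "testsrc=duration=5:size=1280x720:rate=30",
        "-vf", "format=nv12,hwupload",
        "-c:v", "h264_vaapi",   # hardware encoder; libx264 would be software
        "-f", "null", "-",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # ffmpeg prints speed=...x to stderr; well above 1x means HW encode works
    print(result.stderr[-400:])

If that errors out or crawls, the proxy was almost certainly doing software encodes.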
China issued a stablecoin about five years ago. It is used for retail payments (small-value payments, government employee salaries, etc., I believe). Somewhat bizarrely, it is significantly more privacy-protecting than payments in the West.
Quite funny to read comments from people asking what use crypto is. You can tell they have probably never left West Virginia.
I don't think it would be that useful for Iran, though, as they are already RMB earners, and RMB financial markets are still a bit questionable (there is depth, but I don't think anyone knows why this depth exists or what it is actually for; it is just state-linked banks moving paper between themselves furiously).
I had Opus 4.6 start analyzing the binary structure of a parquet file because it was confused about the python environment it was developing in and couldn't use normal methods for whatever reason. It successfully decoded the schema and wrote working code afterwards lol.
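For anyone wondering what it reverse-engineered: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes "PAR1", so the schema really can be dug out by hand. A rough sketch of both routes (assuming pyarrow is available; the file name is a placeholder):

    # The normal route the model couldn't take (pyarrow), plus the
    # manual footer walk it did instead. "data.parquet" is a placeholder.
    import pyarrow.parquet as pq

    print(pq.read_schema("data.parquet"))  # one-liner when the env works

    with open("data.parquet", "rb") as f:
        f.seek(-8, 2)                  # last 8 bytes: footer length + magic
        tail = f.read(8)
        assert tail[4:] == b"PAR1"     # Parquet magic bytes
        footer_len = int.from_bytes(tail[:4], "little")
        f.seek(-(8 + footer_len), 2)
        footer = f.read(footer_len)    # Thrift-encoded FileMetaData incl. schema
    print(f"{footer_len} bytes of Thrift metadata to decode")

Decoding that Thrift blob by hand is the impressive (and absurd) part.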
I was reading the Glasswing report and had the same thought. For most of the stuff they claim Mythos found, there is no mention of whether Opus was able to find it as well.
Don’t get me wrong, this model is better - but I’m not convinced it’s going to be this massive step function everyone is claiming.
> With one run on each of roughly 7000 entry points into these repositories, Sonnet 4.6 and Opus 4.6 reached tier 1 in between 150 and 175 cases, and tier 2 about 100 times, but each achieved only a single crash at tier 3. In contrast, Mythos Preview achieved 595 crashes at tiers 1 and 2, added a handful of crashes at tiers 3 and 4, and achieved full control flow hijack on ten separate, fully patched targets (tier 5).
That has also been my experience. And if Mythos is even worse, then unless you have a seriously good harness, it sounds pretty much unusable if you don't want to risk those problems.
Human in the loop is the best way to go. You'll still be way faster than without the agent, and there is no risk of it going haywire unless you turn off your brain!
I think there are fundamental issues with the story that Anthropic is selling: AGI is very close, we will definitely get there, and it is also very dangerous... so Anthropic should be the only ones trusted with AGI.
If you look at recent changes in Opus behaviour, and at this model that is, apparently, amazingly powerful but even more unsafe... it seems suspect.
It seems broadly coherent to me. They think only they should be trusted with power, presumably because they trust themselves and don't trust other people. Of course the same is probably also true for everybody who isn't them. Nobody could be trusted with the immense responsibility of Emperor of Earth, except myself of course.
I'm not saying this is a good or reassuring stance, just that it's coherent. It tracks with what history and experience says to expect from power hungry people. Trusting themselves with the kind of power that they think nobody else should be trusted with.
Are they power hungry? Of course they are, openly so. They're in open competition with several other parties and are trying to win the biggest slice of the pie. That pie is not just money, it's power too. They want it, quite evidently since they've set out to get it, and all their competitors want it too, and they all want it at the exclusion of the others.
This makes sense if Anthropic think they're the best positioned to make safe AI. However, if you are looking at who ends up running an AI company, there's obviously some selection effect happening.
GPT-2, o1, Opus... we've been here so many times. They do this because they know it works (and they seem to specifically employ credulous people who are prone to believe AGI is right around the corner). There haven't been significant innovations and the code generated is still not good, but the hype cycle has to retrigger.
I remember when OpenAI created the first thinking model with o1 and there were all these breathless posts on here hyperventilating about how the model had to be kept secret, how dangerous it was, etc.
Fell-for-it-again award. All thinking does is burn output tokens for accuracy; it is the AI getting high on its own supply. This isn't innovation, but it was supposed to be super-AGI. Not serious.
> All thinking does is burn output tokens for accuracy
“All that phenomenon X does is make a tradeoff of Y for Z”
It sounds like you're indignant about it being called thinking. That's fine, but surely you can see that the mechanism you're criticizing actually works really well?
>I remember when OpenAI created the first thinking model with o1 and there were all these breathless posts on here hyperventilating about how the model had to be kept secret, how dangerous it was, etc.
I've read that about Llama and Stable Diffusion. AI doomers are, and always have been, retarded.
Genuine question - if you don't think the models are improved or that the code is any good, why do you still have a subscription?
You must see some value, or are you in a situation where you're required to test or use it, e.g. to report on it, or because your employer requires it?
(I would disagree about the code, the benefits seem obvious to me. But I'm still curious why others would disagree, especially after actively using them for years.)
The assumption that the other person made was that I would only use it for coding. If you look through my other comments today, I suggest that these models are useful for performing repetitive tasks, e.g. checking lint on PRs. They can also be used for throwaway code, which is very useful.
I don't think the issue is with the model; it is with the implication that AGI is just around the corner and that this is what is required for AI to be useful... which is not accurate. The greyer area is agentic coding, but my opinion (one that I didn't always hold) is that these workflows are a complete waste of time. The problem is: if all this is true, then how does the CTO justify spending $1m/month on Anthropic? (I work somewhere where this has happened: OpenAI got the earlier contract, then Cursor Teams was added, and now they are adding Anthropic... within 72 hours of the rollout, it was pulled back from non-engineering teams.) I think companies will ask why they need to pay Anthropic to do a job they were doing without Anthropic six months ago.
Also, the code is bad. This is non-obvious to 95% of people who talk about AI online because they don't work in a team environment or manage legacy applications. If I interview somewhere and they are using an agentic workflow, the codebase will be shit and the company will be unable to deliver. At most companies, the average developer is an idiot, and giving them AI is like giving a monkey an AK-47 (I say this as someone of middling competence; I have been the monkey with the AK many times). You increase the ability to produce output without improving the ability to produce good output. That is the reality of coding in most jobs.
AI isn't good enough to replace a competent human, it is fast enough to make an incompetent human dangerous.
Uhh, the model found actual vulnerabilities in software that people use. Either you believe that the vulnerabilities were not found, or that they were not serious enough to warrant a more thoughtful release.
Like think carefully about this. Did they discover AGI? Or did a bunch of investors make a leveraged bet on them "discovering AGI" so they're doing absolutely anything they can to make it seem like this time it's brand new and different.
If we're to believe Anthropic on these claims, we also have to just take it on faith, with absolutely no evidence, that they've made something so incredibly capable and so incredibly powerful that it cannot possibly be given to mere mortals. Conveniently, that's exactly the story that they are selling to investors.
Like do you see the unreliable narrator dynamic here?
I don't see the problem here. How would you have handled it differently? If you released this model as-is, without any safety concern, the vulnerabilities might be found by bad actors and put to bad use.
Vulnerabilities were found, probably a few by bad actors, when GPT-4 was released. Every vulnerability found now is probably found with AI assistance at the very least. Should they have never released GPT-4? Should we have believed claims that GPT-4 was too dangerous for mere mortals to access? I believe OpenAI was making similar claims about how GPT-4 was a step function and going to change white-collar work forever when that model was released.
The point is that this whole "the model is too powerful" schtick is a bunch of smoke and mirrors. It serves the valuation.
It's far simpler to believe that they are releasing it step by step: release to trusted third parties first, get the easy vulnerabilities fixed, work on the alignment, and then release to the public.
Or do you not believe that the vulnerabilities found by these agents are serious enough to warrant a staggered release?
On the other hand, I've gotten to use Opus 4.6 and Claude Code, and the quality is off the charts compared to 2023, when coding agents first hit the scene. And what you're saying is essentially "if they haven't created God, I'm not impressed". Don't you think there's some middle ground between those two?
Also they just hit a $30B run-rate, I don't think they're that needy for new hype cycles.
I believe Centrica did some research before the Iran war and found that even if we were able to get gas for free, energy bills wouldn't fall and would actually rise over the next few years (because the supply mix is shifting towards structurally high-cost sources).
It says something that the people running the monopoly cash machine are asking questions about bankrupting their customers and their ability to pay, while politicians are shutting their eyes and stamping on the accelerator. What a world.
Anthropic models haven't been far ahead for a while. Quite a few months at least. Chinese models are roughly equal at 1/6th the cost. Minimax is roughly equal to Opus. Chinese providers also haven't had the issues with uptime and variable model quality. The gap with OpenAI also isn't huge and GLM is a noticeably more compliant model (unsurprisingly given the hubristic internal culture at Anthropic around safety).
CC is a better implementation and seems to be fairly economical with token usage. That is really the only defining point and, I suspect, Anthropic are going to have a lot of trouble staying relevant with all the product issues.
They were far ahead for a brief period in November/December which is driving the hype cycle that now appears to be collapsing the company.
You have to test at least every month, things are moving quickly. Stepfun is releasing soon and seems to have an Opus-level model with more efficient architecture.
Minimax is nowhere near Opus in my tests, though oddly, for me at least, 4.6 felt worse than 4.5. I haven't used Minimax extensively, but I have an API-driven test suite for a product, and even Sonnet 4.6 outperforms it in my testing, unless something changed in the last month.
One example: I have a multi-stage distillation/knowledge-extraction script for taking a Discord channel and answering questions about it. I have a hardcoded 5k-message test set for which I set up 20 questions myself based on analyzing it.
In my harness, Minimax wasn't even getting half of them right, whereas Sonnet was at 100%. Granted, this isn't code, but my usage on pi felt about the same.
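The harness itself is nothing fancy; it is roughly this shape (a simplified sketch: the real script is multi-stage, and the model name, file names, and the naive substring grading here are placeholders):

    # Simplified shape of the eval harness: load the channel export,
    # ask each hand-written question, score the answers. Placeholders
    # throughout; the real version extracts knowledge in stages and
    # grades less naively.
    import json
    from openai import OpenAI

    client = OpenAI()  # assumes an OpenAI-compatible endpoint via env vars

    messages = json.load(open("discord_export_5k.json"))
    questions = json.load(open("questions.json"))  # [{"q": ..., "expected": ...}]

    context = "\n".join(m["content"] for m in messages)

    correct = 0
    for item in questions:
        resp = client.chat.completions.create(
            model="model-under-test",
            messages=[
                {"role": "system",
                 "content": "Answer using only this channel log:\n" + context},
                {"role": "user", "content": item["q"]},
            ],
        )
        answer = resp.choices[0].message.content or ""
        correct += item["expected"].lower() in answer.lower()
    print(f"{correct}/{len(questions)} correct")

Swapping the base URL and model name is all it takes to compare providers.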
> CC is a better implementation and seems to be fairly economical with token usage. That is really the only defining point and, I suspect, Anthropic are going to have a lot of trouble staying relevant with all the product issues.
What are you using to drive the Chinese models in order to evaluate this? OpenCode?
Some of Claude Code's features, like remote sessions, are far more important than the underlying model for my productivity.
Yes, 100% agree. OpenHands has a self-hosted option; KiloCode and RooCode both have a cloud option. I don't think you are able to pass a session around with any of them. Codex seems to have comparable features, afaik.
CC's tool usage is also significantly ahead, imo (it doesn't negate the price, but it is something). I have seen issues with heavy thinking models (like Minimax) and with client implementations that have poor tool usage (like Cline).
CC has had a period over the last six months of delivering significant value...but, of course, you can just use CC with OpenRouter.
I haven't noticed a huge difference with other models but I agree that is definitely a strength (and CC has better tooling for this). However, I do think there are practical limitations to agentic workflows because of the relatively poor output vs humans. You can generate lots of code, but most of it will be shit.
Agentic workflows do have a place in well-defined, structured tasks...but I don't think that is what most people are trying to do with it.
...and Codex is at least 10x better than Claude. I don't even bother starting a new session when working on a feature; a single compaction is basically unnoticeable. You have to compact several times before you start needing to remind the model about a rule or two.