More

freeqaz · 2026-02-25T02:51:07 1771987867

You can call Cerebras APIs via OpenRouter if you specify them as the provider in your request fyi. It's a bit pricier but it exists!

andai · 2026-02-25T03:57:37 1771991857

I used their API normally (pay per token) a few weeks ago. Their Coding Plan appears to be permanently sold out though.

freeqaz · 2026-02-17T18:17:14 1771352234

If it maintains the same price (with Anthropic tends to do or undercuts themselves) then this would be 1/3rd of the price of Opus.

Edit: Yep, same price. "Pricing remains the same as Sonnet 4.5, starting at $3/$15 per million tokens."

Bishonen88 · 2026-02-17T18:25:40 1771352740

3 is not 1/3 of 5 tho. Opus costs $5/$25

freeqaz · 2026-02-17T18:16:34 1771352194

I would honestly guess that this is just a small amount of tweaking on top of the Sonnet 4.x models. It seems like providers are rarely training new 'base' models anymore. We're at a point where the gains are more from modifying the model's architecture and doing a "post" training refinement. That's what we've been seeing for the past 12-18 months, iirc.

squidbeak · 2026-02-17T18:33:00 1771353180

> Claude Sonnet 4.6 was trained on a proprietary mix of publicly available information from the internet up to May 2025, non-public data from third parties, data provided by data-labeling services and paid contractors, data from Claude users who have opted in to have their data used for training, and data generated internally at Anthropic. Throughout the training process we used several data cleaning and filtering methods including deduplication and classification. ... After the pretraining process, Claude Sonnet 4.6 underwent substantial post-training and fine-tuning, with the intention of making it a helpful, honest, and harmless1 assistant.

phplovesong · 2026-02-18T13:43:52 1771422232

Nope. They need to update/retrain older base models regularily. Take Programming as an example, the field evolves faster than anything else.

Stuff from last year will be outdated today.

freeqaz · 2026-02-02T19:10:46 1770059446

Does anybody know when Codex is going to roll out subagent support? That has been an absolute game changer in Claude Code. It lets me run with a single session for so much longer and chip away at much more complex tasks. This was my biggest pain point when I used Codex last week.

laborcontract · 2026-02-02T19:15:22 1770059722

It's already out.

turblety · 2026-02-02T19:38:45 1770061125

Can you explain how to use it? I’ve tried asking it to do “create 3 files using multiple sub agents” and other similar wording. It never works.

Is it in the main Codex build? There doesn’t seem to be an experiment for it.

https://github.com/openai/codex/issues/2604

freeqaz · 2026-01-30T10:14:36 1769768076

I've been working on decompiling Dance Central 3 with AI and it's been insane. It's an Xbox 360 game that leverages the Kinect to track your body as your dance. It's a great game, but even with an emulator, it's still dependent on the Kinect hardware which is proprietary and has limited supply.

Fortunately, a Debug build of this game was found on a dev unit (somehow), and that build does _not_ have crazy optimizations in place (Link-time Optimization) that make this feat impossible.

I am not somebody that is deep on low level assembly, but I love this game (and Rock Band 3 which uses the same engine), and I was curious to see how far I could get by building AI tools to help with this. A project of this magnitude is ... a gargantuan task. Maybe 50k hours of human effort? Could be 100k? Hard to say.

Anyway, I've been able to make significant progress by building tools for Claude Code to use and just letting Haiku rip. Honestly, it blows me away. Here is an example that is 100% decompiled now (they compile to the exact same code as in the binary the devs shipped).

https://github.com/freeqaz/dc3-decomp/blob/test-objdiff-work...

My branch has added over 1k functions now and worked on them[0]. Some is slop, but I wrote a skill that's been able to get the code quite decent with another pass. I even implemented vmx128 (custom 360-specific CPU instructions) into Ghidra and m2c to allow it to decompile more code. Blows my mind that this is possible with just hours of effort now!

Anybody else played with this?

0: https://github.com/freeqaz/dc3-decomp/tree/test-objdiff-work...

freeqaz · 2025-12-31T23:40:00 1767224400

I assume that the author here is testing against one of these boxes, right? https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...

Are these considered a good deal at $3-4k? What's the software support like on them? I've got 2x 3090s and I'm curious how this compares.

lifestyleguru · 2026-01-01T03:38:49 1767238729

> https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...

In Europe these cost 5k EUR. I guess I'm not buying computer ever again and hopefully the 10 years old ones I have will never die.

wmf · 2026-01-01T01:23:19 1767230599

DGX Spark vs. Strix Halo vs. M4 Max is hotly debated. You can find plenty of HN discussions and YouTube videos about it.

freeqaz · 2025-12-05T12:09:50 1764936590

I spent way too many hours writing this all today, but I wanted to get this pushed out for others to learn from. There is a ton of detail in this notes file[0] that Claude Code helped me assemble.

If anybody has any suggestions or questions, shoot! It's 4am though so I'll be back in a bit. These CVEs are quite brutal.

0: https://github.com/freeqaz/react2shell/blob/master/EXPLOIT_N...

freeqaz · 2025-12-01T05:40:50 1764567650

Unfortunately not. It's still very broken, and next year it will be worse for a ton of people. I got AI to write a short answer for you:

> Short version: Obamacare never turned into “free primary care for everyone,” it was just a bunch of rules and subsidies bolted onto the same old private-insurance maze. It helped at the margins (more people covered, protections for pre-existing conditions), but premiums/deductibles can still go nuclear if you’re in the wrong income bracket, state, or employer situation. From an EU/Poland perspective it’s not a public health system at all, just a slightly nerfed market where you still get to roll the dice every year.

freeqaz · 2025-10-20T13:18:49 1760966329

There is also a tradeoff between different vocabulary sizes (how many entries exist in the token -> embedding lookup table) that inform the current shape of tokenizers and LLMs. (Below is my semi-armchair stance, but you can read more in depth here[0][1].)

If you tokenized at the character level ('a' -> embedding) then your vocabulary size would be small, but you'd have more tokens required to represent most content. (And context scales non-linearly, iirc, like n^3) This would also be a bit more 'fuzzy' in terms of teaching the LLM to understand what a specific token should 'mean'. The letter 'a' appears in a _lot_ of different words, and it's more ambiguous for the LLM.

On the flip side: What if you had one entry in the tokenizer's vocabulary for each word that existed? Well, it'd be far more than the ~100k entries used by popular LLMs, and that has some computational tradeoffs like when you calculate the probability of each 'next' token via softmax, you'd have to run that for each token, as well as increasing the size of certain layers within the LLM (more memory + compute required for each token, basically).

Additionally, you run into a new problem: 'Rare Tokens'. Basically, if you have infinite tokens, you'll run into specific tokens that only appear a handful of times in the training data and the model is never able to fully imbue the tokens with enough meaning for them to _help_ the model during inference. (A specific example being somebody's username on the internet.)

Fun fact: These rare tokens, often called 'Glitch Tokens'[2], have been used for all sorts of shenanigans[3] as humans learn to break these models. (This is my interest in this as somebody who works in AI security)

As LLMs have improved, models have pushed towards the largest vocabulary they can get away with without hurting performance. This is about where my knowledge on the subject ends, but there have been many analyses done to try to compute the optimal vocabulary size. (See the links below)

One area that I have been spending a lot of time thinking about is what Tokenization looks like if we start trying to represent 'higher order' concepts without using human vocabulary for them. One example being: Tokenizing on LLVM bytecode (to represent code more 'densely' than UTF-8) or directly against the final layers of state in a small LLM (trying to use a small LLM to 'grok' the meaning and hoist it into a more dense, almost compressed latent space that the large LLM can understand).

It would be cool if Claude Code, when it's talking to the big, non-local model, was able to make an MCP call to a model running on your laptop to say 'hey, go through all of the code and give me the general vibe of each file, then append those tokens to the conversation'. It'd be a lot fewer tokens than just directly uploading all of the code, and it _feels_ like it would be better than uploading chunks of code based on regex like it does today...

This immediately makes the model's inner state (even more) opaque to outside analysis though. e.g., like why using gRPC as the protocol for your JavaScript front-end sucks: Humans can't debug it anymore without other tooling. JSON is verbose as hell, but it's simple and I can debug my REST API with just network inspector. I don't need access to the underlying Protobuf files to understand what each byte means in my gRPC messages. That's a nice property to have when reviewing my ChatGPT logs too :P

Exciting times!

0: https://www.rohan-paul.com/p/tutorial-balancing-vocabulary-s...

1: https://arxiv.org/html/2407.13623v1

2: https://en.wikipedia.org/wiki/Glitch_token

3: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldm...

rco8786 · 2025-10-20T14:33:01 1760970781

Again, super interesting thanks!

> One area that I have been spending a lot of time thinking about is what Tokenization looks like if we start trying to represent 'higher order' concepts without using human vocabulary for them. One example being: Tokenizing on LLVM bytecode (to represent code more 'densely' than UTF-8)

I've had similar ideas in the past. High level languages that humans write are designed for humans. What does an "LLM native" programming language look like? And, to your point about protobufs vs JSON, how does a human debug it when the LLM gets stuck?

> It would be cool if Claude Code, when it's talking to the big, non-local model, was able to make an MCP call to a model running on your laptop to say 'hey, go through all of the code and give me the general vibe of each file, then append those tokens to the conversation'. It'd be a lot fewer tokens than just directly uploading all of the code, and it _feels_ like it would be better than uploading chunks of code based on regex like it does today...

That's basically the strategy for Claude's new "Skills" feature, just in a more dynamic/AI driven way. Claude will do semantic search through YAML frontmatter to determine what skill might be useful in a given context, then load that entire skill file into context to execute it. Your idea here is similar, use a small local model to summarize each file (basically dynamically generate that YAML front matter), feed those into the larger model's context, and then it can choose which file(s) it cares about based on that.

freeqaz · 2025-10-20T10:48:53 1760957333

Since I'm 5+ years out from my NDA around this stuff, I'll give some high level details here.

Snapchat heavily used Google AppEngine to scale. This was basically a magical Java runtime that would 'hot path split' the monolithic service into lambda-like worker pools. Pretty crazy, but it worked well.

Snapchat leaned very heavily on this though and basically let Google build the tech that allowed them to scale up instead of dealing with that problem internally. At one point, Snap was >70% of all GCP usage. And this was almost all concentrated on ONE Java service. Nuts stuff.

Anyway, eventually Google was no longer happy with supporting this and the corporate way of breaking up is "hey we're gonna charge you 10x what did last year for this, kay?" (I don't know if it was actually 10x. It was just a LOT more)

So began the migration towards Kubernetes and AWS EKS. Snap was one of the pilot customers for EKS before it was generally available, iirc. (I helped work on this migration in 2018/2019)

Now, 6+ years later, I don't think Snap heavily uses GCP for traffic unless they migrated back. And this outage basically confirms that :P

garbthetill · 2025-10-20T11:39:57 1760960397

Thats so interesting to me, I always assume companies like google who have "unlimited" dollars will always be happy to eat the cost to keep customers, especially given gcp usage outside googles internal services is way smaller compared to azure and aws. Also interesting to see snapchat had a hacky solution with AppEngine

freeqaz · 2025-10-20T13:33:13 1760967193

These are the best additional bits of information that I can find to share with you if you're curious to read more about Snap and what they did. (They were spending $400m per year on GCP which was famously disclosed in their S-1 when they IPO'd)

0: https://chrpopov.medium.com/scaling-cloud-infrastructure-5c6...

1: https://eng.snap.com/monolith-to-multicloud-microservices-sn...

makeitdouble · 2025-10-20T11:49:02 1760960942

The "unlimited dollars" come from somewhere after all.

GCP is behind in market share, but has the incredible cheat advantage of just not being Amazon. Most retailers won't touch Amazon services with a ten foot pole, so the choice is GCP or Azure. Azure is way more painful for FOSS stacks, so GCP has its own area with only limited competition.

Scubabear68 · 2025-10-20T13:05:13 1760965513

I’m not sure what you mean by Azure being more painful for FOSS stacks. That is not my experience. Old you elaborate?

However I have seen many people flee from GCP because: Google lacks customer focus, Google is free about killing services, Google seems to not care about external users, people plain don’t trust Google with their code, data or reputation.

dzonga · 2025-10-20T12:25:41 1760963141

Customers would rather choose Azure. GCP has a bad rep, bad documentation, bad support compared to AWS / Azure. & with google cutting off products, their trust is damaged.

ecshafer · 2025-10-20T12:03:47 1760961827

GCP as I understand it is the E-commerce/retail choice for this reason. Not Amazon being the main reason.

Honestly as a (very small) shareholder in Amazon, they should spin off AWS as a separate company. The Amazon brand is holding AWS back.

philistine · 2025-10-20T13:45:39 1760967939

Absolutely! AWS is worth more as a separate company than being hobbled by the rest of Amazon. YouTube is the same.

Big monopolists do not unlock more stock market value, they hoard it and stifle it.

array_key_first · 2025-10-20T12:34:25 1760963665

Google does not give even a singular fuck about keeping their customers. They will happily kill products that are actively in use and are low-effort for... convenience? Streamlining? I don't know, but Google loves to do that.

throwway120385 · 2025-10-20T14:47:29 1760971649

The engineering manager that was leading the project got promoted and now no longer cares about it.

lesuorac · 2025-10-20T15:18:53 1760973533

High margin companies are always looking to cut the lower-margin parts of their business regardless of if they're profitable.

The general idea being that you'll losing money due to opportunity cost.

Personally, I think you're better off just not laying people off and having them work the less (but still) profitable stuff. But I'm not in charge.