That's just in reference to the technique itself. They're basically saying it's okay for Google to use distillation to train Gemini N Flash using Gemini N-1 Pro (which they do).
As someone with a 5:38 delta, I'm very anxiously waiting for BAA to announce the official cutoff.
In the meantime, if you're at all curious how far people go in trying to predict the cutoff, check out this blog[1]. It's by Brian Rock[2], who every year collects data from a lot of marathons all over the world and then tries to guess the official Boston Marathon cutoff. Very cool stuff!
I wonder if it was geolocation? Anthropic is based in SF, the author seems to be based in Munich, and maybe they're not open to hiring people outside the US right now? Given the state of US visas, that wouldn't shock me.
My company, which is significantly smaller, hires people in multiple countries across the world. You don't need an office to hire (I'm sure there do exist countries where you do, but I expect they're the minority).
That's a lot of assumptions. "This queue is only on one machine and on one thread" — what's the real-world use case here? I'm not saying there is none, but make it clear. I wouldn't want to work for a company that asks some arbitrarily precise question instead of, e.g., "When would you not use MySQL?"
We have a similar wrapper for local LLMs on the roadmap.
If you use the CLI only, we run Claude 4 + Gemini on the backend, with Gemini serving most of the vision tasks (frontend validation) and Claude doing the core codegen.
We use both Claude 4 and Gemini by default (for different tasks). But the idea is you can self-host this and use other models (and even BYOM - bring your own models).
And we also blogged[1] about how the whole thing works. We're very excited about getting this out, but we still have a ton of improvements we'd like to make. Please let us know if you have any questions!