
vrm · 2026-05-23T16:41:06 1779554466

It’s really not a concept you can express easily in idiomatic Python. The cost comes from the generated assembly, which copies data from global GPU memory into registers (slow, and bandwidth saturates quickly) and back again between the cosines. If you can avoid that intermediate round trip, you cut the cost roughly in half.
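A toy model of why fusing helps, assuming a hypothetical `cos(cos(x))` pipeline over `f32` data where every buffer read or write counts as global-memory traffic. This is CPU Rust that counts bytes moved, not GPU code, and `Traffic`, `cos_cos_unfused`, and `cos_cos_fused` are illustrative names:

```rust
/// Running total of bytes moved to/from "global memory" (a toy model).
struct Traffic {
    bytes: usize,
}

const F32_BYTES: usize = std::mem::size_of::<f32>();

/// Unfused: the first cos writes an intermediate buffer, which the
/// second cos then has to read back.
fn cos_cos_unfused(x: &[f32], t: &mut Traffic) -> Vec<f32> {
    // Pass 1: read x, write the intermediate buffer.
    let tmp: Vec<f32> = x.iter().map(|v| v.cos()).collect();
    t.bytes += 2 * x.len() * F32_BYTES;
    // Pass 2: read the intermediate buffer, write the output.
    let out: Vec<f32> = tmp.iter().map(|v| v.cos()).collect();
    t.bytes += 2 * tmp.len() * F32_BYTES;
    out
}

/// Fused: both cosines are applied to a value held in a local
/// (standing in for a register), so no intermediate buffer exists.
fn cos_cos_fused(x: &[f32], t: &mut Traffic) -> Vec<f32> {
    let out: Vec<f32> = x.iter().map(|v| v.cos().cos()).collect();
    // One read of x, one write of the output.
    t.bytes += 2 * x.len() * F32_BYTES;
    out
}

fn main() {
    let x: Vec<f32> = (0..1024).map(|i| i as f32 * 0.01).collect();
    let mut unfused = Traffic { bytes: 0 };
    let mut fused = Traffic { bytes: 0 };
    let a = cos_cos_unfused(&x, &mut unfused);
    let b = cos_cos_fused(&x, &mut fused);
    // Same operations in the same order, so the results match exactly.
    assert_eq!(a, b);
    println!("unfused: {} bytes, fused: {} bytes", unfused.bytes, fused.bytes);
}
```

Running it prints `unfused: 16384 bytes, fused: 8192 bytes`: the fused version skips the intermediate write and read, which is where the roughly 2x comes from when the kernel is bandwidth-bound.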

vrm · 2026-05-20T21:05:05 1779311105

One question I have here: I think this type of thing would be trivial to do in Rust with constructors, private fields, and newtypes. What would I get on top of that?

pyrex41 · 2026-05-20T21:28:25 1779312505

For a single-language Rust project with a handful of invariants, not much. Rust's newtypes + private fields + Result-returning constructors are exactly the right primitives, and they're strictly stronger than Go's (no reflection escape hatch, no zero values to forge, exhaustive matching).
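For reference, the Rust primitives mentioned above look roughly like this. The invariant (a non-empty, ASCII-alphanumeric tenant ID) and the names `TenantId` and `TenantIdError` are hypothetical, chosen only to illustrate the newtype + private field + `Result`-returning constructor pattern:

```rust
mod domain {
    /// Hypothetical invariant: a tenant ID is non-empty ASCII alphanumeric.
    /// The field is private, so code outside this module can only obtain
    /// a `TenantId` through `TenantId::new`.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct TenantId(String);

    #[derive(Debug, PartialEq, Eq)]
    pub enum TenantIdError {
        Empty,
        InvalidChar(char),
    }

    impl TenantId {
        /// Smart constructor: the only way to build a valid value.
        pub fn new(raw: &str) -> Result<Self, TenantIdError> {
            if raw.is_empty() {
                return Err(TenantIdError::Empty);
            }
            if let Some(c) = raw.chars().find(|c| !c.is_ascii_alphanumeric()) {
                return Err(TenantIdError::InvalidChar(c));
            }
            Ok(TenantId(raw.to_owned()))
        }

        pub fn as_str(&self) -> &str {
            &self.0
        }
    }
}

use domain::{TenantId, TenantIdError};

/// Downstream code demands an already-validated ID in its signature.
fn describe(id: &TenantId) -> String {
    format!("tenant {}", id.as_str())
}

fn main() {
    let id = TenantId::new("acme42").expect("valid id");
    println!("{}", describe(&id));
    assert_eq!(TenantId::new(""), Err(TenantIdError::Empty));
    assert_eq!(TenantId::new("a-b"), Err(TenantIdError::InvalidChar('-')));
    // `TenantId("x".to_string())` would not compile here: the field is private.
}
```

Because the tuple field is private, there is no way to forge a `TenantId` from outside `domain`, which is the property the reply contrasts with Go's zero values and reflection.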

Where shengen might be more interesting:

1. Multi-language emit. If your invariants live in a Rust service AND a TypeScript frontend (or another backend), one Shen spec drives both and the build catches drift. Hand-rolling means writing the same smart constructor twice, in two languages, with no mechanism to keep them in lockstep.

2. Spec as the audit surface. The Shen rule for `tenant-access` is going to be much more concise and expressive than the Rust implementation. The `tcb-audit` gate fails the build if generated code is hand-edited away from the spec, so reviewers read the spec and the build polices the implementation against it. With hand-rolled Rust you're reviewing the impl directly. That isn't strictly bad, but it can put the human back in the loop between LLM iterations.

3. AI in the loop. If you're hand-writing the constructors carefully yourself, the argument for shengen is fairly weak. If an LLM is writing them, "declarative spec + codegen" is, in my experience, a stronger prompt than "describe the constructor in precise English." That's the frame the post is really written for; the Rust newtype point is well-taken outside it.
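A generic sketch of the kind of drift check described in point 2: regenerate code from the spec and fail if the checked-in file differs. The spec format, `Spec`, `generate`, and `audit` are all hypothetical; this is not how shengen or `tcb-audit` is actually implemented:

```rust
/// Hypothetical spec: a named integer newtype with an inclusive range.
struct Spec {
    name: &'static str,
    min: i64,
    max: i64,
}

/// Hypothetical generator: emits a smart constructor as source text.
fn generate(spec: &Spec) -> String {
    format!(
        "pub struct {n}(i64);\nimpl {n} {{ pub fn new(v: i64) -> Option<Self> {{ if ({min}..={max}).contains(&v) {{ Some({n}(v)) }} else {{ None }} }} }}\n",
        n = spec.name,
        min = spec.min,
        max = spec.max
    )
}

/// Build gate: the checked-in code must be exactly what the spec produces.
fn audit(spec: &Spec, checked_in: &str) -> Result<(), String> {
    if generate(spec) == checked_in {
        Ok(())
    } else {
        Err(format!("{}: generated code drifted from spec", spec.name))
    }
}

fn main() {
    let spec = Spec { name: "Age", min: 0, max: 150 };
    let pristine = generate(&spec);
    assert!(audit(&spec, &pristine).is_ok());

    // Simulate someone (or an LLM) hand-editing the generated bound.
    let edited = pristine.replace("150", "200");
    match audit(&spec, &edited) {
        Ok(()) => println!("audit passed"),
        Err(e) => println!("audit failed: {}", e),
    }
}
```

The design point is that reviewers only need to read `Spec`; any divergence in the generated file is caught mechanically rather than by inspection.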

I'm not saying everyone writing software (with or without LLMs) needs shengen / shen-backpressure. The underlying principles it's trying to help you use aren't new at all. But if you want to get more out of LLMs (especially in multi-turn loops), you probably need something deterministic and structured that feeds the backpressure context you want back to the LLM as it iterates.

vrm · 2025-08-20T14:31:08 1755700268

that is earnings (net income), not revenue (top line), so these are wildly different and incomparable numbers

kick_in_the_dor · 2025-08-20T15:19:31 1755703171

Got it - thanks for the correction.

vrm · 2025-08-08T20:37:20 1754685440

a 6:1 parameter ratio is too small for specdec to have that much of an effect. You'd really want to see 10:1 or even more for this to start to matter

lhl · 2025-08-09T08:13:32 1754727212

You're right on ratios, but actually the ratio is much worse than 6:1 since they are MoEs. The 20B has 3.6B active, and the 120B has only 5.1B active, only about 40% more!
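The arithmetic behind that reply, as a quick sketch using the figures quoted in the thread. It assumes per-token decode cost scales roughly with active rather than total parameters, which is a common rule of thumb for MoE models and a simplification:

```rust
/// Ratio of target-model to draft-model parameters.
fn param_ratio(target_b: f64, draft_b: f64) -> f64 {
    target_b / draft_b
}

fn main() {
    // (total, active) parameters in billions, as quoted in the thread.
    let (draft_total, draft_active) = (20.0, 3.6);
    let (target_total, target_active) = (120.0, 5.1);

    // Nominal ratio: what "6:1" refers to.
    let total = param_ratio(target_total, draft_total);
    // Ratio that roughly tracks per-token decode cost for MoE models.
    let active = param_ratio(target_active, draft_active);

    println!("total {:.1}:1, active {:.2}:1", total, active);
}
```

This prints `total 6.0:1, active 1.42:1`: by active parameters the draft model is nowhere near the 10:1 regime where speculative decoding pays off.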

vrm · 2025-07-10T19:33:36 1752176016

This is neat! I think in general there are really deep connections between semantically meaningful diffs (across modalities) and supervision of AI models. You might imagine a human-in-the-loop workflow where the human edits a particular generation and those edits are then used as supervision for a future implementation of that thing. We did some related work here: https://www.tensorzero.com/blog/automatically-evaluating-ai-... on the coding use case, but I'm interested in all the different approaches to the problem, especially in less structured domains.
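One simple way to turn a human's edits into a supervision signal is to score how much of the original generation survived. The sketch below uses character-level Levenshtein distance; `edit_similarity` is a hypothetical name, and this is not necessarily the metric used in the linked post:

```rust
/// Levenshtein distance over chars (insert, delete, substitute each cost 1),
/// computed with a single rolling row of the DP table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of a and first j of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        // cur[0] = i + 1: delete all of a[..=i] to match an empty prefix.
        let mut cur = vec![i + 1; b.len() + 1];
        for (j, cb) in b.iter().enumerate() {
            let sub = prev[j] + if ca == cb { 0 } else { 1 };
            cur[j + 1] = sub.min(prev[j + 1] + 1).min(cur[j] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Hypothetical feedback signal: 1.0 if the human kept the generation
/// verbatim, falling toward 0.0 as more of it was rewritten.
fn edit_similarity(generated: &str, edited: &str) -> f64 {
    let max_len = generated.chars().count().max(edited.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(generated, edited) as f64 / max_len as f64
}

fn main() {
    let generated = "fn add(a: i32, b: i32) -> i32 { a + b }";
    let edited = "fn add(a: i64, b: i64) -> i64 { a + b }";
    println!("similarity = {:.3}", edit_similarity(generated, edited));
}
```

A character-level score is crude for code, where a one-character change can be semantically large; a semantically meaningful diff would be a better basis, but the shape of the pipeline (generation, human edit, scalar or diff-based signal) stays the same.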

vrm · 2025-06-07T20:28:22 1749328102

if you haven't, check out our repo -- it's free, fully self-hosted, production-grade, and designed for precisely this application :)

https://github.com/TensorZero/tensorzero

spmurrayzzz · 2025-06-07T21:16:54 1749331014

Looks very buttoned up. My local project has some features tuned for my explicit agent flows, though (built directly into my inference engine), so I can't really jump ship just yet.

Looking great so far though!

vrm · 2025-06-07T14:29:20 1749306560

I definitely see different prompts depending on what I'm doing in the app. As we mentioned, there are different prompts for asking questions, doing Cmd-K edits, working in the shell, etc. I'd also imagine they customize the prompt by model (unobserved here, but we can also customize per model using TensorZero and A/B test).

vrm · 2025-06-07T12:05:56 1749297956

Wireshark would work for seeing the requests from the desktop app to Cursor’s servers (which make the actual LLM requests). But if you’re interested in what the actual requests from Cursor’s servers to the LLMs look like, you have to set something like this up. Plus, this lets us modify the request and A/B test variations!

stavros · 2025-06-07T14:03:50 1749305030

Sorry, can you explain this a bit more? Either you're putting something between your desktop and the server (in which case Wireshark would work) or you're putting something between Cursor's infrastructure and their LLM provider, in which case, how?

vrm · 2025-06-07T14:27:13 1749306433

we're doing the latter! Cursor lets you configure the OpenAI base URL so we were able to have Cursor call Ngrok -> Nginx (for auth) -> TensorZero -> LLMs. We explain in detail in the blog post.

stavros · 2025-06-07T14:38:32 1749307112

Ah OK, I saw that, but I thought that was the desktop client hitting the endpoint, not the server. Thanks!

vrm · on May 14, 2025

We're working on an OSS, industrial-grade version of this at TensorZero, but there's a long way to go. I think the easiest out-of-the-box solution today is probably OpenAI RFT, but that's a partial solution with substantial vendor lock-in.

vrm · on May 14, 2025

This is very neat work! I'll be interested to see how they make this sort of thing available to the public, but it's clear from some of the results they mention that search + LLM is one path to producing net-new knowledge from AI systems.