This is my all-time favorite mathematics book. It approaches calculus fundamentals through a series of fictional conversations between a teacher and a student.
It is easy to read, yet I remember it gave me an odd visceral sense of what calculus is (or maybe I just thought it did).
You are absolutely right — Let me know if you want to read my personal anecdote on "Dead Internet Theory"...
Yeah, I especially hate how paranoid everyone is (but rightly so). I am constantly suspicious of others' perfectly original work being AI, and others are constantly suspicious of my work being AI.
> Increasing context length by complaining about schema errors is almost always worse from an end quality perspective than just retrying till the schema passes.
Another way to do this is to use a hybrid approach. You perform unconstrained generation first, and then constrained generation on the failures.
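Roughly like this (a minimal sketch, not anyone's actual pipeline; `generate_unconstrained`, `generate_constrained`, and the JSON Schema check are placeholders for whatever stack you're using):

```python
import json
from jsonschema import ValidationError, validate  # any schema validator works here


def generate_with_fallback(prompt, schema, generate_unconstrained, generate_constrained):
    """Cheap unconstrained attempt first; constrained decoding only on schema failure."""
    raw = generate_unconstrained(prompt)  # call 1: no grammar/constraint overhead
    try:
        parsed = json.loads(raw)
        validate(parsed, schema)  # raises ValidationError if it doesn't match the schema
        return parsed  # valid on the first try, done
    except (json.JSONDecodeError, ValidationError):
        # call 2: constrained decoding guarantees a schema-valid result
        return json.loads(generate_constrained(prompt, schema))
```

Worst case that's two calls for a prompt that fails the schema, and only one for everything else.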
There's no difference in the output distribution between always doing constrained generation and only doing it on the failures though. What's the advantage?
There's no advantage wrt output quality, but it can be more economical in some high-error regimes, with fewer LLM calls used in resampling (max 2 for most errors).
My point is that if you're capable of doing constrained generation, then trying once and constraining only on failure has the same output distribution as doing constrained generation in the first place, so you'd be better off just always doing constrained generation (max of 1 LLM call for the class of errors fixed by this).
There's only a different distribution with 2+ initial attempts before falling back to constrained, at least if I haven't screwed up any math.
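For the one-initial-attempt case, the equivalence falls out if you treat constrained decoding as conditioning on validity (a simplification; token-level masking only approximates this) and assume the retry doesn't see the failed attempt:

```latex
% V = set of schema-valid outputs, p = unconstrained model distribution,
% constrained decoding assumed to sample p(x | x \in V).
% One unconstrained attempt, then constrained fallback; for any x \in V:
P(\text{output} = x) = p(x) + \bigl(1 - P(V)\bigr)\,\frac{p(x)}{P(V)}
                     = \frac{p(x)}{P(V)} = p(x \mid x \in V)
% i.e. exactly the always-constrained distribution.
```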
Aren't LLMs just super-powerful pattern matchers? And isn't guessing "taps" a pattern-recognition task? I am struggling to understand how your experiment relates to intelligence in any way.
Also, commercial LLMs generally have system instructions baked on top of the core models, which intrinsically prompt them to look for purpose even in random user prompts.
There's definitely more than "just" pattern matching in there - for example, current SOTA models 'plan ahead', simultaneously processing both a rough outline of an answer and specific subject details, then combining them internally into the final result (https://www.anthropic.com/research/tracing-thoughts-language...).
LLMs are pattern matchers, but every model is given specific instructions and response designs that influence what it does with unclear prompts. This is hugely valuable to understand: you may ask an LLM an invalid question, and it matters whether it is likely to guess at your intent, reject the prompt, or respond randomly.
Understanding how LLMs fail differently is becoming more valuable than knowing that they all got 100% on some reasoning test with perfect context.
I went home for holidays last month. One day, my mom had a complaint about her food delivery and raised a ticket in the app. She was assigned "someone" on chat, and she carefully typed her issue. Then, she got a call from the same "person" who asked her to explain her issue in detail. After the call, she came to me confused and frustrated. She said the "person" on the other end kept giving unrelated solutions, and signed off saying they were happy to have resolved her issue.
Of course, you know the "person" on the other end was an LLM, which I figured out once she handed over her phone. I was livid, and despite having better things to do, wasted the next few hours sending a notice to the legal team. They paid a small sum to shut down the issue.
Looking back, if the app had at least stated she was talking to a machine and given her an option to escalate to human support, the situation would not have deteriorated.
I feel LLMs should never be used for negative interactions like complaints, or transactional interactions like placing orders. Their scope should be limited to answering factual, generic questions, like "What's my order's ETA?", etc.
I tokenized these and they seem to use around 20% fewer tokens than the original JSONs, which makes me think a schema like this might reduce latency and cost in constrained LLM decoding.
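If anyone wants to sanity-check that kind of number, here's a rough sketch using tiktoken's cl100k_base encoding (the sample object is a made-up stand-in, since the actual schemas aren't shown here):

```python
import json

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Made-up stand-in payload; swap in the real objects/schema to measure your own delta.
obj = {"order_id": 12345, "items": [{"sku": "A1", "qty": 2}], "status": "delivered"}

pretty = json.dumps(obj, indent=2)
minified = json.dumps(obj, separators=(",", ":"))

for label, text in [("pretty", pretty), ("minified", minified)]:
    print(f"{label}: {len(enc.encode(text))} tokens")
```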
I know that LLMs are very familiar with JSON, and that choosing uncommon schemas just to reduce tokens hurts semantic performance. But a schema that is sufficiently JSON-like probably won't disrupt the model's usual paths/patterns that much, and should avoid unintended bias.
Yeah, but I tried switching to minified JSON on a semantic labelling task and saw a ~5% accuracy drop.
I suspect this happened because most of the pre-training corpus is pretty-printed JSON, so the LLM was forced off its likely path and also lost all the "visual cues" of nesting depth.
This might happen here too, but maybe to a lesser extent. Anyways, I'll stop building castles in the air now and try it sometime.
If you really care about structured output, switch to XML. Much better results, which is why AI providers tend to use pseudo-XML in their system prompts and tool definitions.
It's ironic I've been waiting for smart glasses with displays ever since I found them in books and films as a kid, but now I see them as artifacts signaling a dystopian future. That they are tied to Meta does not help.
TFA floated a possible shift to "wearable AI devices", which isn't as blatantly aggressive. While it seems impossible for any of these to be as immersive, their number and ubiquity seem more insidiously intrusive to me.
> It is easy to read, yet I remember it gave me an odd visceral sense of what calculus is (or maybe I just thought it did).
I am reading it again.