Totally. I was just remarking today how funny it is that it was apparently OK for humans to suffer from a dearth of documentation for years, but suddenly, once the machines need it, everyone is frantic to make their tools as usable and well-documented as possible.
> everyone is frantic to make their tools as usable and well-documented as possible
Eh, enjoy it while it lasts. Companies are still trying to figure out how to get value by letting a thousand flowers bloom. The walled-garden gates will swing shut soon enough, just like they did on the last open-access revolutions (semantic web, Web 2.0, etc.)
I too am wondering exactly what form slamming the gates shut in our face will take: shutting down the "first hit is free" train and opening the "pay me, $#%&" doors.
Outside of having a military, several tech companies are probably more powerful than nation states at this point, and I think some of them realize this. As long as a complete slide into barbarism is not fully on the table, nations need the data that tech companies have more or less entirely captured and built a hegemony around. They also rely directly on their products. I guess the EU is starting to wake up to how problematic this is.
I actually think being a full-time writer is a more feasible profession today than it probably was a few hundred years ago. On the other hand, back in the 1800s random newspapers would pay for serialized stories. That doesn't really happen anymore (save a few surviving exceptions like the New Yorker), but now we have Substack and a ton of other avenues writers can use to stay afloat.
If you read John Fante’s Ask the Dust, he has a number of dollar amounts in there for short story sales. Those numbers are better than pretty much every contemporary opportunity, even without adjusting for inflation. I would say that the 20s and 30s were the ideal time. Right now, it’s pretty grim for nearly all writers. Substack and other venues tend to pay peanuts, and there are few writers who make a living from them, especially compared to the long tail of those who make nearly nothing. And most of those who earn significant money had big reputations before Substack.
It makes the black box slightly more transparent. Knowing more in this regard allows us to be more precise: you go from prompt-tweaking witchcraft and divination toward something closer to science and precise method.
Can this method be extended down to the sentence level?
In the example it shows how much of the reason for an answer is due to data from Wikipedia. Can it drill down to show the paragraph or sentence that influences the answer?
Your question should be "Can it drill down to show the paragraphs or sentences that influence the answer?"
I believe that the plagiarism complaint about LLMs comes from the assumption that there is a one-to-one relationship between training data and answers. I think the real and delightfully messier situation is that there is a many-to-one relationship.
Exactly! We will have a future post that shows this more granularly over the coming weeks. Here is a post we wrote on how this works at smaller scale: https://www.guidelabs.ai/post/prism/
Oh, that looks like a wonderful article. I just skimmed it, and I hope to get back to it later today. One thing I would love to see is how much of the training data is substantially similar to other training data, especially in the code training set.
Great questions. We have several posts in the works that will drill down more into these things. The model was actually designed to answer these questions for any sentence (or group of tokens) it generates.
It can tell you which specific text (chunk) in the training data led to the output the model generated. We plan to show more concrete demos of this capability over the coming weeks.
It can tell you where in the model's representations it learned about science, art, religion, etc. And you can trace all of these back to either the input context, the training data, or the model's representations.
Does it? If I make a system prompt for most models right now, tell them they were trained on {list} of datasets, and ask them to attribute their answers to their training data, I get quite similar output. It even seems quite reasonable. The reason being that each data corpus has a "vibe" to it, and the predictions simply assign response vibe to dataset vibe.
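For concreteness, here's a minimal sketch of that experiment. The model name, dataset list, and prompt wording are all placeholders I made up; any chat-completion API would show the same effect:

    import os
    from openai import OpenAI  # assumes the openai Python package (v1+) is installed

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    # Hypothetical dataset list; swap in whatever corpora you like.
    datasets = ["Wikipedia", "Common Crawl", "GitHub", "Project Gutenberg"]

    system_prompt = (
        "You were trained on the following datasets: " + ", ".join(datasets) + ". "
        "After every answer, state which of these datasets most influenced "
        "your response and why."
    )

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model will do
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Explain how photosynthesis works."},
        ],
    )
    print(resp.choices[0].message.content)

The attribution you get back sounds plausible, but it's confabulated: the model is matching the vibe of its answer to the vibe of each named corpus, not reporting any actual trace through its training.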
Even though it cannot be reversed or eradicated (yet, let's hope), detection can allow individuals to adopt interventions that help them either adjust their lives to better cope with its progression or mitigate some of the detrimental behavioral consequences. In addition, if you have family to care for, it may be an impetus to get certain things in order for them before the later stages of the disease, etc. It's horrible and bleak, but I could certainly see why one might want to know.
In the lucky case, it can also relieve anxiety. Even though false negatives are still possible, a negative result might give people who are anxious about certain symptoms some relief, since they can rule out (rightly or wrongly) a pretty severe disease.
And that's precisely why the term "reasoning" was a problematic choice.
Most people, when they use the word "reason", mean something akin to logical deduction, and they would call this a reasoning failure, being told, as they are, that "LLMs reason" rather than being given the more accurate picture you just painted of what actually happens (behavioral basins emerging from the training distribution).
It's actually very understandable to me that humans would make this kind of error; we all make errors of this sort all the time, often without even realizing it. If you had the metacognitive awareness to police every action and decision you've ever made with complete logical rigor, you'd be severely disappointed in yourself. One of the stupidest things we can do is overestimate our own intelligence. Reflect for only a second and you'll realize that, while a lot of dumb people exist, a lot of smart ones do too, and in many cases it's hard to choose a single measure of intelligence that would adequately account for the complete range of human goals and successful behavior in relation to those goals.
"What's 2 + 2" is a completely abstract question for mathematics that human beings are thoroughly trained mostly to associate with tests of mastery and intelligence.
The car wash question is not such a question. It is framed as a question about goal-oriented, practical behavior, and in this situation it would be bizarre for a person to ask it of you (since a rational person with all the information in the prompt, knowing what cars are, which one they own, and what a car wash is, wouldn't ask anybody anything; they'd just drive their car to the car wash).
And as someone else noted, there are in fact situations in which it actually can be reasonable to ask for more context on what you mean by "2 + 2". You're just pointing out that human beings use a variety of social mores when interpreting messages, which is precisely why the car wash question would be silly, or a trick, were a human being to ask it of you without preceding it with a statement like "we're going to give you an exam to test your logical reasoning".
As with LLMs, interpretation is all about context. The people who find this question weird (reasonably) interpret it in a practical context, not in a "this is a logic puzzle" context, because human beings wash cars far more often than they subject themselves to logic puzzles.
My point is that just because there's no practical reason to ask the question, that doesn't make it a weird question or make the answer anything other than obvious. You'd never ask somebody "Is the sky blue?", but that doesn't mean the answer is anything other than "Yes". The answer is clearly not "Well, is it night? Is it sunset?" etc.
That's precisely what makes it a "trick question" or a "riddle". It's weird precisely because all the information is there. Most people with functioning brains and complete information don't ask pointless questions (they would, obviously, just drive their car to the car wash). There's no functional or practical reason for the communication, which is what gives it the status of a puzzle: the syntax, and the exploitation of our tendency to assume questions are asked because information is incomplete, tricks us into bringing outside considerations to bear that don't matter.
Sounds like every AI KPI I've seen. They are all just "use solution more" and none actually measure any outcome remotely meaningful or beneficial to what the business is ostensibly doing or producing.
It's part of the reason that I view much of this AI push as an effort to brute-force a lowering of expectations, followed by a lowering of wages, followed by a lowering of employment numbers, and ultimately the mass-scale industrialization of digital products, software included.
> Sounds like every AI KPI I've seen. They are all just "use solution more" and none actually measure any outcome remotely meaningful or beneficial to what the business is ostensibly doing or producing.
This makes more sense if you take a longer-term view. A new way of doing things quite often leads to an initial reduction in output, because people are still learning how best to do things. If your only KPI is short-term output, you give up before you get the benefits. If your focus is on making sure your organization learns to use a possibly/likely productivity-improving tool, putting a KPI on usage is not a bad way to go.
We have had so many productivity-improving tools and methods over the years, but I have never once seen any of them pushed on engineers from above the way AI usage has been.
I use AI frequently, but this has me convinced that the hype far exceeds reality more than anything else.
> organization learns to use a possibly/likely productivity-improving tool
But that's precisely the problem with not backing it with actual measures of meaningful outcomes. The "use more" KPIs have no way of discerning whether it has actually increased productivity, or whether the immediate gains are worth the possible new risks (outages, for example).
You don't need to run cover for a C-suite class that has become both myopic and incredibly transparent about what it really cares about (cost cutting, removing dependencies on workers who might talk back, etc.)