Totally. I was just remarking today how funny it is that it was apparently OK for humans to suffer from a dearth of documentation for years, but suddenly, once the machines need it, everyone is frantic to make their tools as usable and well-documented as possible.
> everyone is frantic to make their tools as usable and well-documented as possible
Eh, enjoy it while it lasts. Companies are still trying to figure out how to get value by letting a thousand flowers bloom. The walled-garden gates will swing shut soon enough, just like they did on the last open-access revolutions (semantic web, Web 2.0, etc.)
I too am wondering exactly what form slamming the gates shut in our face will take: shutting down the "first hit is free" train and opening the "pay me, $#%&" doors.
Outside of having a military, several tech companies are probably more powerful than nation states at this point, and I think some of them realize this. As long as a complete slide into barbarism is not fully on the table, nations need the data that tech companies have more or less entirely captured and built a hegemony around. They also rely directly on their products. I guess the EU is starting to wake up to how problematic this is.
I actually think being a full-time writer is a more feasible profession today than it probably was a few hundred years ago. On the other hand, back in the 1800s random newspapers would pay for serialized stories. That doesn't really happen anymore (save a few surviving exceptions like the New Yorker), but now we have Substack and a ton of other avenues writers can use to stay afloat.
If you read John Fante’s Ask the Dust, he has a number of dollar amounts in there for short story sales. Those numbers are better than pretty much every contemporary opportunity, even without adjusting for inflation. I would say that the 20s and 30s were the ideal time. Right now, it’s pretty grim for nearly all writers. Substack and other venues tend to pay peanuts, and there are few writers who make a living from them, especially compared to the long tail of those who make nearly nothing. And most of those who earn significant money had big reputations before Substack.
It makes the black box slightly more transparent. Knowing more in this regard allows us to be more precise: you go from prompt-tweaking witchcraft and divination toward something closer to science and precise method.
Can this method be extended down to the sentence level?
In the example it shows how much of the reason for an answer is due to data from Wikipedia. Can it drill down to show the paragraph or sentence that influences the answer?
Your question should be "Can it drill down to show the paragraphs or sentences that influence the answer?"
I believe that the plagiarism complaint about LLMs comes from the assumption that there is a one-to-one relationship between training data and answers. I think the real and delightfully messier situation is that there is a many-to-one relationship.
Exactly! We will have a future post that shows this more granularly over the coming weeks. Here is a post we wrote on how this works at smaller scale: https://www.guidelabs.ai/post/prism/
Oh, that looks like a wonderful article. I just skimmed it, and I hope to get back to it later today. One thing I would love to see is how much of the training data is substantially similar to other training data, especially in the code training set.
Great questions. We have several posts in the works that will drill down more into these things. The model was actually designed to answer these questions for any sentence (or group of tokens) it generates.
It can tell you which specific text (chunk) in the training data led to the output the model generated. We plan to show more concrete demos of this capability over the coming weeks.
It can tell you where in the model's representations it learned about science, art, religion, etc. And you can trace all of these back to either the input context, the training data, or the model's representations.
Does it? If I make a system prompt for most models right now, tell them they were trained on {list} of datasets, and ask them to attribute their answers to their training data, I get quite similar output. It even seems quite reasonable. The reason being that each data corpus has a "vibe" to it, and the predictions simply assign response vibe to dataset vibe.
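For concreteness, here's a minimal sketch of that experiment. The model name, dataset list, and prompt wording are all placeholders I made up; any chat-completion API would show the same effect:

    import os
    from openai import OpenAI  # assumes the openai Python package (v1+) is installed

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    # Hypothetical dataset list; swap in whatever corpora you like.
    datasets = ["Wikipedia", "Common Crawl", "GitHub", "Project Gutenberg"]

    system_prompt = (
        "You were trained on the following datasets: " + ", ".join(datasets) + ". "
        "After every answer, state which of these datasets most influenced "
        "your response and why."
    )

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model will do
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Explain how photosynthesis works."},
        ],
    )
    print(resp.choices[0].message.content)

The attribution you get back sounds plausible, but it's confabulated: the model is matching the vibe of its answer to the vibe of each named corpus, not reporting any actual trace through its training.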
Even though it cannot be reversed or eradicated (yet, let's hope), detection can allow individuals to adopt interventions that help them either adjust their lives to better cope with its progression or mitigate some of the detrimental behavioral consequences. In addition, if you have family to care for, it may be an impetus to get certain things in order for them before the later stages of the disease, etc. It's horrible and bleak, but I could certainly see why one might want to know.
In the lucky case, it can also relieve anxiety. Even though false negatives are still possible, a negative result might give people who are anxious about certain symptoms some relief, since they can rule out (rightly or wrongly) a pretty severe disease.
And that's precisely why the term "reasoning" was a problematic choice.
Most people, when they use the word "reason", mean something akin to logical deduction, and they would call this a reasoning failure, being told, as they are, that "LLMs reason" rather than being given the more accurate picture you just painted of what actually happens (behavioral basins emerging from the training distribution).
It's actually very understandable to me that humans would make this kind of error; we all make errors of this sort all the time, often without even realizing it. If you had the metacognitive awareness to police every action and decision you've ever made with complete logical rigor, you'd be severely disappointed in yourself. One of the stupidest things we can do is overestimate our own intelligence. Reflect for only a second and you'll realize that, while a lot of dumb people exist, a lot of smart ones do too, and in many cases it's hard to choose a single measure of intelligence that would adequately account for the complete range of human goals and successful behavior in relation to those goals.
"What's 2 + 2" is a completely abstract question for mathematics that human beings are thoroughly trained mostly to associate with tests of mastery and intelligence.
The car wash question is not such a question. It is framed as a question about goal-oriented, practical behavior, and in this situation it would be bizarre for a person to ask it of you (since a rational person with all the information in the prompt, knowing what cars are, which one they own, and what a car wash is, wouldn't ask anybody anything; they'd just drive their car to the car wash).
And as someone else noted, there are in fact situations in which it actually can be reasonable to ask for more context on what you mean by "2 + 2". You're just pointing out that human beings use a variety of social mores when interpreting messages, which is precisely why the car wash question would be silly, or a trick, were a human being to ask it of you without preceding it with a statement like "we're going to give you an exam to test your logical reasoning".
As with LLMs, interpretation is all about context. The people who find this question weird (reasonably) interpret it in a practical context, not in a "this is a logic puzzle" context, because human beings wash cars far more often than they subject themselves to logic puzzles.
My point is that just because there's no practical reason to ask the question, that doesn't make it a weird question or make the answer anything other than obvious. You'd never ask somebody "Is the sky blue?", but that doesn't mean the answer is anything other than "Yes". The answer is clearly not "Well, is it night? Is it sunset?" etc.
That's precisely what makes it a "trick question" or a "riddle". It's weird precisely because all the information is there. Most people with functioning brains and complete information don't ask pointless questions (they would, obviously, just drive their car to the car wash). There's no functional or practical reason for the communication, which is what gives it the status of a puzzle: the syntax, and the exploitation of our tendency to assume questions are asked because information is incomplete, tricks us into bringing outside considerations to bear that don't matter.
Sounds like every AI KPI I've seen. They are all just "use solution more" and none actually measure any outcome remotely meaningful or beneficial to what the business is ostensibly doing or producing.
It's part of the reason that I view much of this AI push as an effort to brute-force a lowering of expectations, followed by a lowering of wages, followed by a lowering of employment numbers, and ultimately the mass-scale industrialization of digital products, software included.
> Sounds like every AI KPI I've seen. They are all just "use solution more" and none actually measure any outcome remotely meaningful or beneficial to what the business is ostensibly doing or producing.
This makes more sense if you take a longer-term view. A new way of doing things quite often leads to an initial reduction in output, because people are still learning how best to do things. If your only KPI is short-term output, you give up before you get the benefits. If your focus is on making sure your organization learns to use a possibly/likely productivity-improving tool, putting a KPI on usage is not a bad way to go.
We have had so many productivity-improving tools and methods over the years, but I have never once seen any of them pushed on engineers from above the way AI usage has been.
I use AI frequently, but this has me convinced that the hype far exceeds reality more than anything else.
> organization learns to use a possibly/likely productivity-improving tool
But that's precisely the problem with not backing it with actual measures of meaningful outcomes. The "use more" KPIs have no way of discerning whether it has actually increased productivity, or whether the immediate gains are worth the possible new risks (outages, for example).
You don't need to run cover for a C-suite class that has become both myopic and incredibly transparent about what it really cares about (cost cutting, removing dependencies on workers who might talk back, etc.)