That is just false. The model had a disclaimer on every page that it is not to be trusted and that it hallucinates. Strawmanning is not a nice academic retort
I'm saying that if your model needs something like that (I certainly didn't see the disclaimer) then it's not appropriate for science. The expectation of accuracy is much higher.
I disagree. How can you say the expectation of accuracy is much higher when the creators literally tell you not to expect accuracy? That's a problem with your expectations, not the program's problem.
Imagine a program that had some kind of concept of "interesting and novel mathematical proofs", and it could spit them out. But, 99.9% of the proofs were actually logically inconsistent or true but extremely uninteresting. Would that still be an interesting exploratory tool? I think so.
> WARNING: Outputs may be unreliable! Language Models are prone to hallucinate text. Trained on data up to July 2022
The model wasn't supposed to make science for you, and any scientist who used it like that should probably not be a scientist. It's a language model; I would think people know what that means by now.
Yes, it's clearly part of my point that language models are insufficient to produce high quality professional text.
Facebook's intent with the model was quite clear; the abstract of the paper says this:
"In this paper we introduce Galactica:
a large language model that can store, combine and reason about scientific knowledge... these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community."
I didn't expect it to make science for me; that would be quite impressive (it's been the guiding principle for my 30+ years in science). But I do expect the demo to be less bad.
I honestly don't understand what they were thinking. Transformer-style language models are clearly unsuitable for this task. Starting with a foundation that is known to "hallucinate"[1] in this context strikes me as just crazy. There is no way to start with a hallucinating foundation and then try to purge the hallucinations at some higher level; characterizing the "shape" of this model's output, i.e. the errors it makes, is a super complicated N-dimensional problem that no current AI model has any hope of solving. I'm pretty sure that task is straight-up superhuman.
If such an AI is possible, and I won't promise it isn't, it will certainly not be based on a transformer model. I also won't promise it won't "incorporate" such a thing, but it can not be the base.
[1]: Not sure I love this word, I think I'd prefer "confabulate", but hey, I'll run with it.
I thought that was quite funny and probably better than what most people would write if they wanted to write a parody about space bears.
It's only bad if you expected it to provide scientific answers, which was not the claim. IIRC the page said something about it compiling knowledge, which it does in this article.
I think you know that the answer is in the links of that link you posted.
EDIT: I mean that it compiles the story of Laika and Karelian bear dogs
In any case, I understand you may have had a falling out with LeCun before, but that is no reason why this research model should not be online for people to test. Let's try to improve things rather than blocking and banning things.
Huh? What answer is in the links of the link I posted? If you mean "the Russians sent a bear dog into space", that doesn't explain all the detail the model generated.
The statement about bears isn't just factually wrong; the model generated specific details that make it appear right! At first, I was going to say it wasn't half-wrong because tardigrades (known as water bears) have been sent to space.
This is simply a project that wasn't ready for the real world. A wiser R&D leader would have told the team standards were higher, rather than advising they put a disclaimer on it.
EDIT: since you didn't reply, but just edited (you're probably at your reply depth limit): Laika wasn't a bear dog. She was a mongrel found roaming Moscow.
Sticking a bunch of disclaimers onto cyanide doesn't mean that it is useful to describe it as almond milk.
If they had done an adequate and accurate job explaining what, if any, plausible value or potential Galactica had, the warnings would not have to carry so much weight. Or their PR could have focused on their tricks and optimizations, and not characterised Galactica at all.
Instead they'll be known as the people who described a science-flavored word salad generator as if it were a tool useful to "summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more".
If they had, for example, positioned this as a tool for creating unscientific but academic-sounding anti-vax propaganda, people could have questioned their morals, but not the "fitness for purpose" of their tool.
They proved that Galactica would be an excellent front-end for a differentiable information retrieval system. This definitely moved the research forward; work on the back-end (i.e. fact retrieval usable for LLMs) is still needed.
To be honest, projects that constantly and consistently generate smart falsehoods (the "bears in the sky" that the author of the article mentions) don't serve any scientific need, so to speak. Or, if they do, I fail to see it.
They do. Try asking it about a field that is very novel or very, very niche, and it will point out various links. Biology connections (which are often very vague and open) are good for that. I found that it generated some links between mechanisms that I found interesting.
Unfortunately the model is down, so I can't review it.
But how can you possibly trust anything the model spits out as accurate and true if it's happy to spit out a whole thing on bears in space, something that plainly never happened?
Sure, it may be spitting out these links between mechanisms, but it could have pulled them out of the same place it pulled an article about bears in space: the server's ass. And verification is going to take you far longer on something that actually seems realistic, which is a massive problem.
It's not supposed to spit out truth. Meta may have overhyped it, but they were careful not to claim that it makes true claims, only that it compiles knowledge, and they had a disclaimer that it hallucinates facts. Anyone who has seen language models knows that if you ask it about space bears, it won't reply that there are no space bears; instead it will try to create a link between space and bears. It seems to me that people were deliberately using impossible inputs so that they could claim the model is dangerous for doing what a language model does. And their ultimate purpose was to defame the model and take it down. (And BTW we've heard those "dangerousness" BS claims before.)
The usefulness of the model was in under-researched fields and questions, where it can provide actually useful directions.
My whole point is that said directions are completely useless if you can't trust the system to be based on factual principles.
It's easy to dismiss "bears in space" as not factual. It's harder to dismiss "the links between two underresearched fields in biology" without putting in an exorbitant amount of work. Work which likely is going to be useless if the model is just as happy to spit out "bears in space".
And that was what I had already said. I asked you how you could possibly trust those links it provided. Because you can't. They may very well be novel links that had never been researched. Or they could have been bears in space.
A scientific model which doesn't guarantee some degree of factuality in its responses is completely useless as a scientific model.
They may have had a disclaimer, but they also presented it as a tool to help one summarize research. It was clearly unreliable there. And who cares if it has some good results? How do you know whether it's spitting nonsense* or accurate information?
Yes, I do think there is value in engaging with the community for models that aren't perfect. But it needs more work before it can be framed as a useful tool for scientific research.
* I mean subtle nonsense. Space bears are easy to distinguish... but what if the errors aren't as 'crazy'?
Right. My way of evaluating any system is to start with the easiest tasks. If the system doesn't get the easiest task right, I do not proceed to use its output for complex things.