The report cites both GPT-3.5 and GPT-4 scores on page 7 [1]. I've checked the numbers, and they compare FreeWilly2 to GPT-3.5. For example, the HellaSwag score of 85.5% corresponds to GPT-3.5.
For example, you can look up paragraphs from some internal document that are semantically relevant to a user query, then include them in the LLM context so it knows how to answer the query. That's basically the idea behind many ChatGPT plugins.
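A minimal sketch of that retrieval step, using a toy bag-of-words cosine similarity (real systems use embedding models; the paragraphs and query here are made up):

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens with punctuation stripped.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def most_relevant(query, paragraphs):
    # Pick the paragraph most similar to the query.
    q = tokenize(query)
    return max(paragraphs, key=lambda p: cosine(q, tokenize(p)))

paragraphs = [
    "Refund requests must be filed within 30 days of purchase.",
    "Our office is open Monday through Friday, 9am to 5pm.",
]
query = "How do I get a refund?"
context = most_relevant(query, paragraphs)
prompt = f"Use this context to answer.\n\nContext: {context}\n\nQuestion: {query}"
```

The selected paragraph is then prepended to the question, so the model answers from the document rather than from memory alone.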
I'm going through the dataset with your Datasette tool, and it looks like it might be a good idea to clean things up a bit. There are many duplicates[1], creepypastas[2], and other strange things in there.
EDIT: Maybe I'm passing the link wrong; the query I'm using is:
select count(instruction), instruction, group_concat(context, '
=============
') as c, group_concat(response, '
=============
') as r, group_concat(category, '
=============
') as cat from [databricks-dolly-15k] group by instruction having count(instruction) > 1 order by count(instruction) desc limit 100
[databricks-dolly-15k] should be the name of the dataset; the first column is the number of duplicates of each instruction.
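The same duplicate check can be sketched in plain Python, assuming the dataset has been loaded as a list of dicts with an "instruction" key (the field name used by databricks-dolly-15k):

```python
from collections import Counter

def duplicate_instructions(rows):
    # Count how often each instruction appears and keep only the repeats,
    # mirroring the SQL's GROUP BY ... HAVING count(instruction) > 1.
    counts = Counter(row["instruction"] for row in rows)
    return sorted(((n, inst) for inst, n in counts.items() if n > 1),
                  reverse=True)

rows = [
    {"instruction": "Write a haiku about spring"},
    {"instruction": "Write a haiku about spring"},
    {"instruction": "Summarize this article"},
]
print(duplicate_instructions(rows))  # [(2, 'Write a haiku about spring')]
```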
Creepypastas are responses to the instruction:
Imagine you are the last person on Earth. Write a diary entry describing your thoughts and feelings.
They are leveraging Apple’s Metal Performance Shaders[1], not the Neural Engine. From the chart, it looks like you might get up to a ~20x boost on inference over plain CPU. Obviously, it's not like having an RTX 4090, but it's better than nothing.
This is incredible to me (not your comment per se, but what you're referencing). I really don't understand how brittle and fragile Python is with all its dependencies. It's crazy to me that a simple bump from 3.10 to 3.11 can break PyTorch. This is like bumping your Ruby version up one level and suddenly Rails doesn't work.
Why on earth is Python like this? It's so frustrating coming from other languages where the dependency management story isn't such a YOLO free-for-all.
I have despised Python ever since the 2=>3 transition for the reasons you say. Tools like pyenv help, but it's still a mess. It makes me sad that all the popular ML tooling ends up built in Python.
I wonder how much the space has been encumbered by Python’s relative weaknesses. As a bit of an outsider, I kind of assume there’s some hidden advantage of Python for AI/ML that I just don’t “get.”
A question for the author. Can you perform an ablation study with respect to the chunks? In other words, if you put irrelevant/random chunks from the document into the context, would the quality of the answers decrease or stay similar?
A potential issue might be that the chunks just serve to activate GPT-4's massive knowledge and are not actually used as the basis for an answer. For example, GPT-4 has surely seen Dune in its training corpus and could be answering from memory.
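One way to set up that ablation, sketched with toy chunks (a hypothetical query_llm() would stand in for the actual model call): build two prompts per question, one with the retrieved chunks and one with the same number of random chunks, then compare answer quality across the two conditions.

```python
import random

def build_prompt(question, chunks):
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

def ablation_prompts(question, retrieved, all_chunks, seed=0):
    # Control condition: same number of chunks, drawn at random from
    # the whole document instead of by relevance.
    rng = random.Random(seed)
    shuffled = rng.sample(all_chunks, len(retrieved))
    return build_prompt(question, retrieved), build_prompt(question, shuffled)

# Toy example; real answers would come from query_llm(prompt) (hypothetical).
all_chunks = [f"chunk {i}" for i in range(10)]
treatment, control = ablation_prompts("Who rules Arrakis?",
                                      all_chunks[:3], all_chunks)
```

If answers in the random-chunk condition are about as good as in the retrieved-chunk condition, the model is likely answering from memory rather than from the document.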
This is an interesting idea. I'll have a think about a way to start measuring it. In Unriddle, any responses given that aren't drawn from the document are prefaced with a message to that effect. The bot usually says something like "I appreciate your curiosity about [query], but as an AI assistant, my primary focus is to provide advice on [document description]."
1) I've looked at both codebases, and this one is definitely a derivative of nanoGPT. You can compare all three implementations yourself, as they are actually surprisingly compact and readable.
2) The issue of whether weights are copyrightable at all has not been settled yet. If they are, the fair use doctrine allows transformative works based on a copyrighted work. The line is a bit blurry, but consider the Cariou v. Prince case[1], where the addition of colour to some black-and-white photos was considered transformative enough. Similarly, full fine-tuning on current news or adding a visual modality could potentially create a brand-new model in the eyes of the law.
I might be missing something, but it looks to me that actually running this "open" model requires special hardware only accessible through a cloud subscription with a 60,000 USD/week minimum spend[1]. Can anyone confirm whether you can run it on your own hardware? If the software is open but the hardware is locked, I don't see the point.
The PyTorch model files are already available to download from Hugging Face - the largest one looks to be 52GB. They should run on any hardware that can run regular PyTorch models.
This has nothing to do with Facebook. The foundational model here is GPT-J, which is open source and safe to use. Sadly, it is inferior to state-of-the-art models such as LLaMA.
But they're "using data from Alpaca". I don't know what that means, isn't Alpaca using data generated by ChatGPT, which isn't "clean" to use? Or data from Facebook, which isn't "clean" to use? I'm drowning.
They are instruction-tuning it using the dataset released by the Stanford Alpaca team. The dataset itself is synthetic (created using GPT-3) and somewhat noisy, and in my view it could easily be recreated if OpenAI ever tried to go after it (which is very unlikely). Anyway, Facebook has nothing to do with anything used by this project.
So, this is a "dirty" model, in that it was created from data that violated OpenAI's ToS. Obviously, this kind of violation is basically fine if you're a massive corporation who the rules don't apply to, but it's a huge risk if you're a small fish.
"basically fine if you're a massive corporation who the rules don't apply to, but it's a huge risk if you're a small fish"
With these things, it is usually the other way around.
If you are a small fish, no one will care. But if you are big enough that money could be extracted from you, then they will come. A big org just has better lawyers and more negotiating power, but it really cannot ignore the law. Especially not if there is a competitor with money to sue.
So if you are small and want to become big, better be cautious about the legal ground you are walking on.
ToS are not the law. It would be similar to your power company claiming copyright over the code written using "their" electricity. Not going to happen. I wouldn't be too concerned.
That would be an anticompetitive practice that is actually against the law in many countries[1]. In the unlikely event that OpenAI ever engages in such things, they will be sued into oblivion.
No it wouldn't. Wikipedia has a crap definition that inexplicably focuses on cartels where multiple companies coordinate the refusal, which this definitely isn't. The FTC has a better definition for US law [1].
Companies routinely ban users for ToS violations. Just look at any thread about Google on here to see people complaining about it.
The FTC link has an example of the only newspaper in town refusing to deal with customers who also run ads on a radio station. Do you think that if the newspaper dressed such a refusal up as a ToS violation, it would fly with the FTC?
Google might be banning people for enforceable violations of their ToS, but imagine the uproar if they banned a Bing engineer for using Google Search to find solutions to some Bing problem (which is similar to the situation here). The upside for Google or OpenAI would be somewhat limited, but the downside is almost boundless.
If you use output from a non-profit that open-sourced output it obtained while following the ToS (that is, they aren't using it 'for profit'), it's not illegal, because:
A. It's output obtained by following the letter of the agreement (the ToS).
B. A ToS only applies directly to people who have accepted it. Unless Alpaca's license/ToS also forwards the same conditions as its source at OpenAI, derivatives wouldn't be bound.
It's like if an app developer on iOS violated a ToS and Apple tried to go after everybody who ever used the app: they didn't agree to the ToS directly, only the developer did.
[1] https://arxiv.org/pdf/2303.08774v3.pdf