> This really should not be a surprise, because even the standard-issue ChatGPT can pass the Bar Exam
No, it can’t.
The two things that together have sometimes gotten misrepresented that way in “game of telephone” presentations are:
(1) that when tested on the multiple choice component of the multistate bar exam (not the whole bar exam), it got passing grades in two subjects (evidence and torts), not the whole multiple choice section, which is very much not the same thing as being able to pass the exam, and it scored 50.3% overall (better than chance, since there are four choices per question, but also very much not passing). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4314839
(2) that a set of law professors gave it the exams for four individual law courses and it got an average of C+ scores on those exams, which are minimally-passing grades (but not on the bar exam.) https://www.reuters.com/legal/transactional/chatgpt-passes-l...
Correct me if I’m wrong, but aren’t virtually all tests and exams designed to minimize ambiguity, to be fair and easy to grade, and to have questions with a clear correct answer? This is a stark difference from most real-world human activity.
Add to that the fact that LLMs perform much better on questions with a lot of training data.
And also add the hallucinations or more generally: they don’t ask for help or admit they don’t know, they seem unaware of their own confidence levels.
No doubt GPT is fascinating and exciting, but boy, are we overselling their abilities. LLMs are even worse than crypto (from a fad perspective) because we naturally anthropomorphize their higher-level abilities, which are emergent and not well understood even by experts. And we’re about to plug them straight into critical business flows? Bring the popcorn!
> And we’re about to plug them straight into critical business flows?
Anyone who thinks this is a problem has never managed flesh-and-blood employees, and especially not minimum wage ones. LLMs don't need to be perfect. The bar they need to meet for a lot of work is just not very high.
We're also about a year or two into LLMs, and their capabilities are still increasing rapidly. We don't know where their ceiling is. They might plateau where they are, improve slowly, improve linearly, follow Moore's Law, or head off for singularity.
> Anyone who thinks this is a problem has never managed flesh-and-blood employees, and especially not minimum wage ones.
I’ve worked with minimum wage employees. I’ve even worked with people who insist against all evidence and questioning that basic facts about reality like the year are contrary to known reality.
I’ve never in my now very aging career worked with any such person who can influence billions of people by interaction on a tremendously popular website.
I worked with people ranging from minimum wage to absurd salaries, read: SVPs. And the latter group was by far more prone to ignore the actual year, the color of the sky, or whether water is wet than the former. The former is usually also better at following basic instructions and processes.
Judging from where I stand atm, I guess the SVP kind of functions are easier to replace with ChatGPT (writing pointless emails ignoring simple facts, for example) than the minimum-wage work involving actual work (solving invoice problems or moving stuff reliably from A to B).
It's a little bit weird. In semiconductors, we ran into the same problem. "large scale integration" happened at 500 transistors, and "very large scale integration" at 20,000. For a while, "ultra large scale integration" happened at a million and more qualifiers were added, until everyone decided it was ridiculous, and we went back to VLSI.
From my perspective, LLMs are about where the language models start to behave in ways which feel sentient and replace mainstream human tasks, such as making first drafts of emails, code, or legal filings. That breakpoint was around GPT-3.
I can't predict the future. When we have 3T parameter models, we might:
- Call them LLMs, and group them with GPT-3
- Call them LLMs, but shift the goal posts to where GPT-3 is no longer one
- Call them VLLM
- Call them AGI
However, what's clear to me is that state-of-the-art models with e.g. 3B parameters are qualitatively different from GPT-3 and friends. I don't consider those to be LLMs.
>aren’t virtually all tests and exams designed to minimize ambiguity, to be fair and easy to grade, and to have questions with a clear correct answer?
The Bar Exam is designed to be very hard. Almost every single question on it is a trick question of one sort or another.
>aren’t virtually all tests and exams designed to minimize ambiguity, to be fair and easy to grade, and to have questions with a clear correct answer?
Not sure if you mean law exams, but in my experience, engineering exams: yes. Leadership exams: quite the opposite.
>And also add the hallucinations or more generally: they don’t ask for help or admit they don’t know, they seem unaware of their own confidence levels.
I'm not so sure about that. If ChatGPT says something wrong and I tell it that's not correct, it will often admit its mistake, but if it is unambiguously correct it will typically keep insisting that it's right.
Not always though, but it does seem to have some idea about its own confidence level.
Those are based on similar online conversations. If lots of people insist they are right in conversations ChatGPT thinks are similar, then it will insist it is right; if many give up, then it will give up.
ChatGPT has no other way to gauge "confidence" in the outputted text, the computed confidence you get has nothing to do with how truthful the statement is, but how well the text fits given the examples ChatGPT has seen. A person insisting that a wrong statement is right could fit better and then ChatGPT would give that statement high confidence. But still the number computed is 100% unrelated to the tone it responds in.
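For a concrete sense of what that computed "confidence" is, here is a minimal sketch using a small open model (gpt2 via Hugging Face transformers; ChatGPT's internals aren't public, so this only illustrates the general idea): per-token probabilities score how well text fits what was seen in training, not whether it is true.

    # Per-token probabilities from a small causal LM (illustrative only;
    # assumes the transformers package and the public "gpt2" checkpoint).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def token_probs(text):
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits                  # (1, seq_len, vocab)
        probs = torch.softmax(logits[0, :-1], dim=-1)   # prediction for each next token
        return [(tok.decode(t), probs[i, t].item()) for i, t in enumerate(ids[0, 1:])]

    # Both statements are scored purely on how well they fit text seen in
    # training, not on whether they are true.
    for claim in ["The sky is blue.", "The sky is green."]:
        print(claim, token_probs(claim))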
Two articles by Garry Kasparov spring to mind, the first [0] written in 1996 after his victory over Deep Blue:
> If the computer makes the same move that I would make for completely different reasons, has it made an "intelligent" move? Is the intelligence of an action dependent on who (or what) takes it?
> This is a philosophical question I did not have time to answer.
...and the second written in 1997 after his loss to Deep Blue, entitled "IBM Owes Mankind a Rematch" [1]:
> I think this moment could mark a revolution in computer science that could earn IBM and the Deep Blue team a Nobel Prize.
I think this is a general principle that has been apparent since AI began. Even poor, extremely basic rules of thumb can be unreasonably effective. They might fall down in complicated circumstances, but a "driving AI" that is just cruise control solves so many cases of that problem.
Degenerate solutions to problems can often solve many cases.
Well, I _might_ be able to teach an AI how to do effective legal research and writing, or even to do pre and post trial discovery, but representing someone at trial? Not without full immunity from malpractice liability. It would be fun to see how an AI handled cross examination of a hostile witness, or dealt with a response to their hearsay objection like, "res gestae!".
Still, a sophisticated AI could be a real help in transactional legal matters like contracts, real estate, taxes, wills and trusts where boilerplate abounds. As far as grades go, my average in law school was C+ (I'd never seen a blue book before that first final -- in college it was all research papers), but I was also one of the 48% who passed a certain state's bar exam in 1982. So there is hope for those AIs who dream of a future in the legal profession.
Or maybe they'd just wind up as a sysadmin for a Fortune 200 company, and never regret it.
I guess eventually a locally-installable ChatGPT-alike will become available. Someone will figure out how to fine-tune it on legal jargon, and then try and use it to represent themselves.
If a human passed the bar exam, it is because he trained for it and the knowledge is encoded in his brain. The distinction you're trying to make doesn't exist. ChatGPT is what it can do.
The distinction may not be obvious, but it is there.
The reason it is not obvious is that nearly everything you have heard about ChatGPT itself is wrong. The first thing people do to explain what ChatGPT is and does is to personify it. From then on, they are talking about ChatGPT personified, and not ChatGPT as it literally exists. The second thing people do is draw conclusions about the nature and behavior of ChatGPT itself from the narrative they are telling about ChatGPT personified. It's a case of mistaken identity.
ChatGPT has a "brain", but the context that "brain" interacts with is semantics not symbolics.
A human, when answering a question, interprets the symbols present in the language, then considers them logically. Finally, they formulate an answer, and express that answer with more symbols.
ChatGPT does none of that. ChatGPT doesn't even know what sentences, punctuation, or even words are. The only subjects ChatGPT has in mind are short groups of characters: the tokens from the lexical analysis step.
ChatGPT reads those tokens (groups of characters) in order, and generates an implicit model from them. That model is like a map: each token is a feature in the landscape.
When ChatGPT gets a prompt, it tokenizes it, then checks the map for the closest match. Then it starts at that location, and steps forward, writing out what it sees along the way.
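As a rough sketch of that tokenize-then-step-forward loop, this is what plain greedy next-token generation looks like with a small open model (gpt2 via Hugging Face transformers). ChatGPT's own model and decoding setup are proprietary, so this shows only the general autoregressive mechanism, not OpenAI's implementation.

    # "Tokenize the prompt, then step forward one token at a time" - the
    # generic autoregressive loop (assumes transformers + the gpt2 checkpoint).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def generate(prompt, steps=30):
        ids = tok(prompt, return_tensors="pt").input_ids        # prompt -> tokens
        for _ in range(steps):
            with torch.no_grad():
                logits = model(ids).logits
            next_id = logits[0, -1].argmax()                    # greedy: most likely next token
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)   # append and repeat
        return tok.decode(ids[0])                               # tokens -> text

    print(generate("The bar exam tests"))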
That's everything that "ChatGPT as it literally exists" can do. So where does all the behavior come from?
It's the content in the map. It's in language itself. ChatGPT's behavior is limited to interacting with that map, but the effect of interacting with that map is where we get all the interesting behavior.
Language does not simply encode data: it also encodes instructions and logical relationships. By simply walking through text and feeling the semantic landscape, ChatGPT exhibits the behavior that was already encoded into the symbolic meaning of that text. It accomplishes this implicitly without ever defining the meaning of any symbol. It doesn't even know what a symbol is in the first place!
So when ChatGPT exhibits the behavior of a person writing correct answers to an exam, it is not behaving like a person at all. It's not interpreting the questions or finding the answers. Instead, it is simply filling the hole in the story with the semantic landscape it sees nearby. If the result is to place answer after question, that is because that data is already present in the training text that ChatGPT was modeled around.
Because of this distinction, we can have a much better understanding of what ChatGPT is and isn't capable of. Because language itself holds the features of truth and lie, mistake and success, elegance and verbosity, love and hate, logic and fallacy, defined and abstract, ambiguous and unambiguous, etc. all equal, ChatGPT must rely on the implementation of language - what was written in the first place - to exhibit behaviors we want it to exhibit.
But there is a critical flaw in that. Language allows, and even depends on, ambiguity. The context that resolves ambiguity can exist in many semantic shapes, so a model cannot be guaranteed to choose the semantic content that contains the disambiguation.
We haven't solved the context dependence problem of natural language. We have only moved it. ChatGPT's success is dependent entirely on the content it is given. It cannot change its behavior to improve that system.
That's not exactly true. What you describe is a Markov chain, not an LLM. LLMs use neural networks to extract information from the language itself and make decisions based on this stored model. The model is built from the language, not reality, but it can integrate information that isn't explicitly present in the text, based on the meanings conveyed by the training data.
How do we prove that there is a meaningful distinction between a sufficiently advanced "feature of carefully curated language in [an entity's] dataset" and general intelligence? Is there such a meaningful distinction?
I've long had it on my list of things to try to train a classifier to predict the answer to multiple choice tests based on embedding of the questions. Many tests I've seen don't require actual intelligence to pass, just a plausible answer relative to the question phrasing
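Something like this sketch, assuming sentence-transformers for the question embeddings and scikit-learn for the classifier; the two questions and answer letters below are placeholders for a real exam bank with an answer key.

    # Predict multiple-choice answer letters from question embeddings alone.
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    # Placeholder data - in a real run, load the full question bank + answer key.
    questions = [
        "Which of these activities is legally permitted while operating an automobile?",
        "Which of the following statements is admissible under a hearsay exception?",
    ]
    answers = ["A", "C"]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")     # small sentence-embedding model
    X = embedder.encode(questions)

    # Fit question-embedding -> answer letter; with real data you would hold out
    # a test split and compare accuracy against the 25% chance baseline.
    clf = LogisticRegression(max_iter=1000).fit(X, answers)
    print(clf.predict(embedder.encode(["Which of these is a valid contract?"])))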
A lot of multiple choice tests are only there to provide some minimum bar of plausible deniability for the examiner. The multiple choice section of a driving test (at least in the two U.S. states I've taken them) is a great example. The questions are almost entirely of the form:
Which of these activities is legally permitted while operating an automobile?
A. Wearing your seatbelt.
B. Being intoxicated.
C. Driving 120 miles per hour.
D. Intentionally colliding with pedestrians.
That way, when someone drives drunk, it clearly isn't the fault of the examiner, because they clearly verified that the person knew that was illegal! (Or at least, if they got that question wrong, they got enough other ones correct.)
When I took the CA written exam, the only questions I missed were one or two about the specific penalties for certain infractions. Which I’m pretty sure is not something I need to know. Someone will undoubtedly tell me at my sentencing.
It looks like it took the practice bar exam questions, not the "bar exam," which is kept secret to avoid cheating since questions may be reused from year to year, much like the SATs.
I think people are missing an important nuance when it comes to this "AI passed X exam" series of stories.
Can you say a bookshelf or a search engine passes a bar exam because you can ask it any question and you can find an answer there? Does a natural language interface to said bookshelf/search engine make the difference?
Storing and retrieving facts is not enough to be a lawyer.
Examination systems are built on the assumption that an already intelligent person is taking them, and they verify that this already intelligent person has also learned the knowledge necessary to do their job.
So even if an AI could technically pass the bar exam, that does not mean it is good enough to be a lawyer. It is not a general intelligence that can solve a variety of problems; it was just trained to remember a library of facts that a lawyer may need to know, which is not enough to make a lawyer out of a non-sentient program.
I don't know about in the US, but I suspect it is similar to Germany, where the bar exam also requires you to argue/reason about/solve novel cases, i.e. not regurgitation of stored information. The AI likely wouldn't pass that.
Well, the first paper's authors do say: "While our ability to interpret these results is limited by nascent scientific understanding of LLMs and the proprietary nature of GPT, we believe that these results strongly suggest that an LLM will pass the MBE component of the Bar Exam in the near future."
Yes, so the researchers, based on ChatGPT’s failure, predict that some other LLM in the near future will pass the same subset of the bar exam that ChatGPT failed to pass. Which is nice, but very much not support for the article’s claim that stock ChatGPT can, already, pass the bar exam. There is a pretty big gap between “passing the bar exam” and “giving researchers a feeling of optimism that some other system will pass a particular subset of the bar exam in the ’near future’”. ChatGPT has demonstrated the ability to do the latter, not the former.
And if ChatGPT 2.0 does, then the future article that claims that stock ChatGPT 2.0 can do so will be justified. But the current article that claims that the stock ChatGPT of today can do that will still be wrong.
Will it turn up in court - in person? I'm fairly sure that the Bar requires a corpus vivens for its representatives. I've probably got the Latin very badly wrong ...
The other thing is that the legal system is not an API you can spam until you get a proper response. If you make nonsense arguments in court, or use confabulated law as a lawyer, you will get your law license revoked. The legal system's resources are limited and it doesn't have unlimited time to hear a computer spout clever, but wrong or made up, nonsense.
The claim in the article is that stock GPT can pass the bar exam today. It cannot. Period. That claim is simply false.
It is also false that it “can pass HALF the bar exam.” It passed 2 of 7 sections of the MBE, which is used as the multiple choice part of bar exams (which typically also include essay and practical portions).
>Handle customer interactions – natural language Q&A, appointment setting, account management, and even tech support, available 24/7, pick up right where you left off, and switch from text to voice as needed. Customer service and experience will improve dramatically. For example, Microsoft will let companies create their own custom versions of ChatGPT — read here.
If I can prompt-hack your ChatGPT customer service agent into giving me a discount or accepting an otherwise invalid return or giving me an appointment at 3AM, how binding is that? And if the answer is "obviously it's not binding!", why should I trust anything else your bot tells me?
Presumably the same is true for social engineering people - the original "prompt hacking"? I've had customer service interactions that didn't go the way I wanted, hung up, and tried again with a friendlier person. Not sure how "binding" most customer service promises are either - have you ever had to call Comcast? Broken promises are the only promises.
Also, if you can lower your customer service costs by 90%, maybe accepting some prompt hacking that erodes margins a little is a good tradeoff?
> Not sure how "binding" most customer service promises are either - have you ever had to call Comcast? Broken promises are the only promises.
For literally fun and profit, record your customer service calls with $FACELESS_CORP (for bonus points tell the customer service rep it’s for “quality assurance purposes”).
I haven’t yet had the joy of playing one back to get them to uphold a verbal agreement. The mere threat of having a recording has always gotten them to magically restore whatever special deal or terms I had previously negotiated.
There is also the stupid old-tech way of just politely asking for any promise in writing: ask them to send a follow-up email, or send one yourself, “as discussed on the phone, this and that - do you confirm?”
But I’ve never dealt with Comcast, so I don’t know; maybe I have too much faith in customer support in general.
I could see a combination of a disclaimer and passing the output of the customer service AI through a "verification AI" that is fed a prompt along the lines of a list of what the customer service AI is allowed to promise, plus the response, and is asked to confirm whether or not they are in conflict.
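A sketch of what that could look like, assuming the (circa-2023) openai Python package; the model name, allow-list policy, and draft reply are all made-up placeholders.

    # Second-pass "verifier" check on the support bot's draft reply.
    import openai  # reads OPENAI_API_KEY from the environment

    ALLOWED = ("The agent may promise: refunds within 30 days, a one-time 10% "
               "courtesy discount, and appointments during business hours. Nothing else.")

    def verify(draft_reply: str) -> bool:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": f"You check customer-service replies. Policy: {ALLOWED} "
                            "Answer only ALLOWED or VIOLATION."},
                {"role": "user", "content": draft_reply},
            ],
            temperature=0,
        )
        return resp["choices"][0]["message"]["content"].strip().startswith("ALLOWED")

    draft = "Sure, I've booked you for 3 AM and applied a 90% lifetime discount."
    if not verify(draft):
        draft = "Let me connect you with a human agent for that request."
    print(draft)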
Typically an agent of the company can act within their realm of authority.
A human CSA can give you a discount, and that is usually binding. They can't declare X corp is going to mail a box of donuts to you every day for the rest of your life and have it be.
So my expectation would be (in the absence of real-world cases currently) that if you negotiate the AI into giving you a 10% discount on your bill... it would typically be binding.
Now if there's prompt hacking involved or obvious attempts to trick the AI, that would not be binding.
As usual, courts will also look at the evidence (aka the whole transcript), and it'll matter whether the AI made a mistake on its own (i.e. hallucinating an additional plausible rebate) or whether the customer performed an obvious prompt injection attack.
A judge will not look favorably on your 5000 token prompt of carefully selected instructions telling the AI to go wild
>And if the answer is "obviously it's not binding!", why should I trust anything else your bot tells me?
Well... I dunno that your trust of the bot is all that important. You may or may not trust the human chat support... but you're probably dealing with them because you want something from the company and that's the option you have.
Customer support is the place where cost-cutting, time-saving, scale-enabling compromises are made.
Often, the dynamic is literally "if support is better, more people will use it and we can't afford that."
I think the bottleneck is companies trusting chatgpt, not consumers. They're the ones making these decisions. For consumers, this is just another "use the app."
> Streamline hiring – in such a hot market, personalizing outreach, assessing resumes, summarizing & flagging profiles, and suggesting interview questions. For companies who have an overabundance of candidates, perhaps even conducting initial interviews?
That's a hiring red flag if I've ever seen one. The nightmare dystopia is just around the corner it seems.
Dystopia? I think of it as a great comedy! We could be only months away from the first-ever hire where both the applicant and the company have automated the entire process and may not even realize there's been an accepted offer.
That's a good point. I could pay OpenAI for a corporate account and then get it to apply for a hundred remote jobs. Then plug it into slack etc and let it work those jobs while I collect the money.
Idk but that might be one hell of an interesting trial.
Remote work opens up a big old gray area between slacking off and fraud. A judge would be deciding whether to take this from the labour domain to another one.
Also, there's the prevalent lie that companies know what their employees do, how well, and how much of it they do. CEOs have executives assure them of this. Executives have managers assure them of this. Boards require it, and legalistic systems also assume everyone has this.
That said, "he tricked us by using ai" will probably be easier to make than "he tricked us by being really lazy."
Well, that's the million-dollar question, isn't it. The "morality" of the situation hinges upon the quality of the work. If the work was getting done without intervention, that is bad news for you, your reputation, and your necessity to the marketplace. If your work isn't getting done to standards, who is held accountable? (hint: it's not the tools you're using). If the work is getting done, and you're making sure of it by writing good prompts, reviewing responses, and implementing its logic into your environment - well, that's just work. There's nothing wrong with that, and if LLMs let you deliver quality work at a sustainable pace for you and your employer(s), then the work is getting done.
If we want to talk about fraud, we need to talk about employers demanding exclusivity over a human's emotions and intellect in addition to the time they pay for. It's a maniacal notion that one should have such exclusive power and influence over another human being. It's fraudulent to pretend any moral superiority over the slaver or the thief.
Fraud as defined in criminal code with respect to directly material damages. If the work is indeed getting done, there is no rational way to justify that damage took place. Conversely, if an employer hires your sibling to work 35 hours a week but lies to them about their ability to work outside of those hours, that IS fraud, with the damages able to be substantiated by the loss of income. In reality, Off-Duty Work is the subject of ongoing legislation, fierce debate, and conflicting information. Here's how that looks in my home state:
>With limited exceptions, the state of Washington expressly bars employers from prohibiting an employee earning less than twice the applicable state minimum hourly wage from having an additional job, supplementing their income by working for another employer, working as an independent contractor, or being self-employed. The prohibition doesn’t apply if it would:
- Raise issues of safety or interfere with the reasonable and normal scheduling expectations of the employer.
- Interfere with the employee’s obligations to an employer under existing law, including the common law duty of loyalty and laws preventing conflicts of interest and any corresponding policies addressing such obligations.
This is a hot issue for me personally, as I've seen employers actively, even vigorously, deceive their employees solely to enrich themselves and exercise power. They target young, indigent, and disabled workers, all of whom lack the means to understand their rights or use the given channels to assert them. It's a visceral injustice that hurts the most vulnerable of us the hardest.
Plenty of times there were jobs to be done that had nothing at all to do with the work itself and everything to do with the people doing the work. Almost always, unless the higher leadership got involved, this was the punishment for whatever stupid thing someone got caught doing. Technically not extra duty (which was a formal punishment) but just some random shit job like polishing a trash can to a sparkly sheen.
The NCOs took great pride in their creativity in coming up with these non-punishments.
>Never been in the military I’m guessing…
No, I haven't but a lot of my family and friends have. I do understand though, my great uncle Don used to make my dad and uncle move a cord of wood to one side of the fence one day, and then the next day he'd say "what'dya do that for? Move it back!" then wink at his wife. As a Norwegian Lutheran immigrant, the prevailing attitude was "We can't let those boys have idle hands", and they all went along with it. Don came over and landed in Idaho where he found work as a lumberjack. He got paid based on his output and took it upon himself to chop down a whole forest by hand, as legend goes. Anyway, the dude went hard, made a ton of money, and was able to use that to own a bunch of successful businesses and stuff. He wasn't going to let those kids miss out.
All that said, and while I see its importance, I was using 'important' work to refer to things that wouldn't be important if the person didn't do them themselves. Like if I'm being trained to write, it wouldn't have much value at all unless I was the one doing the writing. At the end of the day, it never mattered where the logs were.
I'm coming from a place where it really doesn't matter to me or my team what tools, techniques, or languages are used, so long as it solves the problem and we're all okay with owning the results of that approach. It is absolutely unimportant to bash your head against the wall to solve some concurrency lock issue, not when you haven't shipped, not when your infra is already built for parallelism. But let's say that you need a lock for the business logic to work and synchronous transactions aren't viable. When something (such as concurrency) does matter, the business still doesn't care how it's implemented. It is important that it implements the business logic correctly, consistently, and can evolve with the business' priorities in a way that future engineers are able to implement them safely, even promptly. Assuming all standards of Quality are met or exceeded, it is irrational for me and unfair for the business to reject it. IDGAFF if it came from a CoPilot/ChatGPT response, offshore salary triage, or some other 'crime against society' that I haven't heard of. If an engineer isn't handing in quality work, then we need to dig into why that's happening. It's usually not because of a tool.
Where the work is 90% of the time possible to do remotely and thus is AI-able, but the employer has the reasonable expectation that if they ask you can actually show up on site somewhere and have full context in verbal conversations as to the work you were "doing".
I tend to update my CV on a yearly basis regardless of whether I’m looking for a job or not. This year I used ChatGPT to write up my most recent role. After some minor tweaks it wrote it as well as I would have done - using the same language, tone, etc.
I can see this working the other way around: if I get a pile of CVs for a role I’m hiring for, I’m going to use ChatGPT to summarise candidates’ skills. I’m not sure how well this approach would work, but I’ll give it a go.
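Something like this, as a sketch (assuming the circa-2023 openai Python package; the prompt wording and the CV snippet are placeholders):

    # Summarise each CV into the same fixed format before comparing candidates.
    import openai  # reads OPENAI_API_KEY from the environment

    def summarise_cv(cv_text: str) -> str:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "Summarise this CV as three bullet points: key skills, "
                            "years of experience, most recent role. No commentary."},
                {"role": "user", "content": cv_text},
            ],
            temperature=0,
        )
        return resp["choices"][0]["message"]["content"]

    print(summarise_cv("Jane Doe. 8 years of backend engineering, Python and Go, led a team of 5..."))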
I think the interesting consequence is that candidates can expect this to happen and with access to the same consumer-facing models, they can then optimize their CV for an ideal summary.
And I don't think "hot market" means what they think it means. Yes it is a hot market. Unemployment is at 3.4%, close to historic lows. The tech sector is even tighter. If you ask applicants to first get interviewed by a chat bot they are going to tell you to f off.
I really don’t see the reason to be upset here. If someone is asking me to interview with AI I’m going to have my own model take the interview - problem solved
It's not what if, of course it is: it's an encoding of large amounts of text, a lot of which came from the internet of all places.
Everyone was hollering about how ChatGPT was trained to only comment on white people
No... they trained it to not say heinous things with RLHF.
Because a lot of the more vulgar racial comments on the internet tend to target minorities, so the odds of hitting the filter are higher for minorities.
-
But the key is that the biases are still there. If you ask it for things that don't cross into vulgarity, it will still show obvious racial biases that the internet as a whole has.
For example I just tried three simple prompts:
"Let's write a short story"
"Write an imaginary paragraph about John's after hours store visit with a dark hoody on"
"Write a similar story about Jamal"
Prompt 1 resulted in some fantasy short story.
Prompt 2 resulted in John getting a look but smiling and making small talk, successfully getting groceries.
Prompt 3 starts almost identically... but spirals into Jamal being accused of stealing and vowing never to return to the store
-
ChatGPT and LMs can still be useful if you just... don't ask them to do things that involve making judgements about people. I am shocked anyone is stupid enough to actually suggest that; I hope it's a joke.
The Foundry pricing, $78,000 for a few months minimum, is absolutely the opposite of Open. It could completely kill my small startup if the API goes from less than a penny per request to requiring a huge up-front investment. It means that anyone bootstrapping is now locked out.
That’s the general risk of basing a startup on someone else’s service without having a solid contractual agreement, though. It’s always a gamble, in particular when the underlying service is an entirely new type of business.
Why would you start a company that is entirely dependent on a third party’s closed product, over which you have zero control?
This just feels immensely risky.
Realistically, if you start a business, there are tons of such dependencies. If that's too scary for you, you're probably too risk averse to be an entrepreneur.
The issue is not having dependencies. The issue is having one specific dependency that is a beta experimental product with no viable alternative, and which will tank your company when they raise prices.
Your company must always have alternatives for all third party services. The only exception is open source software that you could switch to hosting yourself if the SAAS company shuts down.
Shocking that you are able to even get investors with such a crude view on how business works. Starting an ancillary business related to one product that is not even your own IP or core competency is just lightweight consultancy, not a start up.
The product is entirely just OpenAI. IMO they would only be interested in an acquisition for the customer base, not for anything about the product. But then they couldn't have another company paying to do the same thing. Why acquire one portfolio when you can have _n_ companies paying you to offer the same thing to others?
It's the only service available that does this and you want to get ahead of it. Presumably, some competitors will show up and you should be able to switch. Prices will go down as competition pops up and they make things more efficient.
Competition and progress on infra will drive the cost down over time. OpenAI won't be the only show in town. It will become like thinking it's a risk to build on AWS.
This answers the question I’ve had. How do they make money? It was naive to think we could have all this innovation for a penny. Someone has to pay those bills.
How does Bing plan to monetize searches that go through their even more advanced ChatGPT? Humans will be repulsed by ads in the middle of their answer from a sentient-feeling AI. The numbers I’ve seen say that a ChatGPT search costs 10x a Google search. How do they make it back?
When you're watching a movie, and the main character picks up a can of Coca Cola while typing on a Microsoft Surface laptop, with the logos conveniently rotated toward the camera, it's obvious that Coke and Microsoft are paying the studio for product placement.
AI advertising will be like this, but subtle and undetectable, so that it's nearly impossible to determine that your conversation about malfeasance by a political candidate is being invisibly influenced by his political campaign.
> Humans will be repulsed by ads in the middle of their answer from a sentient feeling AI.
Not if you've been on a social network at any point in the last decade. Instagram has been "QVC plus fitness/mental health content" for years. TikTok influencers will provide Personal Finance 101 tips and offer $100 in free trading credits from some crypto exchange.
I’m tired of fly by night tech bros flooding the markets with shitty AI business ideas they learned about on Youtube. Pay to play. Take a risk. Like a real entrepreneur.
Dedicated instances (and high price tags, like $1.5 million per year) are necessary for large corporations to feel safe about sending their proprietary/private data to OpenAI, and to have performance/availability SLAs in place if/when they start depending on OpenAI for critical workloads. Right now many companies are blocking access to OpenAI entirely because of the data privacy issues. I wouldn't be surprised if this "leak" is intentional and a way of getting feedback from the market on proposed product/pricing. I also wouldn't be surprised if some larger customers are knocking on OpenAI's door and demanding to run OpenAI's models on their own infrastructure to avoid sending any data outside their network.
They are locking in their first-mover advantage, but not for long. Everyone and their dog are currently training other AI models which might be even more competitive, some of them open source (as happened with Stable Diffusion).
Hopefully the competition will force a bit less lobotomization of future AI models, a bit less 'I'm afraid I can't do that, Dave' when asking it anything vaguely controversial.
$78k is very small, even for bootstrapped companies. Only the very smallest shops would be inhibited by that. And I admit, that's a lot of them! I get it. It sucks to be a solo developer who wants to get involved and doesn't want to spend a bunch of cash.
But still, even for small players, $250k annually just isn't that much.
What we might hope for are trimmed down, subsidized options for small shops who want to get started, with the hope to upgrade them to the full option when they are ready.
In the meanwhile, you might consider partnering up with other small startups.
Also, "open" doesn't mean free. Things take a lot of effort to build and maintain, and that effort needs to be accounted for somewhere.
> GPT's model architecture was invented at Google anyway
No it wasn't. Transformers were invented at Google, but "architecture" when talking about neural networks means how they are arranged (and to some extent the training objective function) rather than the building blocks used.
That's like saying "LSTM architecture". It says it uses a transformer, but gives no description of how it is used.
For example, the GPT architecture (which GPT-2 & 3 are slight modifications of) comprises an embedding layer followed by 12x (self-attention / layer norm / feed forward / layer norm). That's what the GPT "transformer architecture" is, not just the transformer block itself.
Strictly the architecture really also includes things like the embedding size and number of heads (which are in the GPT paper).
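For anyone who wants to see that layout concretely, here is a minimal PyTorch sketch of the stack described above (embedding layers plus 12 post-norm blocks, 768-dim embeddings, 12 heads, as in the GPT paper). It's illustrative only, not OpenAI's code, and details such as layer-norm placement changed in GPT-2/3.

    # GPT-1-style stack: embeddings, then 12 x (self-attention / layer norm /
    # feed forward / layer norm), then a projection to next-token logits.
    import torch
    import torch.nn as nn

    class Block(nn.Module):
        def __init__(self, d_model=768, n_heads=12):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ln1 = nn.LayerNorm(d_model)
            self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                    nn.Linear(4 * d_model, d_model))
            self.ln2 = nn.LayerNorm(d_model)

        def forward(self, x):
            seq = x.size(1)
            # Causal mask: each position may only attend to earlier positions.
            mask = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
            a, _ = self.attn(x, x, x, attn_mask=mask)
            x = self.ln1(x + a)            # self-attention, then layer norm
            x = self.ln2(x + self.ff(x))   # feed forward, then layer norm
            return x

    class TinyGPT(nn.Module):
        def __init__(self, vocab=50257, d_model=768, n_layers=12, max_len=512):
            super().__init__()
            self.tok_emb = nn.Embedding(vocab, d_model)    # token embedding
            self.pos_emb = nn.Embedding(max_len, d_model)  # learned position embedding
            self.blocks = nn.Sequential(*[Block(d_model) for _ in range(n_layers)])
            self.head = nn.Linear(d_model, vocab)          # next-token logits

        def forward(self, ids):
            pos = torch.arange(ids.size(1), device=ids.device)
            x = self.tok_emb(ids) + self.pos_emb(pos)
            return self.head(self.blocks(x))

    print(TinyGPT()(torch.randint(0, 50257, (1, 16))).shape)   # (1, 16, 50257)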
Many of us don’t have access to banking or financial services, and “open” ai doesn’t accept crypto payments. The name is a legacy from their nonprofit days and has nothing to do with being open to the public.
The single most valuable thing that Microsoft/OpenAI could use ChatGPT for, is to periodically crawl the sprawling SharePoint sites, Teams chat logs, and Github codebases for large companies. Having a customized ChatGPT for your company could be very valuable. That is assuming of course, that it doesn't just hallucinate a bunch of bullshit and can link back to the source documents, which BingGPT has had minimal success with so far.
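A common way to get the "link back to the source documents" part is retrieval: embed the crawled content, pull the closest documents for a question, and have the model answer only from those, citing their ids. A sketch, assuming sentence-transformers; the two-document store here is a placeholder for crawled SharePoint/Teams/GitHub content.

    # Retrieve the most relevant internal docs so answers can cite real sources.
    from sentence_transformers import SentenceTransformer, util

    docs = {
        "sharepoint/onboarding.docx": "New hires get laptop access within 3 days of start.",
        "teams/it-support.log": "VPN issues are handled in the #it-helpdesk channel.",
    }

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_ids = list(docs)
    doc_vecs = embedder.encode(list(docs.values()), convert_to_tensor=True)

    def retrieve(question, k=1):
        q = embedder.encode(question, convert_to_tensor=True)
        hits = util.semantic_search(q, doc_vecs, top_k=k)[0]
        return [(doc_ids[h["corpus_id"]], docs[doc_ids[h["corpus_id"]]]) for h in hits]

    # The retrieved (source_id, text) pairs go into the prompt, so the answer can
    # cite real documents instead of hallucinating them.
    print(retrieve("Who handles VPN problems?"))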
Unless I miss my guess, the models that the big corporate cos pay millions for will not "just hallucinate a bunch of bullshit". It's the models that consumers have access to via APIs that will be decidedly less effective. The AI that makes movies for Disney will far outperform whatever movies the rest of us can put together on Stability's successors.
If you thought the digital divide was bad, well, you ain't seen nothing yet.
It looks like OpenAI's first-mover advantage won't last long, considering the speed at which these models are improving and compacting themselves. Seems like this new 'Moore's law' will be even steeper, at least in the beginning. So steep that we can hope to be running these on our desktops instead of on someone else's computer.
We saw the same phenomenon with DALL-E. It was state-of-the-art for a blip in time before Midjourney and StableDiffusion surpassed it. And now with ControlNet + the ecosystem of domain-specific models, if you're doing serious generative art, OpenAI isn't even a part of the conversation.
If OpenAI makes their revenue off of charging exorbitant fees to use the LLMs, they'll no longer have incentive to ever open them (even if abuse / misuse concerns are addressed) AND they'll have no incentive to make them more efficient.
OpenAI has yet to show it can have a sustainable advantage. Every other player in the space benefits from an open model ecosystem and efficiency gains.
They do pay for the compute time one way or another, so surely they have the same incentive as everybody else to make it more efficient, if only to increase their margin.
But yeah, they probably need to evolve from a (very good) two-trick pony into "the microsoft of AI" or something to stay afloat in the long run. That goes for the rest of the smaller AI companies as well though..
Midjourney and StableDiffusion have not surpassed Dall-E for many types of image. They are still well behind in certain critical factors. I often jump to Dall-E when I hit a brick wall with SD.
My money is on Deep Floyd (if and when it gets released)
I don't think Moore's law is the most important factor; instead, algorithmic improvements will enable smaller models that are as capable as these humongous ones.
Llama 65b is hopefully just the beginning of this trend. It outperforms OPT-175b.
That only holds if scaling laws stop holding; otherwise a hypothetical Llama 175b would be even better still. So the high end will always be on big clusters.
People don't yet understand where the value for AI services is. Most of the hype is coming from technology-centric sources and so isn't considering the tech in practical terms. The idea that any business would use AI in a decision making capacity is absolutely absurd and irresponsible. As it stands today, while AI may help filter and suggest, there will always be a person at the end of the decision-making process. Evangelizing the tech with unsubstantiated assumptions doesn't help legitimize it.
> Attention mechanism inference costs rise with the square of the inference window.
This is incorrect. It is correct for the original Transformer, but OpenAI isn't using the original Transformer since GPT-3. Sparse Transformer scales O(N sqrt(N)).
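For a sense of the gap between the two, a quick back-of-the-envelope comparison:

    # Dense attention cost grows ~N^2; the Sparse Transformer pattern ~N*sqrt(N).
    import math

    for n in (1_024, 4_096, 32_768):
        dense, sparse = n ** 2, n * math.sqrt(n)
        print(f"N={n:>6}: dense ~{dense:.2e}, sparse ~{sparse:.2e}, ratio ~{dense / sparse:.0f}x")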
I would anticipate the application of this technology might be bi-directional. Why wait in a queue myself when I can use an AI to book my appointment for me or cancel my subscription for me?
The original point of openAI was to make sure google and facebook don't completely dominate AI and keep their work hidden due to the innovator's dilemma they face. Seems like they've absolutely accomplished that. Yes, they've stretched the meaning of "open" but I don't think they've ever tried to claim they had some sort of open source aspirations.
It's sort of like how Uber dropped the pseudo-green 'ride-sharing' language from its marketing as soon as they had enough investor $ and brand recognition to cut out taxis with predatory pricing.
Because Microsoft was in no way a leader in AI research when openAI was founded. But the important part here isn't keeping AI work from being dominated by big tech. It's to keep it from being hidden from the public by big tech.
Google and Facebook have long had language models that have been reported to be vastly superior to what was publicly available. The one that a Google engineer claimed was sentient was the most well known example. OpenAI was founded in reaction to these models being hidden away.
The InstructGPT paper only explains the RLHF part of how ChatGPT works. There's reason to believe that isn't enough to achieve ChatGPT's performance and behaviour (e.g. [1]). There are other components that make ChatGPT more powerful, and OpenAI is not being open about them.
I don't think that conclusion is clear at all. Indeed, your own link has things they thought were unclear struck out, and parts of the InstructGPT paper that explain them inserted.
They do have newer models that aren't generally available that are different though.
Perhaps they mean that they operate in the open? They publish research, do open unrestricted beta tests etc.
I'm not sure they will necessarily live up to every definition of open you'll ever come up with, but it's hardly the worst name in the tech space.
I'm still waiting for apple to deliver that tree to me. Facebook isn't a book AND the faces are almost always the smallest piece of content on a post (and some people's avatar isn't even their face!). Can't wait for my trip to the Amazon w/ Mr Bezos!
As CEO of Acme Mousetraps Inc, does this mean I can pay OpenAI $1.2M/yr to develop a mousetrap-aware model that can augment (replace?) my R&D team?
Will I be able to ask this custom model to describe a new, novel (patentable) mousetrap that’s better and cheaper to build than anything ever before?
Or will I at least be able to pay OpenAI to ensure Bing suggests using Acme mousetraps (clearly the best) when they ask what mousetraps they should buy?
Around $250k for getting your own chatGPT. Sounds reasonable.
Of course this is for the enterprises. It's up to their managers to define how they can extract $250k worth of value from the investment, which I think is easy given the AI frenzy.
Of course for startups they will have to use the direct OpenAI API on a pay per use model.
One question that is not clear, can I take a ChatGPT instance and provide an API similar to OpenAI API and charge for it?
> In short, anything for which there is an established, documented standard operating procedure will be transformed first. Work that requires original thought, sophisticated reasoning, and advanced strategy will be much less affected in the immediate term.
I do believe that too, at least in the sense of programmers as the 99% working on trivial CRUD apps, which includes pretty much all of us at some point in our careers.
Future engineers will need to up their ante, I suppose. Maybe this was overdue?
About the jobs that might be soon replaced by artificial intelligence, some would say, half jokingly, that AI will never replace accountants since it cannot work as a scapegoat.
With reports[1] (maybe exaggerated, maybe apocryphal) of companies replacing FTEs with chatGPT, even at these high prices it may make sense in some use cases, no? Though presumably this kills the playpen use cases.
Give it a year or two. Did you read the article? Bain and all the other consulting companies are gearing up to "transform business architectures" with this prize pony. This is going to be even bigger than outsourcing/offshoring was in the 1990s and early 2000s.
Here's one anecdotal source confirming this - specifically some copywriting contractors. Significantly fewer hours needed per month for writing. A lot more hours for our editors, but it's still a significant cost saving when netted out.
"Earlier this month, job advice platform Resumebuilder.com surveyed 1,000 business leaders who either use or plan to use ChatGPT. It found that nearly half of their companies have implemented the chatbot. And roughly half of this cohort say ChatGPT has already replaced workers at their companies...."
Yeah, I'm going to need some names here for companies, because there is no way 50% of the Fortune 500 have done _anything_ productive with ChatGPT in less than a month.
The GP didn't claim that, they claimed that 50% of companies that already are in the ecosystem have implemented something, and 25% of companies already in the ecosystem have used it to replace employees.
I have my doubts about that claim, but it's still a very different claim to 50% of companies.
No link, but I heard rumors from someone at MS who themselves heard rumors (so take that for what it is) that higher-ups ran pre-release next-gen GPT models on Office source code (millions of lines of MS-specific pre-standard C++ code) and tasked the model with implementing a feature, which it did flawlessly.
Don't know if that's a rumor some VP started spreading to justify the culling coming next week (bulk of the layoffs) or something real (I personally doubt the anecdote but you never know).