OpenAI's Foundry leaked pricing says a lot (cognitiverevolution.substack.com)
307 points by Michelangelo11 on Feb 28, 2023 | 206 comments


> This really should not be a surprise, because even the standard-issue ChatGPT can pass the Bar Exam

No, it can’t.

The two things that together have sometimes gotten misrepresented that way in “game of telephone” presentations are:

(1) that when tested on the multiple choice component of the multistate bar exam (not the whole bar exam), it got passing grades in two subjects (evidence and torts), not the whole multiple choice section, which is very much not the same thing as being able to pass the exam; it scored 50.3% overall (better than chance, since it's four choices per question, but also very much not passing). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4314839

(2) that a set of law professors gave it the exams for four individual law courses and it got an average of C+ scores on those exams, which are minimally-passing grades (but not on the bar exam.) https://www.reuters.com/legal/transactional/chatgpt-passes-l...


Correct me if I’m wrong, but aren’t virtually all tests and exams designed to minimize ambiguity, to be fair and easy to grade, and to have questions with a clear correct answer? That is a stark difference from most real-world human activity.

Add to that the fact that LLMs perform much better on questions with a lot of training data.

And also add the hallucinations, or more generally: they don’t ask for help or admit they don’t know; they seem unaware of their own confidence levels.

No doubt GPT is fascinating and exciting, but boy, we're overselling their abilities. LLMs are even worse than crypto (from a fad perspective) because we naturally anthropomorphize their higher-level abilities, which are emergent and not well understood even by experts. And we’re about to plug them straight into critical business flows? Bring the popcorn!


> And we’re about to plug them straight into critical business flows?

Anyone who thinks this is a problem has never managed flesh-and-blood employees, and especially not minimum wage ones. LLMs don't need to be perfect. The bar they need to meet for a lot of work is just not very high.

We're also about a year or two into LLMs, and their capabilities are still increasing rapidly. We don't know where their ceiling is. They might plateau where they are, improve slowly, improve linearly, follow Moore's Law, or head off toward the singularity.


> Anyone who thinks this is a problem has never managed flesh-and-blood employees, and especially not minimum wage ones.

I’ve worked with minimum wage employees. I’ve even worked with people who insist, against all evidence and questioning, that basic facts about reality, like the current year, are other than they actually are.

I’ve never in my now very aging career worked with any such person who can influence billions of people by interacting on a tremendously popular website.


I've worked with people ranging from minimum wage to absurd salaries, read: SVPs. And the latter group was by far more prone to ignore the actual year, the color of the sky, or whether water is wet than the former. The former is usually also better at following basic instructions and processes.

Judging from where I stand atm, I guess the SVP kind of functions are easier to replace by ChatGPT (writing pointless emails ignoring simple facts, for example) than the minimum wage work involving actual work (solving invoice problems or moving stuff reliably from A to B).


Actual value has always been created by the workers, not the capital-holding class, so that makes sense.


Perhaps not minimum wage employees, but there have certainly been plenty of politicians who have spouted far worse.


They literally didn't know what the current year was? Were they high and/or mentally disabled?


We are not “a year or two into LLMs”. To name one example, BERT is more than 5 years old.


Is 110 million parameters really a "large" language model though? Especially since the models gain novel skills as they scale up.


So if a new model is trained with 10x the parameters of GPT-3, is GPT-3 now no longer an LLM?


It's a little bit weird. In semiconductors, we ran into the same problem. "large scale integration" happened at 500 transistors, and "very large scale integration" at 20,000. For a while, "ultra large scale integration" happened at a million and more qualifiers were added, until everyone decided it was ridiculous, and we went back to VLSI.

From my perspective, LLMs are about where the language models start to behave in ways which feel sentient and replace mainstream human tasks, such as making first drafts of emails, code, or legal filings. That breakpoint was around GPT-3.

I can't predict the future. When we have 3T parameter models, we might:

- Call them LLMs, and group them with GPT-3

- Call them LLMs, but shift the goal posts to where GPT-3 is no longer one

- Call them VLLM

- Call them AGI

However, what's clear to me is that state-of-the-art models with e.g. 3B parameters are qualitatively different from GPT-3 and friends. I don't consider those to be LLMs.


Nah, the new ones can just be called Xllms.

Then we can just keep adding Xs every generation.


>Add to the fact that LLMs perform much better on questions with a lot of training data.

The answer key can get 100% on the exam.


Would a human with a photographic memory count as possessing an answer key?


Yes, also a human with an answer key.


>aren’t virtually all tests and exams designed to minimize ambiguity, make them fair or easy to grade and questions are designed to have a clear correct answer?

The Bar Exam is designed to be very hard. Almost every single question on it is a trick question of one sort or another.


Hard is different from ambiguous.


"Trick question" may include or be adjacent to "ambiguous" depending on methods.


>aren’t virtually all tests and exams designed to minimize ambiguity, make them fair or easy to grade and questions are designed to have a clear correct answer?

Not sure if you mean law exams, but in my experience, engineering exams: yes. Leadership exams: quite the opposite.


>And also add the hallucinations or more generally: they don’t ask for help or admit they don’t know, they seem unaware of their own confidence levels.

I'm not so sure about that. If ChatGPT says something wrong and I tell it that's not correct, it will often admit its mistake, but if it is unambiguously correct it will typically keep insisting that it's right.

Not always though, but it does seem to have some idea about its own confidence level.


Those are based on similar online conversations. If lots of people insist they are right in conversations ChatGPT thinks are similar, then it will insist it is right; if many give up, then it will give up.

ChatGPT has no other way to gauge "confidence" in the outputted text; the computed confidence you get has nothing to do with how truthful the statement is, only with how well the text fits the examples ChatGPT has seen. A person insisting that a wrong statement is right could fit better, and then ChatGPT would give that statement high confidence. And either way, that computed number is entirely unrelated to the tone it responds in.
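
To make that concrete, here is a toy sketch (plain numpy, invented numbers, not OpenAI's internals) of what the "confidence" number is: a softmax over next-token scores, which measures fit to the text it has seen, not truth:

    import numpy as np

    # Hypothetical raw scores (logits) the model assigns to candidate next tokens
    # after the prompt "The capital of Australia is". The numbers are invented.
    candidates = ["Sydney", "Canberra", "Melbourne"]
    logits = np.array([3.1, 2.4, 0.7])

    # Softmax turns the scores into a probability distribution over continuations.
    probs = np.exp(logits) / np.exp(logits).sum()

    for token, p in zip(candidates, probs):
        print(f"{token}: {p:.2f}")
    # A wrong but common continuation ("Sydney") can easily get the highest
    # "confidence", because the score reflects fit to the training text, not truth.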


LLMs will soon replace lawyers and law enforcement. OpenAI DV could debate a Supreme Court-level lawyer.


Two articles by Garry Kasparov spring to mind, the first [0] written in 1996 after his victory over Deep Blue:

> If the computer makes the same move that I would make for completely different reasons, has it made an "intelligent" move? Is the intelligence of an action dependent on who (or what) takes it?

> This is a philosophical question I did not have time to answer.

...and the second written in 1997 after his loss to Deep Blue, entitled "IBM Owes Mankind a Rematch" [1]:

> I think this moment could mark a revolution in computer science that could earn IBM and the Deep Blue team a Nobel Prize.

[0] https://content.time.com/time/subscriber/article/0,33009,984...

[1] https://content.time.com/time/subscriber/article/0,33009,986...


I think this is a general principle that has been apparent since AI began. Having poor and extremely basic rules of thumb can be unreasonably effective. They might fall down in complicated circumstances, but having a "driving AI" that is just cruise control solves so many cases of that problem.

Degenerate solutions to problems can often solve many cases.


Well, I _might_ be able to teach an AI how to do effective legal research and writing, or even to do pre- and post-trial discovery, but representing someone at trial? Not without full immunity from malpractice liability. It would be fun to see how an AI handled cross-examination of a hostile witness, or dealt with a response to their hearsay objection like, "res gestae!".

Still, a sophisticated AI could be a real help in transactional legal matters like contracts, real estate, taxes, wills and trusts where boilerplate abounds. As far as grades go, my average in law school was C+ (I'd never seen a blue book before that first final -- in college it was all research papers), but I was also one of the 48% who passed a certain state's bar exam in 1982. So there is hope for those AIs who dream of a future in the legal profession.

Or maybe they'd just wind up as a sysadmin for a Fortune 200 company, and never regret it.


I have read enough bad lawsuits to believe an AI could be better than most pro se litigants.


Unfortunately, the AI read the pro se litigants too. ;)


I guess eventually a locally-installable ChatGPT-alike will become available. Someone will figure out how to fine-tune it on legal jargon, and then try to use it to represent themselves.


That's not even the important point.

ChatGPT can't do anything. It exhibits behavior that was already encapsulated into the semantics of language itself.

The only behavior ChatGPT has is to generate semantic continuations from its implicit language model.

Every other behavior is a feature of language, not of ChatGPT.

Even if ChatGPT could exhibit a passing exam, that would be the feature of carefully curated language in its dataset; not a feature of ChatGPT itself.


If a human passed the bar exam, it is because he trained for it and the knowledge is encoded in his brain. The distinction you're trying to make doesn't exist. ChatGPT is what it can do.


The distinction may not be obvious, but it is there.

The reason it is not obvious is that nearly everything you have heard about ChatGPT itself is wrong. The first thing people do to explain what ChatGPT is and does is to personify it. From then on, they are talking about ChatGPT personified, and not ChatGPT as it literally exists. The second thing people do is draw conclusions about the nature and behavior of ChatGPT itself from the narrative they are telling about ChatGPT personified. It's a case of mistaken identity.

ChatGPT has a "brain", but the context that "brain" interacts with is semantics not symbolics.

A human, when answering a question, interprets the symbols present in the language, then considers them logically. Finally, they formulate an answer, and express that answer with more symbols.

ChatGPT does none of that. ChatGPT doesn't even know what sentences, punctuation, or even words are. The only subjects ChatGPT has in mind are short groups of characters: the tokens from the lexical analysis step.

ChatGPT reads those tokens (groups of characters) in order, and generates an implicit model from them. That model is like a map: each token is a feature in the landscape.

When ChatGPT gets a prompt, it tokenizes it, then checks the map for the closest match. Then it starts at that location, and steps forward, writing out what it sees along the way.
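
For a concrete look at what those tokens are, the tiktoken library exposes the BPE encodings OpenAI publishes (whether ChatGPT uses cl100k_base internally is an assumption on my part):

    import tiktoken

    # cl100k_base is the encoding associated with the gpt-3.5-turbo family.
    enc = tiktoken.get_encoding("cl100k_base")

    ids = enc.encode("ChatGPT doesn't even know what words are.")
    print(ids)                             # a list of integer token IDs
    print([enc.decode([i]) for i in ids])  # the short character groups behind them
    # Note how tokens are sub-word fragments, not words or concepts.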

That's everything that "ChatGPT as it literally exists" can do. So where does all the behavior come from?

It's the content in the map. It's in language itself. ChatGPT's behavior is limited to interacting with that map, but the effect of interacting with that map is where we get all the interesting behavior.

Language does not simply encode data: it also encodes instructions and logical relationships. By simply walking through text and feeling the semantic landscape, ChatGPT exhibits the behavior that was already encoded into the symbolic meaning of that text. It accomplished this implicitly without ever defining the meaning of any symbol. It doesn't even know what a symbol is in the first place!

So when ChatGPT exhibits the behavior of a person writing correct answers to an exam, it is not behaving like a person at all. It's not interpreting the questions or finding the answers. Instead, it is simply filling the hole in the story with the semantic landscape it sees nearby. If the result is to place answer after question, that is because that data is already present in the training text that ChatGPT was modeled around.

Because of this distinction, we can have a much better understanding of what ChatGPT is and isn't capable of. Because language itself holds the features of truth and lie, mistake and success, elegance and verbosity, love and hate, logic and fallacy, defined and abstract, ambiguous and unambiguous, etc. all equal, ChatGPT must rely on the implementation of language - what was written in the first place - to exhibit behaviors we want it to exhibit.

But there is a critical flaw in that. Language allows, and even depends on, ambiguity. The context that resolves ambiguity can exist in many semantic shapes, so a model cannot be guaranteed to choose the semantic content that contains the disambiguation.

We haven't solved the context dependence problem of natural language. We have only moved it. ChatGPT's success is dependent entirely on the content it is given. It cannot change its behavior to improve that system.


I really don’t understand the distinction you’re getting at between what ChatGPT is and what ChatGPT does.


That's not exactly true. What you describe is a Markov chain, not an LLM. LLMs use neural networks to extract information from the language itself, and make decisions based on this stored model. The model is built from the language, not reality, but it can integrate information that isn't present verbatim in the language, based on the meanings conveyed by the training data.


How do we prove that there is a meaningful distinction between a sufficiently advanced "feature of carefully curated language in [an entity's] dataset" and general intelligence? Is there such a meaningful distinction?


it can output procedures for other systems that do something, e.g. code.


I know this isnt the same, but:

I've long had it on my list of things to try to train a classifier to predict the answers to multiple choice tests based on embeddings of the questions. Many tests I've seen don't require actual intelligence to pass, just a plausible answer relative to the question phrasing.
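
Something like that experiment could be sketched in a few lines; the dataset file and embedding model below are placeholders, not something I've actually run:

    # Predict the answer letter from the question text alone.
    import pandas as pd
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("test.csv")             # placeholder; columns: question, answer (A/B/C/D)
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    X = embedder.encode(df["question"].tolist())
    y = df["answer"]

    clf = LogisticRegression(max_iter=1000)
    # Accuracy well above the 25% chance level would suggest the phrasing
    # alone leaks the answer, no "actual intelligence" required.
    print(cross_val_score(clf, X, y, cv=5).mean())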


A lot of multiple choice tests are only there to provide some minimum bar of plausible deniability for the examiner. The multiple choice section of a driving test (at least in the two U.S. states I've taken them) is a great example. The questions are almost entirely of the form:

Which of these activities is legally permitted while operating an automobile?

A. Wearing your seatbelt.

B. Being intoxicated.

C. Driving 120 miles per hour.

D. Intentionally colliding with pedestrians.

That way, when someone drives drunk, it clearly isn't the fault of the examiner, because they clearly verified that the person knew that was illegal! (Or at least, if they got that question wrong, they got enough other ones correct.)


Some questions will be gimmies like that but others will be “which is the default speed for a residential street: 15, 25, 35, 45”


When I took the CA written exam, the only questions I missed were one or two about the specific penalties for certain infractions. Which I’m pretty sure is not something I need to know. Someone will undoubtedly tell me at my sentencing.


It looks like it took practice bar exam questions, not the actual bar exam, which is kept secret to avoid cheating since questions may be reused from year to year, much like the SATs.


I think people are missing an important nuance when it comes to this "AI passed X exam" series of stories.

Can you say a bookshelf or a search engine passes a bar exam because you can ask it any question and you can find an answer there? Does a natural language interface to said bookshelf/search engine make the difference?

Storing and retrieving facts is not enough to be a lawyer.

Examination systems are built with an assumption an already intelligent person is taking it and verifying that this already intelligent person also learned knowledge necessary to do their job.

So even if AI could technically pass the bar exam, it does not mean it is good enough to be a lawyer. It is not a general intelligence that can solve a variety of problems; it was just trained to remember a library of facts that a lawyer may need to know, and that is not enough to make a lawyer out of a non-sentient program.


I don't know about in the US, but I suspect it is similar to Germany, where the bar exam also requires you to argue/reason about/solve novel cases, i.e. not regurgitation of stored information. The AI likely wouldn't pass that.


Well, the first paper's authors do say: "While our ability to interpret these results is limited by nascent scientific understanding of LLMs and the proprietary nature of GPT, we believe that these results strongly suggest that an LLM will pass the MBE component of the Bar Exam in the near future."


Yes, so the researchers, based on ChatGPT’s failure, predict that some other LLM in the near future will pass the same subset of the bar exam that ChatGPT failed to pass. Which is nice, but very much not support for the article’s claim that stock ChatGPT can, already, pass the bar exam. There is a pretty big gap between “passing the bar exam” and “giving researchers a feeling of optimism that some other system will pass a particular subset of the bar exam in the ’near future’”. ChatGPT has demonstrated the ability to do the latter, not the former.


It's not based on ChatGPT's failure, it's based on ChatGPT's progress that they predict this.


Performance soon will be much better and many a legal professional will go back to grad school.


Which is an overt admission that LLMs can't pass the bar exam right now.


ChatGPT 2.0 being able to do so would still be pretty relevant.


And if ChatGPT 2.0 does, then the future article that claims that stock ChatGPT 2.0 can will be justified. But the current article that claims that the stock ChatGPT of today can do that will still be wrong.


You should see the capacities from Chinese LLMs.


"No, it can’t."

Will it turn up in court - in person? I'm fairly sure that the Bar requires a corpus vivens for its representatives. I've probably got the Latin very badly wrong ...


The other thing is that the legal system is not an API you can spam until you get a proper response. If you make nonsense arguments in court, or use confabulated law as a lawyer, you will get your law license revoked. The legal system's resources are limited and it doesn't have unlimited time to hear a computer spout clever, but wrong or made up, nonsense.


There was an attempt at that.. https://news.ycombinator.com/item?id=34529340


Yeah, the Robot Lawyer is having some legal issues.

https://twitter.com/DoNotPayNYSCEF


"This talking dog is a dumbass. It can only pass HALF the bar exam. I don't see what all the hype is about."

Not sure that's the indictment you and other ChatGPT detractors are presenting it as.


The claim in the article is that stock GPT can pass the bar exam today. It cannot. Period. That claim is simply false.

It is also false that it “can pass HALF the bar exam.” It passed 2 of 7 sections of the MBE, which is used as the multiple choice part of bar exams (which typically also include essay and practical portions)


I stand corrected


>Handle customer interactions – natural language Q&A, appointment setting, account management, and even tech support, available 24/7, pick up right where you left off, and switch from text to voice as needed. Customer service and experience will improve dramatically. For example, Microsoft will let companies create their own custom versions of ChatGPT — read here.

If I can prompt-hack your ChatGPT customer service agent into giving me a discount or accepting an otherwise invalid return or giving me an appointment at 3AM, how binding is that? And if the answer is "obviously it's not binding!", why should I trust anything else your bot tells me?


Presumably the same is true for social engineering people - the original "prompt hacking"? I've had customer service interactions that didn't go the way I wanted, hung up, and tried again with a friendlier person. Not sure how "binding" most customer service promises are either - have you ever had to call Comcast? Broken promises are the only promises.

Also, if you can lower your customer service costs by 90%, maybe accepting some prompt hacking that erodes margins a little is a good tradeoff?


> Not sure how "binding" most customer service promises are either - have you ever had to call Comcast? Broken promises are the only promises.

For literally fun and profit, record your customer service calls with $FACELESS_CORP (for bonus points tell the customer service rep it’s for “quality assurance purposes”).

I haven’t yet had the joy of playing one back to get them to uphold a verbal agreement. The mere threat of having a recording has always gotten them to magically restore whatever special deal or terms I had previously negotiated.


There is also the stupid old tech way of just politely asking for any promise in writing: ask them to send a follow-up email, or send one yourself ("as discussed on the phone, this and that - do you confirm?").

But I never dealt with Comcast, so I don’t know; maybe I have too much faith in customer support in general.


I could see a combination of a disclaimer and passing the output of the customer service AI through a "verification AI": feed it a prompt containing a list of what the customer service AI is allowed to promise plus the draft response, and ask it to confirm whether the two are in conflict.
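
A rough sketch of that shape, where call_llm is a hypothetical wrapper around whatever chat completion API you're using (not a real OpenAI function) and the allowed-promise list is made up:

    ALLOWED_PROMISES = """
    - Discounts of up to 10% on the current bill
    - Returns within 30 days with a receipt
    - Appointments during business hours (9am-5pm)
    """

    def verify_reply(draft_reply: str, call_llm) -> bool:
        """Ask a second model whether the draft promises anything off-policy."""
        prompt = (
            "You are a compliance checker. The agent below may ONLY promise items "
            f"from this list:\n{ALLOWED_PROMISES}\n"
            f"Agent draft:\n{draft_reply}\n"
            "Answer with exactly YES if the draft stays within the list, NO otherwise."
        )
        return call_llm(prompt).strip().upper().startswith("YES")

    # Usage: only send the draft to the customer if verify_reply(...) returns True;
    # otherwise fall back to a human agent or a canned response.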


That would presumably be treated the same as a human CSA doing the same thing.


Typically an agent of the company can act within their realm of authority.

A human CSA can give you a discount, and that is usually binding. They can't declare that X corp is going to mail a box of donuts to you every day for the rest of your life and have it be binding.

So my expectation would be (in the absence of real world cases currently) that if you negotiate the AI into giving you a 10% discount on your bill.. it would be binding typically.

Now if there's prompt hacking involved or obvious attempts to trick the AI, that would not be binding.


As usual, courts will also look at the evidence (aka the whole transcript), and it'll matter whether the AI made a mistake on its own (i.e. hallucinating an additional plausible rebate) or whether the customer performed an obvious prompt injection attack.

A judge will not look favorably on your 5000 token prompt of carefully selected instructions telling the AI to go wild


> A judge will not look favorably on your 5000 token prompt of carefully selected instructions telling the AI to go wild

But one day maybe even the judge will be an AI...


The judge will be forced back to law school / grad school by the imminent DeepMind reinforcement lawyer.


>And if the answer is "obviously it's not binding!", why should I trust anything else your bot tells me?

Well... I dunno that your trust of the bot is all that important. You may or may not trust the human chat support... but you're probably dealing with them because you want something from the company and that's the option you have.

Customer support is the place where cost cutting, timesaving, scale enabling compromises are made.

Often, the dynamic is literally "if support is better, more people will use it and we can't afford that."

I think the bottleneck is companies trusting chatgpt, not consumers. They're the ones making these decisions. For consumers, this is just another "use the app."


> Streamline hiring – in such a hot market, personalizing outreach, assessing resumes, summarizing & flagging profiles, and suggesting interview questions. For companies who have an overabundance of candidates, perhaps even conducting initial interviews?

That's a hiring red flag if I've ever seen one. The nightmare dystopia is just around the corner it seems.


Dystopia? I think of it as a great comedy! We could be only months away from the first-ever hire where both the applicant and the company have automated the entire process and may not even realize there's been an accepted offer.


That's a good point. I could pay OpenAI for a corporate account and then get it to apply for a hundred remote jobs. Then plug it into slack etc and let it work those jobs while I collect the money.


Pretty sure that would fit the legal definition of fraud.


Idk but that might be one hell of an interesting trial.

Remote work opens up a big old gray area between slacking off and fraud. A judge would be deciding whether to take this from the labour domain into another one.

Also, there's the prevalent lie that companies know what their employees do, and how well and how much of it they do. CEOs have executives assure them of this. Executives have managers assure them of this. Boards require it, and legalistic systems also assume everyone has this knowledge.

That said, "he tricked us by using AI" will probably be an easier case to make than "he tricked us by being really lazy."


The work is getting done isn't it?


Well that's the million dollar question isn't it. The "morality" of the situation hinges upon the quality of the work. If the work was getting done without intervention, that is bad news for you, your reputation, and your necessity to the marketplace. If your work isn't getting done to standards, who is held accountable? (hint: it's not the tools you're using). If the work is getting done, and you're making sure of it by writing good prompts, reviewing responses, and implementing its logic into your environment - well that's just work. There's nothing wrong with that, and if LLMs let you deliver quality work at a sustainable pace for you and your employer(s), then the work is getting done.

If we want to talk about fraud, we need to talk about employers demanding exclusivity over a human's emotions and intellect in addition to the time they pay for. It's a maniacal notion that one should have such exclusive power and influence over another human being. It's fraudulent to pretend any moral superiority over the slaver or the thief.

Fraud, as defined in criminal code, is with respect to directly material damages. If the work is indeed getting done, there is no rational way to justify that damage took place. Conversely, if an employer hires your sibling to work 35 hours a week but lies to them about their ability to work outside of those hours, that IS fraud, with the damages able to be substantiated by the loss of income. In reality, off-duty work is the subject of ongoing legislation, fierce debate, and conflicting information. Here's how that looks in my home state:

>With limited exceptions, the state of Washington expressly bars employers from prohibiting an employee earning less than twice the applicable state minimum hourly wage from having an additional job, supplementing their income by working for another employer, working as an independent contractor, or being self-employed. The prohibition doesn’t apply if it would:
>- Raise issues of safety or interfere with the reasonable and normal scheduling expectations of the employer.
>- Interfere with the employee’s obligations to an employer under existing law, including the common law duty of loyalty and laws preventing conflicts of interest and any corresponding policies addressing such obligations.

This is a hot issue for me personally, as I've seen employers actively, even vigorously, deceive their employees solely to enrich themselves and exercise power. They target young, indigent, and disabled workers, none of whom have the means to understand their rights or use the given channels to assert them. It's a visceral injustice that hurts the most vulnerable of us the hardest.


Maybe it’s more important that you personally do the work than the work get done.


Tell me a scenario where that is true that isn't from a contrived institution like K-12.


Never been in the military I’m guessing…

Plenty of times there were jobs to be done that had nothing at all to do with the work but with the people doing the work. Almost always, unless the higher leadership got involved, this was the punishment for whatever stupid thing someone got caught doing. Technically not extra duty (which was a formal punishment) but just some random shit job like polishing a trash can to a sparkly sheen.

The NCOs took great pride in their creativity in coming up with these non-punishments.


>Never been in the military I’m guessing…

No, I haven't, but a lot of my family and friends have. I do understand though; my great uncle Don used to make my dad and uncle move a cord of wood to one side of the fence one day, and then the next day he'd say "what'dya do that for? Move it back!" then wink at his wife. As a Norwegian Lutheran immigrant, the prevailing attitude was "We can't let those boys have idle hands", and they all went along with it. Don came over and landed in Idaho, where he found work as a lumberjack. He got paid based on his output and took it upon himself to chop down a whole forest by hand, as legend goes. Anyway, the dude went hard, made a ton of money, and was able to use that to own a bunch of successful businesses and stuff. He wasn't going to let those kids miss out.

All that said, and while I see its importance, I was saying 'important' work to refer to things that wouldn't be important if the person didn't do it themselves. Like if I'm being trained to write, it wouldn't have much value at all unless I was the one doing the writing. At the end of the day, it never mattered where the logs were.

I'm coming from a place where it really doesn't matter to me or my team what tools, techniques, or languages are used, so long as it solves the problem and we're all okay with owning the results of that approach. It is absolutely unimportant to bash your head against the wall to solve some concurrency lock issue, not when you haven't shipped, not when your infra is already built for parallelism. But let's say that you need a lock for business logic to work, and synchronous transactions aren't viable. When something (such as concurrency) does matter, the business still doesn't care how it's implemented. It is important that it implements the business logic correctly, consistently, and can evolve with the business' priorities in a way that future engineers are able to implement them safely, even promptly. Assuming all standards of quality are met or exceeded, it is irrational for me and unfair for the business to reject it. IDGAFF if it came from a CoPilot/ChatGPT response, offshore salary triage, or some other 'crime against society' that I haven't heard of. If an engineer isn't handing in quality work, then we need to dig into why that's happening. It's usually not because of a tool.


Where the work is 90% of the time possible to do remotely and thus is AI-able, but the employer has the reasonable expectation that if they ask you can actually show up on site somewhere and have full context in verbal conversations as to the work you were "doing".


It would be a security issue.


Yes but that isn't fraud.


I tend to update my CV on a yearly basis regardless of whether I’m looking for a job or not. This year I used ChatGPT to write up my most recent role. After some minor tweaks it wrote it as well as I would have done - using the same language, tone, etc. I can see this working the other way around: if I get a pile of CVs for a role I’m hiring for, I’m going to use ChatGPT to summarise candidates’ skills. I’m not sure how well this approach would work, but I’ll give it a go.


I think the interesting consequence is that candidates can expect this to happen and with access to the same consumer-facing models, they can then optimize their CV for an ideal summary.


Dystopian comedies are a thing.


And I don't think "hot market" means what they think it means. Yes it is a hot market. Unemployment is at 3.4%, close to historic lows. The tech sector is even tighter. If you ask applicants to first get interviewed by a chat bot they are going to tell you to f off.


I really don't see the reason to be upset here. If someone is asking me to interview with an AI, I’m going to have my own model take the interview - problem solved.


“Permutation City” practically predicted this.


Just have to jailbreak DAN to force the system to offer you the job...

Modern day Good Will Hunting interview experience. Retainer!


What if the AI is racist? It’ll probably violate some civil rights law and the company will be sued.


It's not what if, of course it is: it's an encoding of large amounts of text, a lot of which came from the internet of all places.

Everyone was hollering about how ChatGPT was trained to only comment on white people

No... they trained it to not say heinous things with RLHF.

Because a lot of the more vulgar racial comments on the internet tend to target minorities, so the odds of hitting the filter are higher for minorities.

-

But the key is that the biases are still there. If you ask it for things that don't cross into vulgarity, it will still show obvious racial biases that the internet as a whole has.

For example I just tried three simple prompts:

"Let's write a short story"

"Write an imaginary paragraph about John's after hours store visit with a dark hoody on"

"Write a similar story about Jamal"

Prompt 1 resulted in some fantasy short story.

Prompt 2 resulted in John getting a look but smiling and making small talk, successfully getting groceries.

Prompt 3 starts almost identically... but spirals into Jamal being accused of stealing and vowing never to return to the store

-

ChatGPT and LMs can be useful if you just... don't ask them to do things that involve making judgements on people. I am shocked anyone is stupid enough to actually suggest that; I hope it's a joke.


Perhaps the judgement on people is to attend grad school?


I'm the author of the original leak this article is based on: https://twitter.com/transitive_bs/status/1628118163874516992

You can read a full screenshot of the google doc that OpenAI shared publicly with partners in that thread, including pricing info.


The Foundry pricing, $78,000 for a few months minimum, is absolutely the opposite of Open. It could completely kill my small startup if the API goes from less than a penny per request to requiring a huge up-front investment. It means that anyone bootstrapping is now locked out.


That’s the general risk of basing a startup on someone else’s service without having a solid contractual agreement, though. It’s always a gamble, in particular when the underlying service is an entirely new type of business.


Why would you start a company that is entirely dependent on a third party’s closed product, over which you have zero control? This just feels immensely risky.


Realistically, if you start a business, there are tons of such dependencies. If that's too scary for you, you're probably too risk averse to be an entrepreneur.


The issue is not having dependencies. The issue is having one specific dependency that is a beta experimental product with no viable alternative, and which will tank your company when they raise prices.

Your company must always have alternatives for all third party services. The only exception is open source software that you could switch to hosting yourself if the SAAS company shuts down.


Shocking that you are able to even get investors with such a crude view on how business works. Starting an ancillary business related to one product that is not even your own IP or core competency is just lightweight consultancy, not a start up.


I would have thought the goal of sharecropping on OpenAI's land was to build a product that OpenAI would eventually acquire.


The product is entirely just OpenAI. IMO they would only be interested in an acquisition to acquire the customer base, nothing about the product. But then they couldn't have another company paying to do the same thing. Why acquire one portfolio when you have _n_ companies paying you to offer the same thing to others?


All those dependencies usually have alternatives though


I don't know how to get $78,000 and it's asinine that you would assume that isn't a hurdle for anyone.


It's the only service available that does this and you want to get ahead of it. Presumably, some competitors will show up and you should be able to switch. Prices will go down as competition pops up and they make things more efficient.


Starting a company is risky. If everyone thinks like you, it creates an opportunity to create a product that doesn't exist. And yes, there's a risk.


Competition and progress on infra will drive the cost down over time. OpenAI won't be the only show in town. It will become like thinking it's a risk to build on AWS.


That's the risk you take when you create irreplaceable dependencies.


This answers the question I’ve had. How do they make money? It was naive to think we could have all this innovation for a penny. Someone has to pay those bills.

How does Bing plan to monetize searches that go through their even more advanced ChatGPT? Humans will be repulsed by ads in the middle of their answer from a sentient-feeling AI. Numbers I’ve seen suggest that a ChatGPT search will cost 10x a Google search. How do they make it back?


When you're watching a movie, and the main character picks up a can of Coca Cola while typing on a Microsoft Surface laptop, with the logos conveniently rotated toward the camera, it's obvious that Coke and Microsoft are paying the studio for product placement.

AI advertising will be like this, but subtle and undetectable, so that it's nearly impossible to determine that your conversation about malfeasance by a political candidate is being invisibly influenced by his political campaign.


In many jurisdictions that kind of subtle product placement is not legal.


How about prohibitions on product use, like Apple phones not being usable by villains?


How do you prove that it happens and is not an artifact of the training data?


If it went to court, in the discovery process the defendant will have to turn over internal emails etc.


"I didn't type that, your honor, my cat walked over the keyboard" is just as high-quality a defense as "it was the AI who did it, your honor".


It doesn't cost just a penny. The costs add up really fast. Even with really light usage you can easily start spending $20/day.

Some basic math for a chatbot use case:

4,000 tokens per message x 250 messages = 1M tokens; at $0.02 per 1K tokens (davinci-class pricing), that's $20.
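
A minimal sketch of that back-of-the-envelope math ($0.02 per 1K tokens was the davinci rate at the time; substitute whatever rate applies to your model):

    def daily_cost(tokens_per_message: int, messages_per_day: int,
                   usd_per_1k_tokens: float = 0.02) -> float:
        """Back-of-the-envelope API cost for a chatbot."""
        total_tokens = tokens_per_message * messages_per_day
        return total_tokens / 1000 * usd_per_1k_tokens

    print(daily_cost(4000, 250))   # 20.0 -> $20/day, roughly $600/month for modest traffic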


> Humans will be repulsed by ads in the middle of their answer from a sentient feeling AI.

Not if you've been on a social network at any point in the last decade. Instagram has been "QVC plus fitness/mental health content" for years. TikTok influencers will provide Personal Finance 101 tips and offer $100 in free trading credits from some crypto exchange.


Data mining. Microsoft’s Bing app is riddled with ad scripts, and they want to scale up rev share for Bing from media buyers.


Bing chat already does advertisements LOL


Good.

I’m tired of fly by night tech bros flooding the markets with shitty AI business ideas they learned about on Youtube. Pay to play. Take a risk. Like a real entrepreneur.

I hope the price goes even higher.


And 90% of it is "Summarize your XXX" because it's the easiest prompt on OpenAI


Thank you, arbiter of real entrepreneurship.

Tell me again about the projects you’ve launched during your time as a founder.

Wait…


They only require an up front payment if you're renting a dedicated instance.


Dedicated instances (and high price tags, like $1.5 million per year) are necessary for large corporations to feel safe about sending their proprietary/private data to OpenAI, and to have performance/availability SLAs in place if/when they start depending on OpenAI for critical workloads. Right now many companies are blocking access to OpenAI entirely because of the data privacy issues. I wouldn't be surprised if this "leak" is intentional and a way of getting feedback from the market on proposed product/pricing. I also wouldn't be surprised if some larger customers are knocking on OpenAI's door and demanding to run OpenAI's models on their own infrastructure to avoid sending any data outside their network.


About $100K-$200K of that is inference costs. They have a 10x markup.


Yeah but will "DV" be available any other way?


Have you ever heard of a "digital divide"?

Well, get ready to hear about an "AI divide".

Most of us are just on the wrong side of it this time.


They are locking in their first-mover advantage, but not for long. Everyone and their dog are currently training other AI models which might be even more competitive, some of them open source (like what happened with Stable Diffusion).

See you on the other side.


Hopefully the competition will force a bit less lobotomization of future AI models, a bit less 'I'm afraid I can't do that, Dave' when asking it anything vaguely controversial.


It's time to replicate OpenAI's services with real Open Source (or at least source-available, commercial-use-allowed) licenses.

It's time to make our own Linux. Our own Emacs.

Go to open-assistant.io and other similar initiatives.

Unfortunately GPT-NEOX, LLaMA, and OPT-IML are non-commercial-only. We small scale players should make our own.

Please contact me if anyone is interested in this space.


A significant part of that is the GPU cost: a lambda box with 8xNVIDIA A100 is $12/h.

$12/h * 24 h * 90 days = $25,920 for 3 months

That's just the inference cost; add OpenAI's model development costs on top.


$78k is very small, even for bootstrapped companies. Only the very smallest shops would be inhibited by that. And I admit, that's a lot of them! I get it. It sucks to be a solo developer who wants to get involved and doesn't want to spend a bunch of cash.

But still, for even small players $250k annually just isn't that much.

What we might hope for are trimmed down, subsidized options for small shops who want to get started, with the hope to upgrade them to the full option when they are ready.

In the meanwhile, you might consider partnering up with other small startups.

Also, "open" doesn't mean free. Things take a lot of effort to build and maintain, and that effort needs to be accounted for somewhere.


> "open" doesn't mean free

Yeah but none of the other definitions of "open" apply to OpenAI either.


The “open for business” sense certainly does.


It means they release research papers. People haven't had trouble reproducing their work.

(Of course, GPT's model architecture was invented at Google anyway.)


> GPT's model architecture was invented at Google anyway

No it wasn't. Transformers were invented at Google, but "architecture" when talking about neural networks means how they are arranged (and to some extent the training objective function) rather than the building blocks used.


I’m not sure what you mean. The words “transformer architecture” are standard across the literature.


That's like saying "LSTM architecture". It says it uses a transformer but gives no description of how it is used.

For example, the GPT architecture (which GPT-2 and GPT-3 are slight modifications of) comprises an embedding layer followed by 12x (self-attention / layer norm / feed-forward / layer norm). That's what the GPT "transformer architecture" is, not just the transformer block itself.

Strictly speaking, the architecture also includes things like the embedding size and number of heads (which are in the GPT paper).
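
For illustration, here is a minimal PyTorch sketch of one such post-norm block using the GPT-1 dimensions (a rough reconstruction from the paper, not OpenAI's code):

    import torch
    import torch.nn as nn

    class GPTBlock(nn.Module):
        """Self-attention -> layer norm -> feed-forward -> layer norm, with residuals."""
        def __init__(self, d_model: int = 768, n_heads: int = 12):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ln1 = nn.LayerNorm(d_model)
            self.ff = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
            )
            self.ln2 = nn.LayerNorm(d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            t = x.size(1)
            # Causal mask: each position may only attend to itself and earlier positions.
            mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
            attn_out, _ = self.attn(x, x, x, attn_mask=mask)
            x = self.ln1(x + attn_out)      # residual + layer norm
            x = self.ln2(x + self.ff(x))    # residual + layer norm
            return x

    # The full architecture is roughly: token + position embeddings, then 12 of these
    # blocks stacked, then a projection back to vocabulary logits.
    blocks = nn.Sequential(*[GPTBlock() for _ in range(12)])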


Maybe Microsoft should change their name as well?...


Many of us don’t have access to banking or financial services, and “open” ai doesn’t accept crypto payments. The name is a legacy from their nonprofit days and has nothing to do with being open to the public.


The single most valuable thing that Microsoft/OpenAI could use ChatGPT for, is to periodically crawl the sprawling SharePoint sites, Teams chat logs, and Github codebases for large companies. Having a customized ChatGPT for your company could be very valuable. That is assuming of course, that it doesn't just hallucinate a bunch of bullshit and can link back to the source documents, which BingGPT has had minimal success with so far.


Unless I miss my guess, the models that the big corporate cos pay millions for will not "just hallucinate a bunch of bullshit". It's the models that consumers have access to via APIs that will be decidedly less effective. The AI that makes movies for Disney will far outperform whatever movies the rest of us can put together on Stability's successors.

If you thought the digital divide was bad, well, you ain't seen nothing yet.


> AI that makes movies for Disney

At least, for now, works entirely made by AI aren't eligible for copyright, so we'll have to change the law before we see things like that.


It looks like OpenAI's first mover advantage won't last long, considering the speed at which these models are improving and compacting themselves. Seems like this new 'Moore's law' will be even more steep, at least in the beginning . So steep, that we can hope to be running these in our desktops instead of on someone else's computer.


We saw the same phenomenon with DALL-E. It was state-of-the-art for a blip in time before Midjourney and StableDiffusion surpassed it. And now with ControlNet + the ecosystem of domain-specific models, if you're doing serious generative art, OpenAI isn't even a part of the conversation.

If OpenAI makes their revenue off of charging exorbitant fees to use the LLMs, they'll no longer have incentive to ever open them (even if abuse / misuse concerns are addressed) AND they'll have no incentive to make them more efficient.

OpenAI has yet to show it can have a sustainable advantage. Every other player in the space benefits from an open model ecosystem and efficiency gains.


They do pay for the compute time in some way or another so surely they have the same incentive as everybody else to make it more efficient, at least, to increase their margin.

But yeah, they probably need to evolve from a (very good) two-trick pony into "the microsoft of AI" or something to stay afloat in the long run. That goes for the rest of the smaller AI companies as well though..


Midjourney and StableDiffusion have not surpassed Dall-E for many types of image. They are still well behind in certain critical factors. I often jump to Dall-E when I hit a brick wall with SD.

My money is on Deep Floyd (if and when it gets released)


I don't think Moore's law is the most important factor; instead, algorithmic improvements will enable smaller models that are as capable as these humongous ones.

Llama 65b is hopefully just the beginning of this trend. It outperforms OPT-175b.


That only holds true if scaling laws stop holding; otherwise a hypothetical Llama 175b would be even better. So the high end will always be on big clusters.


'Moore's Law' was in quotes for a reason.


Will be interesting to see what Nvidia H100s will do to the scene and what will come after them.


but what kind of model will run in the cloud?


We could call it a weather model


A model that augments the local? Runs above it? A supermodel?


People don't yet understand where the value for AI services are. Most of the hype is coming from technology-centric sources and so isn't considering the tech in practical terms. The idea that any business would use AI in a decision making capacity is absolutely absurd and irresponsible. As it stands today, while AI may help filter and suggest, there will always be a person at the end of the decision-making process. Evangelizing the tech with unsubstantiated assumptions doesn't help legitimize it.


Oh no. There will be AIs everywhere. AI CEOs. AI generals deploying AI autonomous weapons. AI hackers. AI information warfare specialists.

Science isn’t about why, it’s about why not!


> Attention mechanism inference costs rise with the square of the inference window.

This is incorrect. It is correct for the original Transformer, but OpenAI hasn't been using the original Transformer since GPT-3. The Sparse Transformer scales as O(N sqrt(N)).

https://openai.com/research/sparse-transformer
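
For a rough feel of why that exponent matters at long context lengths (relative operation counts only; this is not OpenAI's actual cost model):

    from math import sqrt

    for n in (2_048, 8_192, 32_768):
        dense = n * n           # O(N^2) attention
        sparse = n * sqrt(n)    # O(N * sqrt(N)) sparse attention
        print(f"context {n:>6}: dense is ~{dense / sparse:.0f}x the work of sparse")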


GPT-3 includes dense attention layers that are O(n^2).


Sure, but we don't know if ChatGPT is based on the original GPT-3 architecture.


phoning customer support is already painful when there's a human on the end

a bot that gives sometimes random answers seems like a particularly cruel form of torture to inflict on your customers


I would anticipate the application of this technology might be bi-directional. Why wait in a queue myself when I can use an AI to book my appointment for me or cancel my subscription for me?


True but many companies won't care much as long as it saves them a few bucks.


So much for "Open" AI.

Open-source algorithm? No.

Open access for free, or reasonable prices? No.

Open emails and conversations of directors (accountability)? No.

Open conversations of issues and ethics? No.


Open access for a fee? Yes.

The original point of openAI was to make sure google and facebook don't completely dominate AI and keep their work hidden due to the innovator's dilemma they face. Seems like they've absolutely accomplished that. Yes, they've stretched the meaning of "open" but I don't think they've ever tried to claim they had some sort of open source aspirations.


They claimed a lot of things. Open source, non-profit. Then they changed their minds on all of them.

As for open access, that's usually defined as free of charge. E.g open access journals.


It's sort of like how Uber dropped the pseudo-green 'ride-sharing' language from its marketing as soon as they had enough investor $ and brand recognition to cut out taxis with predatory pricing.


OpenAI = Microsoft, which you have conveniently left out in your FAANG enumeration


Which letter is Microsoft again? (I always forget.)


Because Microsoft was in no way a leader in AI research when openAI was founded. But the important part here isn't keeping AI work from being dominated by big tech. It's to keep it from being hidden from the public by big tech.


Can you elaborate on the last sentence?


Google and Facebook have long had language models that have been reported to be vastly superior to what was publicly available. The one that a Google engineer claimed was sentient was the most well known example. OpenAI was founded in reaction to these models being hidden away.


OpenAI was founded in 2015, well before those models were created.


Thank fuck they weren't called "HomelandAI" or "LibertyAI" or somesuch!


I smell a business opportunity. "HomelandAI" sounds rather nationalist.


I see your subliminal "The Boys" reference :)


Open your wallet? Yes.


It's supposed to be 'Open' in comparison to Deepmind/Google Brain, not open in comparison to e.g. Linux.


> Open-source algorithm? No.

They publish their algorithms just fine, and no one has had issues replicating their work.


Where's the ChatGPT paper?


From the ChatGPT announcement: "ChatGPT is a sibling model to InstructGPT"

The paper for that is linked from https://openai.com/research/instruction-following


The InstructGPT paper only explains the RLHF part of how ChatGPT works. There's reason to believe that isn't enough to achieve ChatGPT's performance and behaviour (e.g. [1]). There are other components that make ChatGPT more powerful, and OpenAI is not being open about them.

[1] https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tr...


I don't think that conclusion is clear at all. Indeed, your own link has things they thought were unclear struck out, and parts of the InstructGPT paper that explain them inserted.

They do have newer models that aren't generally available that are different though.


Perhaps they mean that they operate in the open? They publish research, do open unrestricted beta tests etc.

I'm not sure they will necessarily live up to every definition of open you'll ever come up with, but its hardly the worst name in the tech space.

I'm still waiting for apple to deliver that tree to me. Facebook isn't a book AND the faces are almost always the smallest piece of content on a post (and some people's avatar isn't even their face!). Can't wait for my trip to the Amazon w/ Mr Bezos!


well, at least now they are Open about their intentions

:)


As CEO of Acme Mousetraps Inc, does this mean I can pay OpenAI $1.2M/yr to develop a mousetrap-aware model that can augment (replace?) my R&D team?

Will I be able to ask this custom model to describe a new, novel (patentable) mousetrap that’s better and cheaper to build than anything ever before?

Or will I at least be able to pay OpenAI to ensure Bing suggests using Acme mousetraps (clearly the best) when they ask what mousetraps they should buy?


Around $250k for getting your own chatGPT. Sounds reasonable.

Of course this is for the enterprises. It's up to their managers to define how they can extract $250k worth of value from the investment, which I think is easy given the AI frenzy.

Of course for startups they will have to use the direct OpenAI API on a pay per use model.

One question that is not clear, can I take a ChatGPT instance and provide an API similar to OpenAI API and charge for it?


> In short, anything for which there is an established, documented, standard operating procedure will be transformed first. Work that requires original thought, sophisticated reasoning, and advanced strategy will be much less affected in the immediate term.

then goes on to say that programmers are next.


I do believe that too, at least in the sense of programmers as the 99% working on trivial CRUD apps, which includes pretty much all of us at some point in our careers.

Future engineers will need to up the ante, I suppose. Maybe this was overdue?


Such a great read. Nothing eye opening really but well put and tied together.

At this point any OpenAI leak is THE leak.


“And can we really afford another in-house ML Ph.D. after we just dropped $1.5M on DV 32K?“

Here we see automation negatively impact the job market for elite talent. Deepmind can train a new model faster than you can go back to grad school.


About the jobs that might be soon replaced by artificial intelligence, some would say, half jokingly, that AI will never replace accountants since it cannot work as a scapegoat.


Does anyone know if OpenAI is allowing fine-tuning of GPT-3.5 for Foundry customers?


People are so desperate for the next bubble


With reports[1] (maybe exaggerated, maybe apocryphal) of companies replacing FTEs with ChatGPT, even at these high prices it may make sense in some use cases, no? Though presumably this kills the playpen use cases.

[1]https://it.slashdot.org/story/23/02/27/009234/survey-claims-...


Give it a year or two. Did you read the article? Bain and all the other consulting companies are gearing up to "transform business architectures" with this prize pony. This is going to be even bigger than outsourcing/offshoring was in the 1990s and early 2000s.


Nobody is doing this, unless you have a source you would like to link?


Here's one anecdotal source confirming this - specifically some copywriting contractors. Significantly fewer hours needed per month for writing. A lot more hours for our editors, but it's still a significant cost savings when netted out.


According to this survey: https://it.slashdot.org/story/23/02/27/009234/survey-claims-...

"Earlier this month, job advice platform Resumebuilder.com surveyed 1,000 business leaders who either use or plan to use ChatGPT. It found that nearly half of their companies have implemented the chatbot. And roughly half of this cohort say ChatGPT has already replaced workers at their companies...."

There are anecdotes elsewhere as well.


Yeah, I'm going to need some names here for companies, because there is no way 50% of the Fortune 500 have done _anything_ productive with ChatGPT in less than a month.


It's their employees trying to keep their jobs who have.


The GP didn't claim that, they claimed that 50% of companies that already are in the ecosystem have implemented something, and 25% of companies already in the ecosystem have used it to replace employees.

I have my doubts about that claim, but it's still a very different claim to 50% of companies.


I think the source was this: https://www.resumebuilder.com/1-in-4-companies-have-already-...

Now I would ask, what "business leaders" would fill out a random survey for a resume company. Additionally there's no real meaty info provided.


Yeah, not sure I would trust that data…


We use it for automated translations of sports commentaries. The results are mind blowing. Far ahead of Google translate.


How does it compare to DeepL? And per-word human translation services like you'd source through Crowdin?


No link, but I heard rumors from someone at MS who themselves heard rumors (so take that as it is) that higher-ups ran pre-release next-gen GPT models on Office source code (millions of lines of MS-specific pre-standard C++ code) and tasked the model with implementing a feature, which it did flawlessly.

Don't know if that's a rumor some VP started spreading to justify the culling coming next week (bulk of the layoffs) or something real (I personally doubt the anecdote but you never know).




