
People are gushing over its ability to write code, forgetting that code is just another type of language that can be used to express ideas.

Logic, however, transcends language. And it is clear that GPT-3 has absolutely no understanding of basic logic. It gives the impression of understanding logic by constructing sentences which are sometimes logically coherent.

We are a very long way from AGI.



This test suggests it grasps the theory of computation better than 80% of the population does. The wonder is that it's just text, so this is an emergent property.

Let’s say it can simulate theory of computation better than 99% of the population and can very capably synthesize and infer from any text based sources. I think that would shake the world, and it wouldn’t even need to be near AGI.


It really depends on how much you care about undefined behaviour. If you're just playing around with a prototype or making a personal tool, corrupt state doesn't have huge ramifications, but for any software people would pay for (be that with money or attention), there's an expectation of a minimum level of robustness, security, performance, and a variety of other business requirements.

To achieve the same with an AI that doesn't have a real understanding of the business logic, programmers would still be needed to write the test suite. But unlike most test suites, which are typically underspecified, this one would likely need to be more complicated than the program itself. You could use ChatGPT to expedite writing the test programs, but attention would still be required to actually verify the tests themselves.


The question is: would the profession go from writing code to auditing GPT code?


Why not? The profession went from writing machine code by hand to auditing "auto-generated" machine code from assembly. And so on with every generation of programming languages. With every new higher-level language with new abstractions, we're often just adding another link to the chain of inputs to a machine-code translator.

Edit: If you showed a programmer from the 1950s python syntax and told them that all you have to do is write these words to build a program, they'd think it was artificial intelligence.


  > Why not? The profession went from writing machine code by
  hand to auditing "auto-generated" machine code from
  assembly.
A compiler deterministically creates assembly through a set of rules, not through AI. You absolutely cannot compare these two in that way.

  > Edit: If you showed a programmer from the 1950s python
  syntax and told them that all you have to do is write these
  words to build a program, they'd think it was artificial
  intelligence.
I guarantee you that this reflects bias and a complete misunderstanding. People today are not more intelligent than people were then. People are people, and programmers in the 1950s had to be extremely clever.


Especially since our field's foundations weren't shaken to the core in the last 70 years, whereas around a century ago there was a huge change in our understanding of... everything: Gödel, Turing machines, Chomsky, the halting theorem, etc. That we no longer have to fit into 3 megabytes and thus can write some monster abstractions is not fundamentally different at all.


> Edit: If you showed a programmer from the 1950s python syntax and told them that all you have to do is write these words to build a program, they'd think it was artificial intelligence.

I think you might be underestimating programmers in the 1950s.

> The first practical compiler was written by Corrado Böhm in 1951 for his PhD thesis

https://en.wikipedia.org/wiki/History_of_compiler_constructi...


I don't think the GP is underestimating programmers in the 1950s but rather commenting on their definition of AI. Back then, first-order logic was considered AI by a lot of people. I feel like in 50 years AI models will have become so complex and sophisticated that people will look back at us, surprised that we ever considered ChatGPT to be AI, since it fails at basic logic tasks like the event-ordering example above.


> Edit: If you showed a programmer from the 1950s python syntax and told them that all you have to do is write these words to build a program, they'd think it was artificial intelligence.

Well, if you showed a programmer from late 1960s / early 1970s that same thing, they'd show you Lisp and laugh you off, or despair that programming seemingly regressed in the future.


>Edit: If you showed a programmer from the 1950s python syntax and told them that all you have to do is write these words to build a program, they'd think it was artificial intelligence.

He's a computer programmer, not a Teutonic knight.


Honestly, the only use case I see for this is writing more tests, because testing is very time- and code-intensive. That means most tests are just simple canaries, not capable of actually verifying the software, because nobody is going to write 10,000 lines of test code to test a 500-line file.


No human maybe, but could an AI?

I fed the prompt for https://leetcode.com/problems/two-sum/ to ChatGPT, which generated code for an answer, and I then asked it for additional test cases, which it gave. What's mind-blowing, though, is that it "reasoned" that:

> These test cases demonstrate that the twoSum() function can handle cases where the input array contains negative numbers, where the target is negative, where the indices of the two elements are not adjacent, and where the input array contains duplicate elements.

Full transcript: https://pastebin.com/BpdcuWEM
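For reference, a minimal Python sketch of the kind of solution and test cases described (the pastebin has ChatGPT's actual output; the function and the specific cases here are just illustrative):

  def two_sum(nums, target):
      # Return indices of the two numbers that add up to target.
      seen = {}  # value -> index
      for i, n in enumerate(nums):
          if target - n in seen:
              return [seen[target - n], i]
          seen[n] = i
      return []

  # Test cases of the kind it proposed: negative numbers, a negative
  # target, non-adjacent indices, and duplicate elements.
  assert two_sum([2, 7, 11, 15], 9) == [0, 1]
  assert two_sum([-3, 4, 3, 90], 0) == [0, 2]      # negative number
  assert two_sum([-1, -2, -3, -4], -6) == [1, 3]   # negative target, non-adjacent
  assert two_sum([3, 3], 6) == [0, 1]              # duplicates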


The problem is that, according to the post at the root of this thread, the AI would have written a test to verify that A < B < C, whereas it should have written a test to verify that A < C < B. If the AI or a human had also written the code to match that same wrong ordering, nobody might ever know until the day that condition comes up while the system is running.

I can imagine that it would be stressful to review code like this, where you know you cannot trust a single line of it even though it looks consistent. You might have to verify back to the original assumptions every time and make sure that you have not been led astray by the AI.
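To make that failure mode concrete, a hypothetical sketch (the event names and the validate_order function are made up): if the generated code and the generated test both encode the same wrong ordering, the suite passes and the bug ships.

  # Spec (from the prompt): event A must happen before C, and C before B,
  # i.e. A < C < B.

  def validate_order(a_time, c_time, b_time):
      # BUG: encodes A < B < C instead of the specified A < C < B.
      return a_time < b_time < c_time

  def test_validate_order():
      # The test was written from the same wrong assumption, so it passes
      # even though both the code and the test contradict the spec.
      assert validate_order(a_time=1, c_time=3, b_time=2)

  test_validate_order()  # passes silently; the spec violation goes unnoticed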


Most certainly. It's not a question of if, but when. The same goes for our eventually placing perfect trust in artificial intelligence. For now, ChatGPT, Copilot, and the like are at best productivity tools for boilerplate.


Sounds harder than writing your own code sometimes. Reading other people's (or machines') code is not easy at all.


> Let’s say it can simulate theory of computation better than 99% of the population and can very capably synthesize and infer from any text based sources. I think that would shake the world, and it wouldn’t even need to be near AGI.

The total percentage isn't exactly what matters. Emergent properties as a metric are a smokescreen.

If that last 1% incorrectly demonstrates that A<C<B implies A<B<C, that means the system is not reliable enough to perform logical computations. You'd need to have a person oversee 100% of the output to catch the last 1% of serious but basic errors. In such a case you might as well hire the person directly for the job.


Maybe it's because a lot of code out there doesn't actually require logic but merely requires copying what is already on the internet? I mean, I don't need to tell you that.

A computer can sort an array of numbers faster than likely 99% of the population; that doesn't mean it's useful.


It made me think of that demo someone made which downloads a bunch of code from Stack Overflow with the search terms “sort an array js” and evaluates and runs the results until it finds one that will sort its data.

https://gkoberger.github.io/stacksort/

ChatGPT’s program output seems to be basically the smarter version of this, but it ain’t gonna scale to anything truly novel.


Professional developers make up less than 1% of the global population (around 25 million people, apparently). Assuming they are the ones with a good grasp of the theory of computation, I think “better than 99%” is not a high enough bar to shake the world, right?

I mean, it depends on what we expect the AI to do. Maybe it would be revolutionary to just have, like, an average programmer with a ton of free time (so, the AI only has to beat like 99.7% of humanity to do that). On the other hand, if we want it to change the world by being much better than the average person, I guess we’d need a couple more 9’s.


The AP test is about potential, though. You don't hire a high schooler to work as a developer based on their AP CS score.


> We are a very long way from AGI.

I don't think so; the scaling laws haven't failed so far. I fully expect that making the model bigger and training it on more data will make it better at logic.

For a nice example with image models, Scott Alexander made a bet that newer image models would be able to do the things that Dall-E 2 gets wrong. [1] (This post also discusses how GPT-3 could do many things that GPT-2 got wrong.) He won the bet three months later through Imagen access. [2]

[1]: https://astralcodexten.substack.com/p/my-bet-ai-size-solves-... [2]: https://astralcodexten.substack.com/p/i-won-my-three-year-ai...


I don’t know, isn’t the safer bet that scaling will eventually reach a dead end? I honestly fail to see how a language model could “execute” a sequence of reasoning steps, as it doesn’t think in a symbolic way. Do correct me if I’m wrong, but that would require a complex rearchitecting, so I'm not sure we are any closer; we just have a very impressive, smart search engine now.


It’s not just a safe bet but almost guaranteed. Humans combine their internal language models with physical intuition and experimentation from the moment they are born. There is zero chance that an AI can understand the physical world without access to it [1]. Until it has that access, it’s no more than a glorified context-specific Markov chain generator.

[1] Henceforth called Kiselev’s conjecture, a corollary of Moravec’s paradox: https://en.m.wikipedia.org/wiki/Moravec's_paradox


It's possible for models to learn a lot about everyday physics from videos.


No, it isn't. Not yet.


You said "There is zero chance that an AI can understand the physical world without access to it," which is wrong. It is possible. Using videos is an active research area, e.g. https://proceedings.neurips.cc/paper/2021/hash/07845cd9aefa6... or https://arxiv.org/abs/2205.01314


Thank you for the links, it's fascinating!

Fact is, without a feedback loop that can run physical experiments the way infants do from the moment they're born, I highly doubt these models will develop a useful intuition from video alone. Hence the conjecture.


For text data, we probably don't have more than one more order of magnitude of data left.


> We are a very long way from AGI.

In fact it has just gotten closer.

Logical reasoning has been a pretty solid branch of AI since its inception. Robust solutions exist for most problems; there is even a programming language based on its principles (Prolog).

With ChatGPT there is now a system that can express the results from automatic logic reasoning in language.

The next step would be to combine the two, i.e. tell chatGPT to explain the result of a logic reasoning program in natural language. It could of course also be asked to translate a natural language query into Prolog code.

This will probably require retraining the model, but I guess the demo we are given by OpenAI leaves little doubt that this is perfectly doable.

ChatGPT has the potential to plug the gap between GOFAI and natural language, which is quite a feat.
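To make the GOFAI half of that concrete, here is a toy sketch (in Python rather than Prolog, purely as a stand-in) of the kind of event-ordering query discussed upthread. In the proposed pairing, the language model's only job would be to translate the question into constraints like these and the solver's answer back into prose:

  from itertools import permutations

  def consistent_orderings(events, constraints):
      # Enumerate all orderings of the events and keep those that satisfy
      # every "x happens before y" constraint.
      result = []
      for order in permutations(events):
          pos = {e: i for i, e in enumerate(order)}
          if all(pos[x] < pos[y] for x, y in constraints):
              result.append(order)
      return result

  # "A happens before C, and C happens before B" -- unlike the language
  # model, the solver cannot be talked into concluding A < B < C from this.
  print(consistent_orderings(["A", "B", "C"], [("A", "C"), ("C", "B")]))
  # -> [('A', 'C', 'B')]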


ChatGPT is damned impressive, but I hesitate to apply the word "reasoning" to its output. It's also not clear that the backend can be generalized to be able to plug the input into a symbolic manipulation program, or to gather the output and explain it. Only OpenAI knows for certain right now. Powerful if possible though!

Here's why I can't call it "reasoning":

> what is next in the series: a,b,c

> Without additional context, it is impossible to determine what the next element in the series "a,b,c" would be. The series could potentially continue with "d," "e," or any other letter of the alphabet, depending on the pattern or rule being followed. It is also possible that the series does not follow any pattern and is simply a random or arbitrary sequence of letters. Without more information, it is impossible to determine what the next element in the series would be.

> why isn't it d?

> As mentioned in my previous response, without additional context or information about the pattern or rule being followed, it is impossible to determine what the next element in the series "a,b,c" would be. It is not necessarily true that the next element in the series would be "d," as the series could potentially continue with any other letter of the alphabet or follow a different pattern altogether. Without more information, it is impossible to determine what the next element in the series would be.


Sounds like it is trained to avoid answering questions, and instead tries to bait you into giving it more information so that it can tell you what you told it.

It answers some things directly, but so many things are just avoidance. And then people say "see, it understood after a bit of back and forth, it is smart!", even though it is basically iterating through Google responses and reformulating them to fit the conversation until you say it got it right. Google v1 used pure logic and got you what you wanted, Google v2 tries to do natural language and sometimes misses, ChatGPT is the next step and tries to do full language but misses most of the time.


So one should use chatGPT as a frontend to Google v1!


My point is, there are frameworks and languages for reasoning which are mature. But they require formalised input (e.g. code) and deliver formalised output.

As a language model, chatGPT can translate back and forth between natural language and those formal languages. Part of that ability is evident in the numerous examples that demonstrate how it writes or explains code.

The version of ChatGPT that is public is quite restricted; it can't browse the web or run code.

A version of ChatGPT that can translate a logic query into Prolog, run the code, and translate the result back to the user should be perfectly capable of logical reasoning.
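Something like this is all the glue that pipeline would need. A rough sketch only: llm() and run_prolog() are hypothetical stubs, not real OpenAI or SWI-Prolog APIs; it just shows the shape of the loop.

  def llm(prompt: str) -> str:
      """Hypothetical call to a language model (stub for illustration)."""
      raise NotImplementedError

  def run_prolog(program: str, query: str) -> str:
      """Hypothetical call to a Prolog engine, e.g. SWI-Prolog as a subprocess."""
      raise NotImplementedError

  def answer_with_logic(question: str) -> str:
      # 1. Ask the model to translate the natural-language question into
      #    Prolog facts/rules plus a query (a formal, checkable representation).
      program = llm(f"Translate into Prolog facts and rules:\n{question}")
      query = llm(f"Give a single Prolog query answering:\n{question}")
      # 2. Let the symbolic engine do the actual reasoning.
      result = run_prolog(program, query)
      # 3. Ask the model to turn the engine's output back into prose.
      return llm(f"Explain this Prolog result in plain English:\n{result}")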


> The next step would be to combine the two, i.e. tell chatGPT to explain the result of a logic reasoning program in natural language. It could of course also be asked to translate a natural language query into Prolog code.

From what I remember, the very initial prototype of AlphaGo just had a neural net trained on historical games, effectively asking "what kind of move would a traditional grandmaster make here?" with no planning whatsoever. This was good enough to beat the person who wrote the prototype (who wasn't a master but wasn't a complete novice either); to make it able to defeat grandmasters, they added Monte Carlo tree search for planning (which also necessitated a separate neural net for evaluating board positions).

It sounds similar to your suggestion: a model which simply generates realistic-looking sentences is accurate maybe 85% of the time; to make it truly human-level (or superhuman), it needs to be paired with some sort of formal structure -- the analog of the tree search. The difficulty being, of course, that the world and its knowledge isn't as simple to represent as a go board.

That said, making coding answers more reliable, by adding a logical structure explicitly designed to support search & testing, should be within reach.
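For the coding case, the shape of that pairing is simple enough to sketch. In the snippet below the hard-coded candidate list stands in for samples from a model, and the test harness is the "formal structure" doing the filtering (all names are illustrative):

  from typing import Callable, Iterable, Optional

  def first_passing(candidates: Iterable[str],
                    check: Callable[[Callable], bool]) -> Optional[Callable]:
      # Try each generated candidate; keep the first whose behaviour passes
      # the explicit check. The generator proposes, the harness disposes.
      for src in candidates:
          namespace: dict = {}
          try:
              exec(src, namespace)       # compile and load the candidate
              fn = namespace["solve"]    # candidates must define solve()
              if check(fn):
                  return fn
          except Exception:
              continue                   # broken candidates are just skipped
      return None

  # Hypothetical candidates, as if sampled from a model:
  candidates = [
      "def solve(xs): return xs",          # wrong
      "def solve(xs): return sorted(xs)",  # right
  ]
  check = lambda fn: fn([3, 1, 2]) == [1, 2, 3]
  print(first_passing(candidates, check))  # prints the second, passing solve()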


> The difficulty being, of course, that the world and its knowledge isn't as simple to represent as a go board.

Humans suffer from the exact same limitation. The limit to correct inference and prediction is often the amount and quality of input data.

A language model that can extract information from text and interact with the user to refine and clarify that information could be tremendously useful for experts who understand how the model works.

Without that understanding it will be rather disappointing though, as we see with some of the reactions to chatGPT and also Galactica (RIP).


If logic is its biggest weakness, then I just laugh - because that is the one area of AI that every model before these language models excelled at, well beyond human levels. All it takes is for GPT to formulate the English sentence into logic predicate statements and throw it through a "3rd party" script that does the heavy logic validation/proving, and you're good. Those are well-trodden areas of programming, and they were ironically where people expected AIs to come from and be strongest in - nobody expected exceptional painting and conversing skill just from averaging out a shit-ton of data.


If the language part doesn’t have reasoning capabilities, how can it interpret the results? This wouldn’t make AGI; otherwise they would have already plugged it in.


If you want some more things that ChatGPT isn't good at, try to get it to form novel anagrams, palindromes, or other such wordplay. It's good at regurgitating textbook examples of those, but I found you can trip it up by asking it to do things like "Create a palindrome containing the word 'coffee'"


I spent a long time trying to get it to correctly return me the same program it had done before, but with all the identifiers reversed.

It swore up and down that it had done so correctly but every identifier was complete gibberish.


It certainly has logic. I had some fun with the "virtual machine" example someone else did, using the "nvidia-smi" command: if I told it the room was hot, the next run of the command showed a higher temperature on the GPU. That is the logical conclusion from a hotter room.


> It certainly has logic. I had some fun with the "virtual machine" example someone else did, using the "nvidia-smi" command: if I told it the room was hot, the next run of the command showed a higher temperature on the GPU. That is the logical conclusion from a hotter room.

Orrrr.... it's drawing references from other texts that were colocated with the word "hot"


That sounds to me like inference.


It's an inference based on how chatgpt works, which is a more reasonable inference than assuming chatgpt somehow has abstract logical reasoning capabilities.


It doesn't have any logic; it's just prediction based on statistics. There are so many examples already floating around showing that it has no logic, but I will give you a really simple one from my experiments:

I told it to:

> curl somedomain.ext

It replied with a curl error saying that the hostname doesn't exist.

Then I told it to:

> curl https://somedomain.ext

And it replied with some random http response showing that this hostname exists.

There is no logic here.


> And it replied with some random http response showing that this hostname exists.

And that's not logical? ChatGPT doesn't know what is there, so it answers logically based on what should happen there. Obviously having two different answers makes it less logical for sure, but I have seen many people make plenty of logic errors in real life too.

It's crazy to me that for an AGI to be one, it needs to be infallible in logic...


If it was an AGI it would have told you it doesn't have internet access.


You see logic, I see a bunch of Google results for people being told to run nvidia-smi when reporting high temperatures.


What about doing wget over https://chat.openai.com/chat ? I don't believe there were many Google results for that when it was trained, yet it was able to logically infer that it would be a chat assistant doing exactly what it was doing.


Experienced this myself trying to get it to write a poem with an unusual rhyme scheme. ChatGPT's response when I tried to explicitly spell it out (a few different ways):

Yes, you are correct. The statement I made is not accurate. The specified rhyme scheme was AABA BBAB AABA BBAB, which means that each stanza should have the rhyme scheme AABA BBAB, with the first and second lines rhyming with each other, and the third and fourth lines rhyming with each other.

That said, I wouldn't be surprised if the next version was able to grok this.


This is another perfect example of how the only thing GPT-3 has "grokked" is the construction of grammatically correct sentences based on probability.

But this paragraph it has produced contains a logical contradiction and demonstrates that it has no understanding of what it's writing about.

If anything, ChatGPT has demonstrated how little of the code we write requires logical thinking, and how much is just CRUD and API construction or arcane library use. A perfect use case for a machine that can construct correct grammar from millions of examples.


> absolutely no understanding of basic logic

> We are a very long way from AGI.

Let's not forget that computers are insanely good at bitwise computations. It's just a matter of time before someone adds a Coq/Lean style reinforcement to AI's learning capabilities.
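For a taste of what Coq/Lean-style checking adds, here is a minimal Lean 4 example (assuming current Lean 4 syntax; not something ChatGPT produces today): the ordering inference that tripped the model up upthread, but machine-checked rather than guessed.

  -- "A before C and C before B" really does imply "A before B",
  -- and the kernel verifies it; no statistical guesswork involved.
  example (a b c : Nat) (hac : a < c) (hcb : c < b) : a < b :=
    Nat.lt_trans hac hcb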


Computers are not that good with symbolic reasoning. Even state-of-the-art symbolic math software can easily be given problems it will have no idea what to do with.


> given problems it will have no idea what to do with.

Yeah, but those problems were mostly posed by math PhDs, no?

I don't think problems given by an average American would be that difficult for AGI to solve.


I don’t know, we are fickle, biased creatures with well-known failures of reasoning, but let’s not undersell our intelligence too much. Even small children can outsmart the best of our existing AIs.


This is a totally uninformed/naïve/layman’s take, but what if AGI is just a really good language model used in a clever way, such that it can perform an efficient search of its “thought” space, validating that its thoughts are correct along the way? Programming, logic, math, etc. are perhaps the easiest forms of “thoughts” for a computer to validate, but given enough quality data maybe it could be good at all kinds of other tasks as well.


People don't only search a thought space; we have internal reasoning that does symbolic manipulation and sequential steps to get to a conclusion. Like, you would have no problem following a simple rule given to you ad infinitum, while these models simply can't take all that into account.


Computers can reason and do symbolic manipulation too (e.g. https://en.wikipedia.org/wiki/Computer_algebra).
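For instance, a small SymPy snippet (SymPy being one off-the-shelf Python computer-algebra library) does exact symbolic manipulation today:

  from sympy import symbols, diff, integrate, sin

  x = symbols("x")
  expr = x * sin(x)

  # Exact symbolic differentiation and integration, not numeric approximation.
  print(diff(expr, x))        # x*cos(x) + sin(x)
  print(integrate(expr, x))   # -x*cos(x) + sin(x)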

There’s no reason to believe AGI will be composed of only language models. I suspect it will be a combination of multiple models and a lot of other software glued together in clever ways.


Yeah, I know, but we are not that much closer to it, contrary to the hype ChatGPT gets (which is well-deserved, but “we will lose our jobs” threads are not based in reality).



