The point is to see whether an LLM's wide general knowledge can also be an advantage in something like sensory-data + action learning. Current self-driving models don't have that.
I don't understand this logic; it's a stretch. Accuracy depends entirely on the type of problem and how well the model is trained on it, so there is no way you can extrapolate like this.
You can ask them to solve a math equation that takes multiple steps, and if they are trained on that kind of problem, they are accurate nearly 100 percent of the time.
Like ask gpt-4o to solve different variations of
"""What is the answer to 2x + 7 = 31?"""
If the numbers are of similar magnitude and simplicity, it will follow the same steps and be right 99%+ of the time. The only reason I'm not saying 100% is that I haven't tried it enough times, but I don't see it getting these wrong.
For example """What is the answer to 2x + 4 = -6?"""
Just run a test yourself with random integers between 0 and 20; it will definitely not be wrong 5%-10% of the time. It will be correct 99%+ of the time. A rough sketch of such a test is below.
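Here's a minimal sketch of what I mean, assuming the openai Python package (v1+) with OPENAI_API_KEY set in the environment; the exact prompt wording and the regex for pulling out the final answer are just my guesses at a reasonable setup, not anything special about the model:

```python
# Rough accuracy test: ask gpt-4o to solve random ax + b = c equations
# and check its final answer. Assumes `pip install openai` (v1+) and
# OPENAI_API_KEY in the environment.
import random
import re
from openai import OpenAI

client = OpenAI()

def run_trial():
    # Pick small integers so the answer x = (c - b) / a is a whole number.
    a = random.randint(1, 20)
    x = random.randint(0, 20)
    b = random.randint(0, 20)
    c = a * x + b
    prompt = f"What is the answer to {a}x + {b} = {c}? End your reply with 'x = <number>'."
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    text = resp.choices[0].message.content
    # Grab the last "x = <integer>" in the reply and compare to the true answer.
    matches = re.findall(r"x\s*=\s*(-?\d+)", text)
    return bool(matches) and int(matches[-1]) == x

trials = 50
correct = sum(run_trial() for _ in range(trials))
print(f"{correct}/{trials} correct ({100 * correct / trials:.0f}%)")
```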
Where is this 5%-10% figure even coming from? You could also keep asking it "What is the capital of France?" and it's going to be right 99%+ of the time.