I had to do my expenses this morning, and one task was to split the bill into Room, Taxes and Dining. I asked gpt-4o to handle it and its result was a few dollars short. I then asked it to check and be careful, and it produced an A/B pair of alternative responses, where A was actually correct: it had figured out that its calculation was short of the total provided in the bill.
It could not be trusted to get it right the first time, and hence I had to do the calcs myself as well as check its response. So rather than making me more productive, it made me less productive.
A rule-based system might have been better, or perhaps loading the data into a DB or data frame and getting the model to produce the SQL and run that instead.
Either way, this simple task could not be solved by the best LLM out there.
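For what it's worth, the "load it into a DB and run SQL" idea could look roughly like the sketch below. This is only an illustration: the line items, the category labels and the `split_bill` helper are all made up for the example (they are not the actual bill), and amounts are kept as integer cents so the comparison against the bill total is exact. The point is that the sum is done by the database and checked against the printed total, rather than trusted from a model's arithmetic.

```python
import sqlite3

# Hypothetical line items, NOT the real bill: (description, category, cents).
# Integer cents avoid floating-point rounding when comparing totals.
LINE_ITEMS = [
    ("Room night 1", "Room", 15000),
    ("Room night 2", "Room", 15000),
    ("Occupancy tax", "Taxes", 3600),
    ("Restaurant dinner", "Dining", 4250),
    ("Sales tax on dinner", "Taxes", 340),
]
BILL_TOTAL_CENTS = 38190  # total printed on the made-up bill


def split_bill(items, bill_total_cents):
    """Sum line items per category with SQL and verify against the bill total."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE bill (description TEXT, category TEXT, cents INTEGER)"
        )
        conn.executemany("INSERT INTO bill VALUES (?, ?, ?)", items)
        rows = conn.execute(
            "SELECT category, SUM(cents) FROM bill "
            "GROUP BY category ORDER BY category"
        ).fetchall()
    finally:
        conn.close()

    totals = dict(rows)
    computed = sum(totals.values())
    # The check the model skipped on its first attempt: do the parts add up?
    if computed != bill_total_cents:
        raise ValueError(f"split is off by {bill_total_cents - computed} cents")
    return totals


if __name__ == "__main__":
    for category, cents in split_bill(LINE_ITEMS, BILL_TOTAL_CENTS).items():
        print(f"{category}: {cents / 100:.2f}")
```

If the categories don't add up to the total on the bill, this raises an error instead of quietly handing back a split that is a few dollars short.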
I think most people are buying into the hype because search for information has just become baaaaad due to uuuhm ... incentives, and the current crop of LLMs provides a good view into all information available (also behind paywalls...) on the internet in any language.
That way it works great if you want to solve programming problem X for which there is a library, and it also works if you want to know which companies built bicycles in Poland pre-1990. It was also very confident in answering what happened to the factories. It's just that some of the factories didn't exist, and some of the new companies it mentioned came very close to what a Google search would return... (aka unrelated advertisements)