This is my area of expertise. I love the experiment.
In general games of imperfect information such as Poker, Diplomacy, etc are much much harder than perfect information games such as Chess.
Multiplayer (3+) poker in particular is interesting because you cannot achieve a nash equilibrium (e.g. it is not zero sum).
That is part of the reason they are a fantastic venue for exploration of the capabilities of LLMs. They also mirror the decision making process of real life. Bezos framed it as "making decisions with about 70% of the information you wish you had."
As it currently stands having built many poker AIs, including what I believe to be the current best in the world, I don't think LLMs are remotely close to being able to do what specialized algorithms can do in this domain.
All of the best poker AI's right now are fundamentally based on counter factual regret minimization. Typically with a layer of real time search on top.
Noam Brown (currently director of research at OpenAI) took the existing CFR strategies which were fundamentally just trying to scale at train time and added on a version of search, allowing it to compute better policies at TEST TIME (e.g. when making decisions). This ultimately beat the pros (Pluribus beat the pros at 6 max in 2018 I believe). It stands as the state of the art, although I believe that some of the deep approaches may eventually topple it.
Not long after Noam joined OpenAI they released the o1-preview "thinking" models, and I can't help but think that he took some of his ideas for test time compute and applied them on top of the base LLM.
It's amazing how much poker AI research is actually influencing the SOTA AI we see today.
I would be surprised if any general purpose model can achieve true human level or super human level results, as the purpose built SOTA poker algorithms at this point play substantially perfect poker.
Background:
- I built my first poker AI when I was in college, made half a million bucks on party poker. It was a pseudo expert system.
- Created PokerTableRatings.com and caught cheaters at scale using machine learning on a database of all poker hands in real time
- Sold my poker AI company to Zynga in 2011 and was Zynga Poker CTO for 2 years pre/post IPO
- Most recently built a tournament version of Pluribus (https://www.science.org/doi/10.1126/science.aay2400). Launching as duolingo for poker at pokerskill.com
In general games of imperfect information such as Poker, Diplomacy, etc are much much harder than perfect information games such as Chess.
Multiplayer (3+) poker in particular is interesting because you cannot achieve a nash equilibrium (e.g. it is not zero sum).
That is part of the reason they are a fantastic venue for exploration of the capabilities of LLMs. They also mirror the decision making process of real life. Bezos framed it as "making decisions with about 70% of the information you wish you had."
As it currently stands having built many poker AIs, including what I believe to be the current best in the world, I don't think LLMs are remotely close to being able to do what specialized algorithms can do in this domain.
All of the best poker AI's right now are fundamentally based on counter factual regret minimization. Typically with a layer of real time search on top.
Noam Brown (currently director of research at OpenAI) took the existing CFR strategies which were fundamentally just trying to scale at train time and added on a version of search, allowing it to compute better policies at TEST TIME (e.g. when making decisions). This ultimately beat the pros (Pluribus beat the pros at 6 max in 2018 I believe). It stands as the state of the art, although I believe that some of the deep approaches may eventually topple it.
Not long after Noam joined OpenAI they released the o1-preview "thinking" models, and I can't help but think that he took some of his ideas for test time compute and applied them on top of the base LLM.
It's amazing how much poker AI research is actually influencing the SOTA AI we see today.
I would be surprised if any general purpose model can achieve true human level or super human level results, as the purpose built SOTA poker algorithms at this point play substantially perfect poker.
Background:
- I built my first poker AI when I was in college, made half a million bucks on party poker. It was a pseudo expert system. - Created PokerTableRatings.com and caught cheaters at scale using machine learning on a database of all poker hands in real time - Sold my poker AI company to Zynga in 2011 and was Zynga Poker CTO for 2 years pre/post IPO - Most recently built a tournament version of Pluribus (https://www.science.org/doi/10.1126/science.aay2400). Launching as duolingo for poker at pokerskill.com