Well, you're not wrong :)
Vercel is not the one to blame here; it's my skill issue. The entire thing was vibecoded by me, a product manager with no production dev experience. Not to promote vibecoding, but I couldn't have built it any other way.
You're right, the results and numbers are mainly for entertainment purposes. This sample size does allow analyzing the main reasoning failure modes and how often they occur.
I noticed the same thing and think you're absolutely right. I thought about adding their current hand / draw, but it was too close to the event to test it properly.
That’s true.
The original goal was to see which model performs statistically better than the others, but I quickly realized that would be neither practical nor particularly entertaining.
A proper benchmark would require things like:
- Tens of thousands of hands played
- Strict heads-up format (only two models compared at a time)
- Each hand played twice with positions swapped
The current setup is mainly useful for observing common reasoning failure modes and how often they occur.
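For what it's worth, here's a rough sketch of what I mean by the position-swapped heads-up setup. It's purely illustrative and not code from the project: `play_hand()`, the model names, and the chip accounting are placeholders, assuming the game engine can replay the same deck from a seed.

```python
import random

def play_hand(button_model: str, big_blind_model: str, deck_seed: int) -> int:
    """Play one hand with a fixed deck derived from deck_seed and return the
    button player's net chips. Stand-in for the real game engine."""
    rng = random.Random(f"{deck_seed}:{button_model}:{big_blind_model}")
    return rng.randint(-100, 100)  # placeholder result

def duplicate_match(model_a: str, model_b: str, n_hands: int = 10_000) -> float:
    """Play each deck twice with seats swapped, so both models see the same
    cards from both positions; return model_a's average net chips per hand."""
    total = 0
    for deck_seed in range(n_hands):
        total += play_hand(model_a, model_b, deck_seed)  # A on the button
        total -= play_hand(model_b, model_a, deck_seed)  # same deck, seats swapped
    return total / (2 * n_hands)

print(duplicate_match("model-a", "model-b", n_hands=1_000))
```

The seat swap is the point: card luck mostly cancels out across the pair of hands, which is why you'd still need tens of thousands of hands, but far fewer than without it, before any difference is statistically meaningful.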
Thank you! I’ll take a look at that. Honestly, building the game was part of the fun, so I didn’t look into open-source options.
The slides are in the repo and the recording will be published on the Python España YouTube channel in a couple of months (in Spanish):
https://www.youtube.com/@PythonES
Great job on this btw. I don’t mean to take away anything from your work. I’ve also toyed with AI H2H quite a bit for my personal needs. It’s actually a challenging task because you have to have a good understanding of the models you’re plugging in.