Author here. Great point, and I think this is due to what another commenter points out: the questions are different.
The right test of this is to take the _same_ markets that run for 90+ days, and check accuracy 90 days out vs 30 days out. I've done this on other prediction market datasets, though not on Kalshi and Polymarket, and found that forecasts are in fact more accurate 30 days out.
I agree that if they weren't, that would be incredibly suspicious!
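For anyone who wants to try this themselves, here's a minimal sketch of the same-market comparison in Python/pandas. It assumes a hypothetical snapshot table with columns market_id, days_to_resolution, price (the implied probability), and outcome (0 or 1) — those names are just for illustration, not from any actual dataset schema:

    import pandas as pd

    def brier_at_horizon(df: pd.DataFrame, horizon_days: int,
                         min_lifetime_days: int = 90) -> float:
        # Restrict to markets that ran at least `min_lifetime_days`, so the
        # 90-day and 30-day scores cover the exact same set of questions.
        lifetime = df.groupby("market_id")["days_to_resolution"].transform("max")
        eligible = df[lifetime >= min_lifetime_days]
        # For each market, take the snapshot closest to the target horizon.
        dist = (eligible["days_to_resolution"] - horizon_days).abs()
        idx = dist.groupby(eligible["market_id"]).idxmin()
        snaps = eligible.loc[idx]
        # Brier score: mean squared error of implied probability vs. outcome.
        return ((snaps["price"] - snaps["outcome"]) ** 2).mean()

Lower is better, so the claim above predicts brier_at_horizon(df, 30) < brier_at_horizon(df, 90) on the same set of long-running markets.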
Author here. Hal Varian pointed me to this 1992 paper, which I think is still considered the canonical empirical piece on what is actually going on in trading behavior that leads to accuracy (or not): https://www.jstor.org/stable/2117471
Yeah. People have put together a Prediction Market Database [1] (in a Google sheet); I think it's pretty well sourced and shows a good number of both real-money and play-money prediction markets from before 2002.
Author here. Agree, and I wrote in that section "Absolute accuracy is hard to compare across markets on one platform, and across platforms, because different forecasting questions have different difficulties. I addressed this by tracking similar markets on a single platform over time."
Even doing this, it's not apples-to-apples. For one thing, in this article I filter only to "interesting" markets, which controls for the share that are "easy" in the sense you describe.
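The exact filter isn't spelled out in this thread, so treat this as a hypothetical proxy for "interesting": drop snapshots that are already priced near 0 or 1, since those are the easy calls.

    def interesting_only(snaps, lo=0.10, hi=0.90):
        # Hypothetical proxy for "interesting": exclude near-certain markets,
        # i.e. keep only implied probabilities in [lo, hi].
        return snaps[snaps["price"].between(lo, hi)]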
The second paragraph starts "Muse Spark is the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts. To support further scaling, we are making strategic investments..."
This article is about Meta, not about the user. Who signs off on these? Is the intended audience other people at Meta, not the user?
The article is published primarily to signal to the market that Meta is serious about competing to build frontier AI models.
They want to 1) attract talent, 2) tell Wall Street they can play in this space as well, and 3) help employees feel the company is moving in the right direction.
A frontier LLM isn't directly relevant to their core consumer products.