Hacker News | ddp26's comments

Author here. Great point, and I think this is due to what another commenter points out, that the questions are different.

The right test of this is to take the _same_ markets that run for 90+ days, and check accuracy 90 days out vs 30 days out. I've done this on other prediction market datasets, though not on Kalshi and Polymarket, and found that forecasts are in fact more accurate 30 days out.

I agree that if they weren't, that would be incredibly suspicious!
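The horizon comparison described above can be sketched in a few lines. This is a minimal illustration, not the author's actual analysis code: market records and field names (`p90`, `p30`, `outcome`) are made up, and Brier score is assumed as the accuracy metric.

```python
# Sketch: on the same set of resolved markets, compare the average Brier
# score of prices recorded 90 days before resolution vs 30 days before.
# All data below is illustrative, not from Kalshi/Polymarket.

def brier(prob: float, outcome: int) -> float:
    """Squared error of a probability forecast against a 0/1 outcome."""
    return (prob - outcome) ** 2

markets = [
    {"p90": 0.40, "p30": 0.70, "outcome": 1},
    {"p90": 0.55, "p30": 0.20, "outcome": 0},
    {"p90": 0.30, "p30": 0.10, "outcome": 0},
]

avg_90 = sum(brier(m["p90"], m["outcome"]) for m in markets) / len(markets)
avg_30 = sum(brier(m["p30"], m["outcome"]) for m in markets) / len(markets)

# Lower Brier score = more accurate; the claim is avg_30 < avg_90.
print(f"90 days out: {avg_90:.3f}, 30 days out: {avg_30:.3f}")
```

The key point is that both scores are computed on the identical set of markets, which removes question-difficulty as a confounder.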


Author here. Hal Varian pointed me to this 1992 paper, which I think is still considered the canonical empirical piece on what is actually going on in trading behavior that leads to accuracy (or not): https://www.jstor.org/stable/2117471

Yeah. People have put together a Prediction Market Database [1] (in a Google sheet), I think it's pretty well sourced and shows a good number of both real money and play money prediction markets from before 2002.

DARPA did have a big role though, too.

[1] https://docs.google.com/spreadsheets/d/1vGjnJPxdnBKwag3Q9Uy_...


It's true they are "just" summarizing current knowledge. But there are better and worse summaries of current knowledge!

Some summaries, like on some prediction markets, have objective accuracy that is much better than chance.


Author here. Agree, and I wrote in that section "Absolute accuracy is hard to compare across markets on one platform, and across platforms, because different forecasting questions have different difficulties. I addressed this by tracking similar markets on a single platform over time."

Even doing this, it's not fully apples-to-apples. For one thing, in this article I filter to only "interesting" markets, which controls for the share that are "easy" in the sense you describe.


Thanks for the reply. Yeah, I think all of your filtering and categorizing makes these analyses really nice.

Yeah, the question in the title can be answered: "by using gpt-4o, a model 2 years behind the frontier, to serve audio responses"

The training cutoff is Jan 2026, while Opus 4.6's was Aug 2025. That's quite a lot of new world knowledge.


The free open source model does have its competitive advantages!


The second paragraph starts "Muse Spark is the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts. To support further scaling, we are making strategic investments..."

This article is about Meta, not about the user. Who signs off on these? Is the intended audience other people at Meta, not the user?


The article is published primarily to signal to the market that Meta is serious in its efforts to compete in building frontier AI models.

They want to 1) attract talent, 2) tell Wall Street they can play in this space as well, and 3) help employees feel the company is moving in the right direction.

A frontier LLM doesn't apply to their core consumer products.


The blog post is the product: an investor deck posted as a tech launch.


Stock up 9% today, very pleasant for Zuck if you do the math on his net worth :)
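Doing that math, under purely hypothetical figures (the stake value below is an assumption for illustration, not Zuck's actual holdings):

```python
# Back-of-the-envelope: the paper gain from a one-day 9% move.
# stake_value is a hypothetical placeholder, not a real figure.
stake_value = 200e9          # assume a ~$200B Meta stake before the move
move = 0.09                  # the 9% one-day gain mentioned above
paper_gain = stake_value * move

print(f"~${paper_gain / 1e9:.0f}B on paper")  # ~$18B under this assumption
```

On that assumption, a single 9% day is on the order of tens of billions of dollars, unrealized.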


I mean, kinda? It's not like Zuck is selling his stock tomorrow, so daily fluctuations in stock price don't really affect him.


He can borrow against that, so it actually does matter.


Got a source on this? In my forecast I didn't account for public markets being inefficient in this way.


oh baby, that's just the 'new' way they screw ya

