Hacker News | ddp26's comments

Author here. Great point, and I think this is due to what another commenter points out, that the questions are different.

The right test of this is to take the _same_ markets that run for 90+ days, and check accuracy 90 days out vs 30 days out. I've done this on other prediction market datasets, though not on Kalshi and Polymarket, and found that forecasts are in fact more accurate 30 days out.

I agree that if they weren't, that would be incredibly suspicious!
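The horizon comparison described above can be sketched in a few lines. This is a minimal illustration, not the author's actual analysis code: market records and field names (`p90`, `p30`, `outcome`) are made up, and Brier score is assumed as the accuracy metric.

```python
# Sketch: on the same set of resolved markets, compare the average Brier
# score of prices recorded 90 days before resolution vs 30 days before.
# All data below is illustrative, not from Kalshi/Polymarket.

def brier(prob: float, outcome: int) -> float:
    """Squared error of a probability forecast against a 0/1 outcome."""
    return (prob - outcome) ** 2

markets = [
    {"p90": 0.40, "p30": 0.70, "outcome": 1},
    {"p90": 0.55, "p30": 0.20, "outcome": 0},
    {"p90": 0.30, "p30": 0.10, "outcome": 0},
]

avg_90 = sum(brier(m["p90"], m["outcome"]) for m in markets) / len(markets)
avg_30 = sum(brier(m["p30"], m["outcome"]) for m in markets) / len(markets)

# Lower Brier score = more accurate; the claim is avg_30 < avg_90.
print(f"90 days out: {avg_90:.3f}, 30 days out: {avg_30:.3f}")
```

The key point is that both scores are computed on the identical set of markets, which removes question-difficulty as a confounder.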


Author here. Hal Varian pointed me to this 1992 paper, which I think is still considered the canonical empirical piece on what is actually going on in trading behavior that leads to accuracy (or not): https://www.jstor.org/stable/2117471

Yeah. People have put together a Prediction Market Database [1] (in a Google sheet), I think it's pretty well sourced and shows a good number of both real money and play money prediction markets from before 2002.

DARPA did have a big role though, too.

[1] https://docs.google.com/spreadsheets/d/1vGjnJPxdnBKwag3Q9Uy_...


It's true they are "just" summarizing current knowledge. But there are better and worse summaries of current knowledge!

Some summaries, like on some prediction markets, have objective accuracy that is much better than chance.


Author here. Agree, and I wrote in that section "Absolute accuracy is hard to compare across markets on one platform, and across platforms, because different forecasting questions have different difficulties. I addressed this by tracking similar markets on a single platform over time."

Even doing this, it's not fully apples-to-apples. For one thing, in this article I filter to only "interesting" markets, which controls for the share that are "easy" in the sense you describe.


Thanks for the reply. Yeah, I think all of your filtering and categorizing makes these analyses really nice.

Yeah, the question in the title can be answered: "by using gpt-4o, a model 2 years behind the frontier, to serve audio responses"

The training cutoff is Jan 2026, while Opus 4.6's was Aug 2025. That's quite a lot of new world knowledge.


The free open source model does have its competitive advantages!


The second paragraph starts "Muse Spark is the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts. To support further scaling, we are making strategic investments..."

This article is about Meta, not about the user. Who signs off on these? Is the intended audience other people at Meta, not the user?


The article is published primarily to signal to the market that Meta is serious in its efforts to compete in building frontier AI models.

They want to 1) attract talent, 2) tell Wall Street they can play in this space as well, and 3) help employees feel the company is moving in the right direction.

A frontier LLM doesn't apply to their core consumer products.


The blog post is the product: an investor deck posted as a tech launch.


Stock up 9% today, very pleasant for Zuck if you do the math on his net worth :)
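Doing that math, under purely hypothetical figures (the stake value below is an assumption for illustration, not Zuck's actual holdings):

```python
# Back-of-the-envelope: the paper gain from a one-day 9% move.
# stake_value is a hypothetical placeholder, not a real figure.
stake_value = 200e9          # assume a ~$200B Meta stake before the move
move = 0.09                  # the 9% one-day gain mentioned above
paper_gain = stake_value * move

print(f"~${paper_gain / 1e9:.0f}B on paper")  # ~$18B under this assumption
```

On that assumption, a single 9% day is on the order of tens of billions of dollars, unrealized.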


I mean, kinda? It's not like Zuck is selling his stock tomorrow, so daily fluctuations in stock price don't really affect him.


He can borrow against that, so it actually does matter.


Got a source on this? In my forecast I didn't account for public markets being inefficient in this way.


oh baby, that's just the 'new' way they screw ya

