
> if you had a volcano that erupts roughly every 100 years, base rate reasoning using the past 99 years of data would suggest that the probability is 0

Sparsity is a problem whenever you use data to predict or model something. Your example here is essentially subsampling only zeros from a sparse time series. The existence of sparsity isn't a new insight that invalidates everything. It's a challenge to be overcome by careful practice.
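
To make the subsampling point concrete, here's a minimal Python sketch (the 99-year record is hypothetical): a raw frequency over 99 eruption-free years is exactly 0, while even simple Laplace (add-one) smoothing keeps the estimate off the floor.

    # Hypothetical 99-year record with no eruptions.
    observations = [0] * 99

    naive_rate   = sum(observations) / len(observations)              # 0.0 -- the problem described above
    laplace_rate = (sum(observations) + 1) / (len(observations) + 2)  # rule of succession: ~0.0099

    print(f"naive base rate:  {naive_rate:.4f}")
    print(f"Laplace-smoothed: {laplace_rate:.4f}")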

> when in reality your base rate in the year following an eruption is 0; with every passing year your probability of an eruption would increase, and it would increase past 10% for every year past 100 that goes without an eruption.

Sure, with a lot of prior knowledge you can do better than a naive base rate as a starting point; practitioners are well aware of this. Even in this contrived example, the base rate would have been a good first guess.
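
As a rough sketch of how that prior knowledge gets used, assume (purely for illustration) that inter-eruption times follow a Normal(100, 15) distribution with CDF F. The conditional probability of an eruption in the next year, given t quiet years, is (F(t+1) - F(t)) / (1 - F(t)), which climbs past 10% once the gap stretches well beyond 100 years:

    import math

    # Hypothetical model (illustration only): inter-eruption times ~ Normal(mu=100, sigma=15).
    def eruption_cdf(x, mu=100.0, sigma=15.0):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    # P(eruption in the next year | t quiet years) = (F(t+1) - F(t)) / (1 - F(t))
    def hazard_next_year(t):
        return (eruption_cdf(t + 1) - eruption_cdf(t)) / (1.0 - eruption_cdf(t))

    for quiet_years in (50, 90, 100, 110, 120):
        print(f"{quiet_years:>3} quiet years -> P(eruption next year) ~ {hazard_next_year(quiet_years):.3f}")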

> predicting that a rare event won't happen isn't that difficult

Doing it accurately (in the sense of having a low Brier score) is apparently difficult.
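
For a sense of why: with one eruption in 100 hypothetical years, always forecasting 0 already gets a Brier score of 0.01, and a flat 1% base rate barely beats it; the gains come from raising the probability in the right year, which is the hard part. (The forecast numbers below are made up for illustration.)

    # Brier score: mean squared error between forecast probabilities and 0/1 outcomes.
    def brier_score(forecasts, outcomes):
        return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

    outcomes  = [0] * 99 + [1]          # hypothetical record: one eruption in 100 years
    always_no = [0.0] * 100             # "rare events don't happen"
    flat_1pct = [0.01] * 100            # naive base rate every year
    skilled   = [0.005] * 99 + [0.6]    # made-up forecaster who raises p before the eruption

    print(brier_score(always_no, outcomes))  # 0.0100
    print(brier_score(flat_1pct, outcomes))  # ~0.0099
    print(brier_score(skilled,   outcomes))  # ~0.0016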


