Hacker News

The model we are comparing against makes 10X as many errors.

I hadn't imagined someone would argue that's not a meaningful difference.

Though the difference is statistically significant too.



Not sure what kind of argument that is. If a model overfits it will have lower error on the data at hand; does that make it better? It may generalize far worse when run on more data. Whether or not something is meaningful depends on what you take the meaning to be.


Not the OP, but I wanted to point out that it has 10X lower error on the holdout sample, so it is not simply overfitting.


It doesn't matter that it's on the holdout: they're partitioning an already small dataset into 5 folds and talking about the accuracy of using 80 points to predict 20. The usual argument is that with enough samples the law of large numbers lets you claim a statistically significant difference in accuracy. When you're predicting 20 points each with 5 (potentially different) models, you likely don't have enough data to talk about statistical significance.
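To make the small-fold concern concrete, here's a minimal sketch (the 95% accuracy figure is a hypothetical, not a number from the thread) of how wide a normal-approximation 95% confidence interval is for an accuracy estimated from 20 points versus 1,000:

```python
import math

def accuracy_ci_halfwidth(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% CI for an accuracy
    estimate p measured on n held-out points."""
    return z * math.sqrt(p * (1 - p) / n)

# Same observed accuracy, very different uncertainty:
print(f"n=20:   +/-{accuracy_ci_halfwidth(0.95, 20):.3f}")    # roughly +/-0.10
print(f"n=1000: +/-{accuracy_ci_halfwidth(0.95, 1000):.3f}")  # roughly +/-0.014
```

With 20-point folds the interval spans about ten percentage points, so most plausible accuracy differences between two models drown in the noise; at 1,000 points it narrows to about a point and a half.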


We tried to mirror the original analysis as closely as possible: we did 5-fold cross-validation but used the standard MNIST test set for evaluation (about 2,000 test samples for 0s and 1s). We split the test set into two pieces; the first half was used to assess convergence of the training procedure, while the second half was used to measure out-of-sample predictive accuracy.

Predictive accuracy is measured on 1000 samples, not 20.
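At that sample size a 10X error gap is easy to separate from noise. A quick sketch with a two-proportion z-test (the counts 10 vs. 100 errors are hypothetical, chosen only to illustrate a 10X gap on 1,000 predictions):

```python
import math

def two_proportion_z(err_a: int, err_b: int, n: int) -> float:
    """Two-proportion z-statistic comparing error rates err_a/n and
    err_b/n measured on the same number of predictions n each."""
    p_a, p_b = err_a / n, err_b / n
    p_pool = (err_a + err_b) / (2 * n)            # pooled error rate
    se = math.sqrt(2 * p_pool * (1 - p_pool) / n)  # SE of the difference
    return (p_b - p_a) / se

# Hypothetical: 10 vs. 100 errors out of 1,000 predictions each
print(f"z = {two_proportion_z(10, 100, 1000):.2f}")  # far above 1.96
```

The resulting z-statistic is well past the 1.96 threshold for significance at the 5% level, which is the point about the difference being statistically significant too.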



