I'm not sure what kind of argument that is. If something overfits, it will show less error; does that make it better? It may just mean it would generalize much worse when run on more data. Whether or not something is meaningful depends on what you take the meaning to be.
It doesn't matter that it's on the holdout; he's partitioning an already small dataset into 5 partitions and talking about the accuracy of using 80 points to predict 20 points. The usual argument is that, by the law of large numbers, you have enough points to establish a statistically significant difference in accuracy. When you're predicting 20 points each with 5 (potentially different) models, you likely don't have enough to talk about statistical significance.
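For a rough sense of the scale involved, here's a back-of-the-envelope sketch assuming a plain binomial model for accuracy; the 0.95 figure is just a placeholder, not anyone's actual result:

```python
import math

def accuracy_ci(p_hat, n, z=1.96):
    """Normal-approximation 95% confidence interval for an accuracy estimated on n points."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Hypothetical 0.95 accuracy measured on a 20-point fold: the interval is ~+/-10 points wide
# (and spills past 1.0, which just shows how crude the estimate is at this sample size)
print(accuracy_ci(0.95, 20))    # ~(0.85, 1.05)

# The same accuracy measured on 1000 points is pinned down much more tightly
print(accuracy_ci(0.95, 1000))  # ~(0.936, 0.964)
```

At 20 points per fold, differences of a few percentage points are well inside the noise.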
We tried to mirror the original analysis as closely as possible: we did 5-fold cross-validation but used the standard MNIST test set for evaluation (about 2,000 validation samples for 0s and 1s). We split the test set into two pieces. The first half was used to assess convergence of the training procedure, while the second half was used to measure out-of-sample predictive accuracy.
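A minimal sketch of that kind of split, assuming sklearn's fetch_openml and KFold, with LogisticRegression as a stand-in classifier (the actual model and training code from the analysis aren't shown here):

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Load MNIST and keep only the 0s and 1s (the first 60k rows are the training set,
# the last 10k the standard test set)
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
y = y.astype(int)
train_X, train_y = X[:60000][y[:60000] <= 1], y[:60000][y[:60000] <= 1]
test_X, test_y = X[60000:][y[60000:] <= 1], y[60000:][y[60000:] <= 1]

# Split the ~2,000-sample 0/1 test set in half: one half to monitor convergence,
# the other to report out-of-sample accuracy
mid = len(test_y) // 2
conv_X, conv_y = test_X[:mid], test_y[:mid]
eval_X, eval_y = test_X[mid:], test_y[mid:]

# 5-fold cross-validation over the training data; each fold's model is scored on the
# held-out evaluation half (convergence monitoring on conv_X/conv_y is omitted here)
accs = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(train_X):
    model = LogisticRegression(max_iter=200).fit(train_X[train_idx], train_y[train_idx])
    accs.append(model.score(eval_X, eval_y))
print(np.mean(accs))
```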
Predictive accuracy is measured on 1000 samples, not 20.
I hadn't imagined someone would argue that's not a meaningful difference.
Though the difference is statistically significant too.
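For what it's worth, at roughly 1,000 evaluation samples the significance is easy to check with a simple two-proportion test; the counts below are hypothetical placeholders, not the actual numbers from the analysis:

```python
import math
from scipy.stats import norm

def two_proportion_ztest(correct_a, correct_b, n):
    """Unpaired two-proportion z-test for a difference in accuracy on n test points each.
    (A paired test such as McNemar's would be more powerful if per-sample predictions are kept.)"""
    p_a, p_b = correct_a / n, correct_b / n
    p_pool = (correct_a + correct_b) / (2 * n)
    se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
    z = (p_a - p_b) / se
    return z, 2 * norm.sf(abs(z))

# Hypothetical example: 990/1000 vs 970/1000 correct on the held-out half
z, p = two_proportion_ztest(990, 970, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```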