
Wow, so just the weight sharing architecture does so much already? I am wondering if the same could be done with LSTMs on sequences or CNNs on voice...


I'm wondering the same thing.

Note also that this finding strongly suggests that neural net architecture actually is quite important, possibly even more important than having more data -- which contradicts the conventional wisdom!


There is some pretty strong evidence for this: all the toddlers in the world. You only need to show them something once and they'll immediately be able to recognize more examples of the same thing from different angles, and even when it is partially hidden. All they have to guide them is the structure of their brains, not the quantity of data they have been exposed to.


"All they have to guide them is the structure of their brains, not the quantity of data they have been exposed."

A typical toddler (say 12 months old) has spent 4000-5000 hours with open eyes. Even if you assume a low frame rate (10 fps), resolution (1080p), and a 1000:1 compression ratio, that's still 1 TB of training data.
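
Here's that back-of-envelope arithmetic as a quick Python sketch (the waking hours, frame rate, resolution, and compression ratio are all assumptions, as above):

    # Rough estimate of a toddler's visual "training data" volume.
    # Every number here is an assumption, not a measurement.
    hours_awake = 4500                 # ~12 months with open eyes
    fps = 10                           # deliberately low frame rate
    bytes_per_frame = 1920 * 1080 * 3  # raw 1080p RGB
    compression = 1000                 # generous 1000:1 ratio

    frames = hours_awake * 3600 * fps
    total_tb = frames * bytes_per_frame / compression / 1e12
    print(total_tb)  # ~1.0, i.e. roughly 1 TB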



Also, find me a 12-month-old that can recognize an object after seeing it once.


Seems like this view gets repeated every once in a while by someone who clearly hasn't been around any 0-2 year olds.


This is HN, what did you expect? Real people with a wife and kids?


Women are real people too.


Are you suggesting that women cannot have a wife and kids?


Try not to be an asshole. Thank you.


Your comment is hilariously wrong. Please don't make assumptions like this; you'll typically end up embarrassed.


Please show me a child that recognizes a new object after one look.


Shall I mail them to you?


Certainly not true... reading takes ages, for instance. Associating objects with words takes forever. Perhaps this is true in another sense, but not in the sense I described.


Reading is a lot more complex than object recognition.


It's not about the neural network architecture. CNNs are trained by presenting them with overlapping patches of the image. To speed things up and keep things organized, this is not done sequentially but in parallel, making multiple neurons share weights -- but this is just a trick.

So what makes this result possible is not the architecture of the NN in a CNN, but rather the architecture of the C. That allows us to get multiple samples from a single image. The rest is just that the actual content of the image is easier to learn than the noise.

The brain is almost nothing like a CNN.
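
To make the weight-sharing point concrete, here's a minimal sketch of a 2D convolution as one shared weight matrix slid over every patch of the image (plain Python/NumPy, single channel, stride 1, no padding; the function name is just for illustration):

    import numpy as np

    def conv2d(image, kernel):
        """Slide ONE shared weight matrix over every patch of the image.

        Every output pixel is produced by the same weights, so a single
        image yields as many (patch, weights) pairings as there are
        output positions -- the 'multiple samples from one image' trick.
        """
        kh, kw = kernel.shape
        oh = image.shape[0] - kh + 1
        ow = image.shape[1] - kw + 1
        out = np.empty((oh, ow))
        for i in range(oh):
            for j in range(ow):
                patch = image[i:i + kh, j:j + kw]
                out[i, j] = np.sum(patch * kernel)  # same kernel everywhere
        return out

    image = np.random.rand(8, 8)
    kernel = np.random.rand(3, 3)       # one set of shared weights
    print(conv2d(image, kernel).shape)  # (6, 6): 36 patches, 9 weights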


I think it's both the C and the NN. Don't forget each new C layer groups information from previous layers; using just a single C layer won't do you much good. It might not reflect the brain much, but it kind of resembles what retina/visual cortex neurons do: CNNs were actually inspired by visual field maps found in the visual cortex, and somebody had the idea that convolution is the most similar CV operation we have and put them together. To everyone's surprise it worked nicely.
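
On the "each new C layer groups information from previous layers" point: stacking layers widens how much of the input a single output neuron sees. A small sketch of the standard receptive-field recurrence (assuming square kernels and no dilation):

    def receptive_field(layers):
        """Effective receptive field after stacking conv layers.

        layers: list of (kernel_size, stride) pairs.
        Standard recurrence: each layer adds (k - 1) times the
        product of all preceding strides to the field size.
        """
        rf, jump = 1, 1
        for k, s in layers:
            rf += (k - 1) * jump
            jump *= s
        return rf

    # Three stacked 3x3 convs with stride 1: one output pixel
    # already aggregates a 7x7 patch of the input image.
    print(receptive_field([(3, 1)] * 3))  # 7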


The next layers use exactly the same trick as the first one. I don't quite buy that it resembles the visual cortex.


It's probably just a very rough "resemblance" :D It is said CNNs were "inspired" by visual field maps; I am 100% sure we know very little about how that part of the brain works, and maybe somebody just took a look at the main/thickest connections between neurons there and tried to assemble them in an NN to see if it helped.


I'm thinking exactly the same thing.


Don't Echo State Networks already do that?
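
For anyone unfamiliar: in an Echo State Network the input and recurrent weights are fixed at random initialization and only a linear readout is trained, which is why the question comes up. A toy sketch (plain NumPy; all sizes and scalings here are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_res = 1, 100

    # Random, FIXED input and recurrent (reservoir) weights -- never trained.
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1

    def run_reservoir(inputs):
        """Collect reservoir states for an input sequence (shape: T x n_in)."""
        x = np.zeros(n_res)
        states = []
        for u in inputs:
            x = np.tanh(W @ x + W_in @ u)
            states.append(x)
        return np.array(states)

    # Toy task: predict the next value of a sine wave.
    t = np.linspace(0, 20 * np.pi, 2000)
    u, y = np.sin(t)[:-1, None], np.sin(t)[1:]
    X = run_reservoir(u)

    # The ONLY trained part: a linear readout, fit by least squares.
    W_out, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.mean((X @ W_out - y) ** 2))  # small MSE on training data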



