
The number reuse is just the author being a bit lazy. You can estimate how similar these vectors are by checking whether they point in similar directions, i.e. by calculating the angle between them. Here they are about 60° apart, so somewhat in the same direction, but a lot of that is because the author didn’t want to put any negative numbers in the example, so the vectors end up looking a bit more similar than they really would be.

That the numbers are reused isn’t meaningful here: a 1 in the first position is quite unrelated to a 1 in the second (as no convolutions are done over this vector).
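
To make the angle idea concrete, here is a minimal sketch. The vectors `u`, `v` and `w` are made up for illustration (they are not the ones from the article), chosen so that the first pair comes out about 60° apart. It also shows why avoiding negative numbers matters: two vectors with only non-negative components have a non-negative dot product, so they can never be more than 90° apart.

```python
import math

def angle_degrees(u, v):
    """Angle between two vectors in degrees, computed from cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, cos))
    return math.degrees(math.acos(cos))

# Hypothetical example vectors, not taken from the article.
u = [1, 1, 0]
v = [0, 1, 1]   # shares one non-zero position with u
w = [-1, 1, 0]  # a negative component lets the angle open up

print(angle_degrees(u, v))  # about 60 degrees
print(angle_degrees(u, w))  # 90 degrees (dot product is zero)
```

Note that swapping the order of the components in both vectors the same way leaves the angle unchanged, which is another way of seeing that the positions carry no meaning relative to each other.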



Thank you. I guess I need to back up. This is a vector, not just an identifier, and direction and angle seem important. I need to look up how the encoding is normally done, since this isn't obvious if you haven't worked in this domain before.


The encoding is typically learned, and where possible it is part of the neural network (ANN) itself, so that it can be adjusted along with the other parameters during training.

A good place to start on that topic is the word2vec paper.
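
As a rough illustration of "learned along with the other parameters", here is a toy sketch. It is not word2vec, and everything in it (the sizes, the single linear output unit, the training targets) is a made-up assumption. The point is only that the embedding table is a set of ordinary trainable numbers, and the same gradient step that updates the model's weights also updates the looked-up embedding row.

```python
import random

random.seed(0)

VOCAB_SIZE, DIM = 4, 3  # hypothetical sizes

# Embedding table: one trainable vector per token id.
embedding = [[random.uniform(-0.1, 0.1) for _ in range(DIM)]
             for _ in range(VOCAB_SIZE)]
# Weights of a single linear output unit (the "rest of the network").
weights = [random.uniform(-0.1, 0.1) for _ in range(DIM)]

def predict(token_id):
    """Look up the token's vector and feed it through the linear unit."""
    vec = embedding[token_id]
    return sum(w * x for w, x in zip(weights, vec))

def train_step(token_id, target, lr=0.05):
    """One SGD step on squared error; updates weights AND the embedding row."""
    vec = embedding[token_id]
    error = predict(token_id) - target
    for i in range(DIM):
        grad_w = 2 * error * vec[i]      # gradient w.r.t. the output weight
        grad_x = 2 * error * weights[i]  # gradient w.r.t. the embedding entry
        weights[i] -= lr * grad_w
        vec[i] -= lr * grad_x
    return error * error

# Made-up training data: (token id, target value).
data = [(0, 1.0), (1, 1.0), (2, -1.0), (3, -1.0)]

losses = []
for epoch in range(200):
    losses.append(sum(train_step(t, y) for t, y in data))

print(losses[0], losses[-1])
```

After training, tokens that share a target should end up with vectors pointing in similar directions, which is the sense in which direction and angle carry the meaning.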



