Hacker News

For a dryer, more formal and succinct approach, see "The Transformer Model in Equations" [0], by John Thickstun. The whole thing fits in a single page, using standard mathematical notation.

[0] https://johnthickstun.com/docs/transformers.pdf



Finally, thank you so much! Was that so difficult? Aren't 7 lines of mathematical notation way better than pages of qualitative pub talk? I don't really understand these ML researchers; it always looks like they have never studied mathematics at all.


Thank god. I've had to cobble something like this together for my own notes a couple of times while trying to parse papers, and I was never quite sure whether I was missing something.



