
Seems short-sighted to me. LLMs can have any kind of data in their training set encoded as tokens: either new specialized tokens are explicitly added (e.g., vision models), or the data shows up in the language-encoded form it usually exists in (e.g., the research paper plus the CSV holding its data).
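As a rough illustration (a minimal sketch, assuming the tiktoken library and its cl100k_base encoding; the CSV fragment is made up), a table in the training data just becomes ordinary text tokens, not some special structured representation:

    import tiktoken

    # Hypothetical CSV fragment, as it might appear alongside a scraped paper.
    csv_text = "sample_id,temp_K,band_gap_eV\nA1,300,1.12\nA2,350,1.08"

    # The same text tokenizer used for prose is applied to the table verbatim.
    enc = tiktoken.get_encoding("cl100k_base")
    token_ids = enc.encode(csv_text)

    print(token_ids[:10])           # just a flat list of integer token ids
    print(enc.decode(token_ids))    # round-trips back to the raw CSV text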

Improving next-token prediction on these datasets, and generalizing from them, requires a much richer latent space. I think it could theoretically lead to better results through cross-domain connections (e.g., being fluent in a specific area of advanced mathematics, quantum mechanics, and materials engineering is key to a particular breakthrough).



Then you are implementing a parser.
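For what it's worth, a minimal sketch of what "implementing a parser" can mean here (the CSV fragment is made up; Python's standard csv module stands in for whatever format actually needs handling), i.e., explicitly turning the raw text into structured rows instead of letting the model ingest it as plain tokens:

    import csv, io

    csv_text = "sample_id,temp_K,band_gap_eV\nA1,300,1.12\nA2,350,1.08"

    # Parse the table into typed rows rather than leaving it as raw text.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        print(row["sample_id"], float(row["band_gap_eV"]))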



