
Why quantize something that is already very small (270 MB)?


Just speculating here, but smaller models are a great fit for serverless compute like cloud functions, which also benefits from lighter computation. Don't forget that some people are dealing with hundreds of millions of documents; accelerating that by 4x may be worth a small hit to accuracy.
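To make the 4x concrete: going from float32 to int8 cuts storage (and memory bandwidth) by exactly 4x. A minimal sketch of symmetric int8 quantization in NumPy (the weight matrix here is a made-up stand-in for one layer of a model):

```python
import numpy as np

# Hypothetical weight matrix standing in for one layer of a small model.
weights = np.random.randn(1024, 1024).astype(np.float32)

# Symmetric int8 quantization: scale by the max absolute value so the
# largest weight maps to 127.
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)

# Dequantize for use at inference time.
deq = q.astype(np.float32) * scale

print(weights.nbytes / q.nbytes)            # 4.0: int8 takes 4x less space than float32
print(np.abs(weights - deq).max() <= scale)  # True: rounding error stays within one step
```

The size saving is guaranteed; whether the rounding error matters is the "small performance hit" being weighed in the comment above.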



