Hacker News
Show HN: Mini-vLLM in ~500 lines of Python (github.com/ubermenchh)
4 points by ubermenchh 1 day ago | 2 comments
I built this to understand how vLLM works internally.

I'm not familiar with the thing you're recreating (I gather it's something to do with getting better responses out of LLMs by manipulating the context or something like that?) but I appreciate that you haven't, like so many others, dropped ten paragraphs of Markdown-formatted press release (without bothering to check whether the formatting even works here) on us echoing a bunch of marketing-speak in a README.

Haha, I just wanted my repo to be out here. If someone finds it interesting, they can always just check the repo. And you're close: it's about getting faster responses from the model by manipulating the request queues and memory.
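The reply's one-line summary ("request queues and memory") lines up with vLLM's two headline ideas: a scheduler that keeps refilling the running batch from a waiting queue (continuous batching), and a KV cache carved into fixed-size blocks that are handed out on demand (PagedAttention). Below is a minimal toy sketch of that interplay, assuming each scheduler step produces exactly one token per running request and with the model call stubbed out. All names (`Request`, `BlockAllocator`, `Scheduler`, `BLOCK_SIZE`) are hypothetical and not taken from vLLM or from the linked repo; the preempt-the-newest-request policy is one simple choice, not necessarily what either project does.

```python
from collections import deque
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative value, an assumption)


@dataclass(eq=False)  # compare requests by identity, not by field values
class Request:
    rid: str
    prompt_len: int
    max_new_tokens: int
    generated: int = 0
    blocks: list = field(default_factory=list)  # physical block ids ("block table")

    @property
    def num_tokens(self) -> int:
        return self.prompt_len + self.generated


def blocks_needed(num_tokens: int) -> int:
    return -(-num_tokens // BLOCK_SIZE)  # ceiling division


class BlockAllocator:
    """Hands out fixed-size KV-cache blocks from a free list."""

    def __init__(self, num_blocks: int):
        self.free = deque(range(num_blocks))

    def can_allocate(self, n: int) -> bool:
        return len(self.free) >= n

    def allocate(self) -> int:
        return self.free.popleft()

    def release(self, blocks: list) -> None:
        self.free.extend(blocks)
        blocks.clear()


class Scheduler:
    def __init__(self, num_blocks: int, max_batch: int):
        self.alloc = BlockAllocator(num_blocks)
        self.max_batch = max_batch
        self.waiting = deque()  # FCFS request queue
        self.running = []       # requests in the current batch
        self.finished = []
        self.num_preemptions = 0

    def add(self, req: Request) -> None:
        self.waiting.append(req)

    def _admit(self) -> None:
        # Admit waiting requests while a batch slot is free and there are
        # enough blocks for the prompt plus the next token.
        while self.waiting and len(self.running) < self.max_batch:
            need = blocks_needed(self.waiting[0].num_tokens + 1)
            if not self.alloc.can_allocate(need):
                break
            req = self.waiting.popleft()
            req.blocks = [self.alloc.allocate() for _ in range(need)]
            self.running.append(req)

    def _ensure_slot(self, req: Request) -> bool:
        # Grow req's block table if its next token needs a new block,
        # preempting the newest running request when memory runs out.
        while blocks_needed(req.num_tokens + 1) > len(req.blocks):
            if self.alloc.can_allocate(1):
                req.blocks.append(self.alloc.allocate())
                continue
            victim = self.running.pop()
            self.alloc.release(victim.blocks)
            self.waiting.appendleft(victim)  # its KV cache is rebuilt on readmission
            self.num_preemptions += 1
            if victim is req:
                return False
        return True

    def step(self) -> None:
        self._admit()
        for req in list(self.running):
            if req not in self.running:
                continue  # preempted earlier in this step
            if not self._ensure_slot(req):
                continue
            req.generated += 1  # stand-in for one model forward pass
            if req.generated >= req.max_new_tokens:
                self.running.remove(req)
                self.alloc.release(req.blocks)  # blocks go straight back to the pool
                self.finished.append(req.rid)


sched = Scheduler(num_blocks=6, max_batch=2)
for rid, prompt_len, new_tokens in [("a", 3, 2), ("b", 5, 3), ("c", 2, 1)]:
    sched.add(Request(rid, prompt_len, new_tokens))
steps = 0
while sched.waiting or sched.running:
    sched.step()
    steps += 1
print(sched.finished, steps)  # ['a', 'b', 'c'] 3
```

The point of the toy is that request `c` joins the batch the moment `a` finishes and frees its blocks, instead of waiting for the whole batch to drain; shrinking `num_blocks` makes the preemption path kick in. Note that a single request needing more blocks than the pool holds would never be admitted, which a real scheduler has to reject up front.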
