Show HN: Mini-vLLM in ~500 lines of Python

zahlman · 2025-12-29T11:58:29 1767009509

I'm not familiar with the thing you're recreating (I gather it's something to do with getting better responses out of LLMs by manipulating the context or something like that?) but I appreciate that you haven't, like so many others, dropped ten paragraphs of Markdown-formatted press release (without bothering to check whether the formatting even works here) on us echoing a bunch of marketing-speak in a README.

ubermenchh · 2025-12-29T18:10:07 1767031807

Haha, I just wanted my repo to be out here. If someone finds it interesting, they can always just check the repo. And you're close: it's about getting faster responses from the model by manipulating the request queues and memory.
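
For anyone curious what "manipulating the request queues and memory" can look like, here is a rough toy sketch. It is not the code from the repo: `Request`, `ToyScheduler`, and the block sizes are hypothetical names and numbers made up for illustration. The idea is that requests wait in a queue, get admitted whenever enough cache blocks are free, and release their blocks as soon as they finish, so new requests can join between decode steps instead of waiting for a whole batch to drain.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    prompt_len: int      # tokens in the prompt
    max_new_tokens: int  # tokens to generate before the request is done
    generated: int = 0


class ToyScheduler:
    """Hypothetical scheduler: admits requests while cache blocks are free."""

    def __init__(self, total_blocks, block_size=16):
        self.free_blocks = total_blocks
        self.block_size = block_size
        self.waiting = deque()  # FIFO queue of requests not yet admitted
        self.running = []       # requests currently being decoded
        self.held = {}          # rid -> number of blocks reserved

    def blocks_needed(self, req):
        # Simplification: reserve space for the whole sequence up front.
        total = req.prompt_len + req.max_new_tokens
        return -(-total // self.block_size)  # ceiling division

    def submit(self, req):
        self.waiting.append(req)

    def step(self):
        # Admit waiting requests in FIFO order while the head of the queue fits.
        while self.waiting and self.blocks_needed(self.waiting[0]) <= self.free_blocks:
            req = self.waiting.popleft()
            n = self.blocks_needed(req)
            self.free_blocks -= n
            self.held[req.rid] = n
            self.running.append(req)

        # One decode step per running request (stand-in for a model forward pass).
        finished = []
        for req in self.running:
            req.generated += 1
            if req.generated >= req.max_new_tokens:
                finished.append(req)

        # Retire finished requests and return their blocks to the pool.
        for req in finished:
            self.running.remove(req)
            self.free_blocks += self.held.pop(req.rid)
        return [r.rid for r in finished]
```

Reserving every block up front keeps the sketch short; a real engine would more likely hand out blocks on demand as sequences grow, and would run an actual model where the sketch just bumps a counter.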