i agree with your sentiment, but keep in mind that speed (slowness) could be a red herring. i find it plausible that while they degrade the quality of GPT4 to (presumably) lower their costs (while maintaining or increasing the price), they might also add slight delays to give the impression that the app is doing hard, quality work.
kind of like that infamous Android virus scanner app that just had a timer controlling the work-in-progress animation, to give the impression of quality work being done.
vllm==0.2.0 got released an hour or so ago, so it's pretty fresh. Let me know if you'd like anything else in there.