
Managed to get 1.8k tokens per second with a batch size of 60 when running vLLM with Mistral 7B on an A100 40GB in bfloat16. Pretty damn fast!

vllm==0.2.0 got released an hour or so ago, so it's pretty fresh. Let me know if you'd like anything else in there.
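A quick sanity check on the numbers above, assuming the 1.8k tokens/sec figure is aggregate throughput across the whole batch rather than per request:

```python
# Back-of-envelope check (assumption: 1.8k tok/s is the aggregate
# throughput across all 60 concurrent requests, not per stream).
aggregate_tps = 1800   # reported tokens per second
batch_size = 60        # concurrent requests

per_request_tps = aggregate_tps / batch_size
print(per_request_tps)  # 30.0 tokens/sec per concurrent request
```

So each individual request would still stream at roughly 30 tokens/sec, which is comfortably faster than reading speed; the win from batching is in aggregate serving capacity.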



I'd take GPT-4's slowness over any other model's speed, because with these models, quality is the most important thing.


i agree with your sentiment, but keep in mind that slowness could be a red herring. i find it plausible that while they degrade GPT-4's quality to (presumably) lower their costs while maintaining or raising the price, they might also add subtle delays to give the impression that the app is doing hard, high-quality work.

kind of like that infamous Android virus scanner app that just ran a timer driving the work-in-progress animation to give the impression of real work being done.




