koljab | 8 months ago | on: Show HN: Real-time AI Voice Chat at ~500ms Latency
With the current 24B LLM it's 24 GB of VRAM. I have no idea how far down you can go in GPU memory using smaller models; you can set the model in server.py. I'm quite sure 16 GB will work, but at some point it will probably fail.
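A minimal sketch of the kind of change described, swapping the default model for a smaller one in server.py. The variable name and model identifiers below are assumptions for illustration, not the project's actual code:

```python
# Hypothetical server.py model configuration (names are assumptions).
# Default: a ~24B-parameter model, needing roughly 24 GB of VRAM.
# LLM_MODEL = "mistral-small:24b"

# A smaller ~8B model should fit comfortably in ~16 GB of VRAM,
# at the cost of some response quality.
LLM_MODEL = "llama3.1:8b"

print(f"Using LLM model: {LLM_MODEL}")
```

As a rough rule of thumb, a model's VRAM footprint scales with its parameter count at a given quantization, so dropping from 24B to 8B parameters cuts memory use roughly threefold, leaving headroom for the speech-to-text and text-to-speech components sharing the GPU.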