koljab | 8 months ago | on: Show HN: Real-time AI Voice Chat at ~500ms Latency
With the current 24B LLM it's 24 GB of VRAM. I have no idea how far down you can go in GPU memory using smaller models; you can set the model in server.py. I'm quite sure 16 GB will work, but at some point it will probably fail.
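A minimal sketch of the kind of change described, swapping the default model for a smaller one in server.py. The variable name and model identifiers below are assumptions for illustration, not the project's actual code:

```python
# Hypothetical server.py model configuration (names are assumptions).
# Default: a ~24B-parameter model, needing roughly 24 GB of VRAM.
# LLM_MODEL = "mistral-small:24b"

# A smaller ~8B model should fit comfortably in ~16 GB of VRAM,
# at the cost of some response quality.
LLM_MODEL = "llama3.1:8b"

print(f"Using LLM model: {LLM_MODEL}")
```

As a rough rule of thumb, a model's VRAM footprint scales with its parameter count at a given quantization, so dropping from 24B to 8B parameters cuts memory use roughly threefold, leaving headroom for the speech-to-text and text-to-speech components sharing the GPU.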