Can you please share more info on this? I have a 6900xt "gathering dust" in a proxmox server - would like to try to do a passthrough to a vm and use it. Thank you in advance!
* llama.cpp now has GPU support including "CLBlast", which is what we need for this, so compile with LLAMA_CLBLAST=ON
* now you can run any model llama.cpp supports, so grab some ggml models that fit on the card from https://huggingface.co/TheBloke.
* Test it out with: ./main -t 30 -ngl 128 -m huginnv1.2.ggmlv3.q6_K.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"
* You should see `BLAS = 1` in the llama.cpp output, and you should get maybe 5 tokens per second on a 13B 6-bit quantized GGML model.
* You can compile llama-cpp-python with the same arguments and get text-generation-webui working too, but there's a bit of dependency fighting involved.
* koboldcpp might be better, I just haven't tried it yet
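The compile steps above can be sketched roughly like this (a sketch, assuming a Linux box with OpenCL drivers and the CLBlast dev package already installed; paths and flags may differ on your distro):

```
# Build llama.cpp with the CLBlast (OpenCL) backend
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_CLBLAST=ON
cmake --build . --config Release

# llama-cpp-python can be built against the same backend
# (for text-generation-webui and friends):
CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```

After that, the `./main` invocation above should report `BLAS = 1` at startup.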
Text gen UI is nice. Some specific niceties include pre-formatted instruct templates for popular models, good prompt caching from llama-cpp-python, and integration of a vector DB.
But it's also finicky, kind of unstable, and the dependencies are tricky.
Koboldcpp has other niceties, like some different generation parameters to tweak and some upstream features pulled in from PRs before they land in an official llama.cpp release. The UI is nice, predating LLaMA v1. It's standalone, dead simple to compile, and has integration with AI Horde, which is (IMO) a huge, essential feature.
I’d love to host something local but have been so overwhelmed by the rapid progress and every time I start looking I find a guide that inevitably has a “then plug in your OpenAI api key…” step which is a hard NOPE for me.
I have a few decent gpus but I’ve got no idea where to start…
- Run Koboldcpp with OpenCL (or ROCm) with as many layers as you can fit on the GPU. If you use ROCm, you need to install the ROCm package from your Linux distro (or direct from AMD on Windows).
- Access the UI over http. Switch to instruct mode and copy in the correct prompt formatting from the model download page.
- If you are feeling extra nice, get an AI Horde API key and contribute your idle time to the network, and try out other models from other hosts: https://lite.koboldai.net/#
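The steps above can be sketched as follows (a sketch only; `--useclblast 0 0` picks the first OpenCL platform/device and `--gpulayers 40` is a guess you should tune to your VRAM, and the model filename is just the one from the example above):

```
# Build and run koboldcpp with the OpenCL backend
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
make LLAMA_CLBLAST=1

# Offload as many layers as fit on the GPU; raise --gpulayers
# until you run out of VRAM, then back off
python koboldcpp.py --useclblast 0 0 --gpulayers 40 huginnv1.2.ggmlv3.q6_K.bin
```

Then point a browser at http://localhost:5001 (koboldcpp's default port), switch to instruct mode, and paste in the prompt template from the model's download page.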