Can you please share more info on this? I have a 6900xt "gathering dust" in a proxmox server - would like to try to do a passthrough to a vm and use it. Thank you in advance!
* llama.cpp now has GPU support including "CLBlast", which is what we need for this, so compile with LLAMA_CLBLAST=ON
* now you can run any model llama.cpp supports, so grab some ggml models that fit on the card from https://huggingface.co/TheBloke.
* Test it out with: ./main -t 30 -ngl 128 -m huginnv1.2.ggmlv3.q6_K.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"
* You should see `BLAS = 1` in the llama.cpp output, and you should get maybe 5 tokens per second on a 13B 6-bit quantized GGML model.
* You can compile llama-cpp-python with the same arguments and get text-generation-webui working too, but there's a bit of dependency fighting involved.
* koboldcpp might be better, I just haven't tried it yet
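The compile steps above can be sketched roughly like this (a sketch, assuming a Linux box with OpenCL drivers and the CLBlast dev package already installed; paths and flags may differ on your distro):

```
# Build llama.cpp with the CLBlast (OpenCL) backend
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_CLBLAST=ON
cmake --build . --config Release

# llama-cpp-python can be built against the same backend
# (for text-generation-webui and friends):
CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```

After that, the `./main` invocation above should report `BLAS = 1` at startup.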
Text gen UI is nice. Some specific niceties include pre-formatted instruct templates for popular models, good prompt caching from llama-cpp-python, and integration of a vector DB.
But it's also finicky, kind of unstable, and the dependencies are tricky.
Koboldcpp has other niceties, like some different generation parameters to tweak and some upstream features pulled in from PRs before they land in an official llama.cpp release. The UI is nice, predating LLaMA v1. It's standalone, dead simple to compile, and has integration with AI Horde, which is (IMO) a huge, essential feature.
I’d love to host something local but have been so overwhelmed by the rapid progress and every time I start looking I find a guide that inevitably has a “then plug in your OpenAI api key…” step which is a hard NOPE for me.
I have a few decent gpus but I’ve got no idea where to start…
- Run Koboldcpp with OpenCL (or ROCm) with as many layers as you can fit on the GPU. If you use ROCm, you need to install the ROCm package from your Linux distro (or direct from AMD on Windows).
- Access the UI over http. Switch to instruct mode and copy in the correct prompt formatting from the model download page.
- If you are feeling extra nice, get an AI Horde API key and contribute your idle time to the network, and try out other models from other hosts: https://lite.koboldai.net/#
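The steps above can be sketched as follows (a sketch only; `--useclblast 0 0` picks the first OpenCL platform/device and `--gpulayers 40` is a guess you should tune to your VRAM, and the model filename is just the one from the example above):

```
# Build and run koboldcpp with the OpenCL backend
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
make LLAMA_CLBLAST=1

# Offload as many layers as fit on the GPU; raise --gpulayers
# until you run out of VRAM, then back off
python koboldcpp.py --useclblast 0 0 --gpulayers 40 huginnv1.2.ggmlv3.q6_K.bin
```

Then point a browser at http://localhost:5001 (koboldcpp's default port), switch to instruct mode, and paste in the prompt template from the model's download page.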