
Best way to mess around with FlexGen and LLMs on local hardware in general is https://github.com/oobabooga/text-generation-webui
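
If it helps anyone, the flow is roughly this (a rough sketch only; the README is the source of truth and the model id is just an example):

    git clone https://github.com/oobabooga/text-generation-webui
    cd text-generation-webui
    pip install -r requirements.txt              # plus a CUDA (or ROCm) build of PyTorch, per the README
    python download-model.py facebook/opt-1.3b   # any Hugging Face model id
    python server.py                             # then pick the model in the web UI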


Can't get it to run on an AMD 6700 XT even though there are ROCm installation instructions. Tried to run LLaMA 7B but got hung up because bitsandbytes calls CUDA.


Par for the course, I'm afraid.

Over the years I've made several attempts at AMD/ROCm. I've wasted so much time that AMD owes me free GPUs for life ;).

At this point I've just accepted that AMD is irrelevant in ML and they're fine with that - they're not making much of an effort to change that. I really and truly wish that wasn't the case but I've accepted that it is.


> At this point I've just accepted that AMD is irrelevant in ML and they're fine with that - they're not making much of an effort to change that. I really and truly wish that wasn't the case but I've accepted that it is.

Same here. AMD's neglect of non-gaming uses is likely how Intel is going to become number 2 in GPUs.


Is there an equivalent to Nvidia-Docker for ROCm to make it brain dead easy? Where does the complexity come in for AMD GPUs?


It's far more complicated than that. Take a look at the Nvidia Frameworks Support Matrix[0]. The Nvidia PyTorch Docker container is 20GB uncompressed (!!!!) and consists of dozens of layers of Nvidia/CUDA-tailored software stacks[1] that all come together to do this magical ML/AI stuff we take for granted on CUDA hardware (rough docker example after the links below). BTW, we have a nice, clean Nvidia CUDA docker situation because Nvidia has been working on it for years. It's rock solid and universally supported.

Moving to a lower level, check out the release notes for the latest Nvidia driver and look in amazement at the sheer number of supported GPUs[2]. In short, literally every GPU they've put in a laptop, desktop, workstation, or datacenter over the past decade (plus their embedded Jetson stuff). There are over 2,000 GPUs listed there, and I can tell you from experience the "Unified" in CUDA holds up. If the card is supported by the driver, has a compatible compute arch, and has enough VRAM, whatever you throw at it will just work. In many cases even on Windows! People complain about the proprietary driver, but the fact is if your GPU says Nvidia on it you can install the one driver on any distro and have these projects up and running in minutes.

Compare that to the ROCm "list"[3] of what, maybe a dozen GPUs? I know from experience (unfortunately) that even with "supported" hardware in many cases just getting the driver to work is a nightmare. Then you have basic frameworks randomly crashing, etc. It's a complete mess and like I said - AMD is a decade behind today and Nvidia's lead is only growing.

[0] - https://docs.nvidia.com/deeplearning/frameworks/support-matr...

[1] - https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorc...

[2] - https://download.nvidia.com/XFree86/Linux-x86_64/530.41.03/R...

[3] - https://docs.amd.com/bundle/Hardware_and_Software_Reference_...
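
To make it concrete, this is roughly what "brain dead easy" looks like on the Nvidia side, plus the closest ROCm equivalent. The tags and flags here are illustrative, not gospel; check the NGC catalog and the ROCm docs for what matches your driver:

    # Nvidia: pick whatever container tag the support matrix recommends for your driver
    docker run --rm -it --gpus all nvcr.io/nvidia/pytorch:23.03-py3 \
      python -c "import torch; print(torch.cuda.is_available())"

    # ROCm: no --gpus equivalent; you pass the kernel devices through yourself
    docker run --rm -it --device=/dev/kfd --device=/dev/dri --group-add video \
      rocm/pytorch:latest \
      python -c "import torch; print(torch.cuda.is_available())"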


When you follow ROCm's happy path it works well for getting things like PyTorch running, but you do end up putting up with enterprise distros like Ubuntu 22.04 LTS and its network stack, which is often broken by default.

Enabling ROCm support for unlisted AMD GPUs & APUs is a feature flag away if you're running one of these janky enterprise distros that AMD has blessed.

Debian is slowly getting ROCm's and PyTorch's packages into the main Debian archive. Once that's complete, using AMD hardware for machine learning should be a breezy apt install away, as Debian's machine learning team chooses sane defaults, like enabling APU support. Derivative distros like PopOS, Linux Mint, Zorin, Kali, Ubuntu and such will inherit this easy support from upstreaming ROCm as well.


Genuinely curious - when you say “PyTorch runs” what are you doing with it?

When I last tried it a year or so ago it was (more or less) useless. Yes, it “ran”, but there were weird crashes and edge cases all over the place. Throw in the fact that 95% of documentation, examples, tools, benchmarks, etc. are still for CUDA, and after four years or so of embarking on this journey I think I’ve finally given up for good.

I happily and very firmly live in Nvidia/CUDA land now where I can see a story on HN and have it running on my GPU in under 10 minutes. Or 30 seconds if it’s a docker container.

You’re brave for running on unsupported GPUs! My experience was bad enough with the “supported” ones :).


It seems stable with Whisper and a few other models...


how do I turn on this feature flag? I'm quite happy on xubuntu.


Read their GitHub; there is an issue where they cover the feature flag you need to set to enable ROCm on unsupported platforms.
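
For what it's worth, the variable people usually mean here is the HSA override. The value depends on your GPU family, so treat this RDNA2 one as an example rather than gospel, and assume you already have a ROCm build of PyTorch installed:

    # example value for RDNA2 cards (e.g. 6700 XT); other families need other values
    export HSA_OVERRIDE_GFX_VERSION=10.3.0
    python -c "import torch; print(torch.cuda.is_available())"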


Adding to this: I used AMD's official amdgpu tool to install ROCm and now my card fails to be recognized on boot by Ubuntu ~80% of the time, causing a failure to load lightdm & start up. Tomorrow I'll dig around in journalctl to see if I can fix it.
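
Noting for tomorrow, the places I plan to look first (commands are a sketch; unit names may differ on my setup):

    journalctl -b -1 -k | grep -iE "amdgpu|drm"   # kernel messages from the previous, failed boot
    journalctl -b -1 -u lightdm                   # why the display manager gave up
    dkms status                                   # whether the amdgpu DKMS module actually built for my kernel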

I'm thiiiiis close to throwing up my hands and getting a 4070 but I've been advised that other than ML/CUDA, AMD>Nvidia for linux. Also the 4070 is 2x the price for just slightly better performance.


I’ve been there. Feel free to keep at it but unless your goal is to be a ROCm developer and contribute to the driver, upstream projects, etc IMO it’s just not worth it.

There is so much interesting, educational, and productive stuff going on with GPUs these days… At some point people need to ask themselves “did I buy a GPU to learn/use ML, or did I buy a GPU to beta test AMD’s software stack?”

When it comes to GPU compute Nvidia cards are substantially cheaper if you value your time and sanity.

Note I don’t have any vested interest in Nvidia whatsoever. I’m just mad at myself for spending my time and money thinking AMD actually cared about any of this - they clearly don’t, and the lack of progress I’ve seen and experienced over the past five years of trying with them is insulting to me and the rest of their so-called user base.


Any word on who has better vulkan support?


I don't know or have any experience because I've never needed/been interested in it.


Thank you so so much, this was the only way to get LLaMA running on my desktop's GPU! Everything else was plagued by everything from compile errors to version mismatches to miscompiled wheels to weird contradictions or whatever. I'm so happy that this works. I can finally use an LLM to my heart's content without relying on OpenAI and their stupid server load and phone number requirement


What's the "best"/most like GPT-4 model to use with this?



Excellent, thanks!



