It's really time for some competition. Either AMD or some Chinese company like Moore Threads needs to speed up and get something on the market to break Nvidia's dominance. Nvidia is already showing some nasty, typically evil behavior that has to be stopped. I know that's not easy with fully booked fab partners at Samsung/TSMC/etc.
AMD/Intel should be throwing them tens or hundreds of millions (or just straight-up human hours of work) to make that work. If/when it does, it would be tens or hundreds of billions of (additional) market cap for them.
So why doesn't AMD invest more in quality software? Everybody knows it's holding them back. Right now a tiny corporation has to jump through hoops to do work they could have done at any point in the last decade. Why don't they pull a great team together and just do the work, if the payoff is that big?
Perhaps they think using GPUs for computation is a passing fad? They hate money? Their product is actually terrible and they don't want to get found out (that one might be true for Intel)?
In general it's pretty rare for hardware-first companies to put out good software. To me it looks like there are structural reasons for this; for instance, hardware requires waterfall development, which then gets imposed on software.
They are. In side-by-side tests of FSR and DLSS, most people either can't tell the difference or pick FSR. Then you tell them which is which and they turn around and say DLSS is better. People are just biased toward Nvidia.
They haven't had graphics card driver issues in years now, and people still say "oh, I don't want AMD because their drivers don't work".
I'm not sure what you're talking about, because in side-by-side testing people can't tell the difference, with the exception of racing games (though that's screwed on DLSS 3 too anyway) and of taking screen grabs to pore over. So the fact is, all that extra compute in Nvidia cards is a gimmick. If you disagree then you're wrong. The competitive edge of DLSS is gone.
One bad driver update is not indicative of anything. Nvidia has had bad driver updates too, but you're not shitting all over them. And running Nvidia's own drivers on Linux is still a pain point.
(And don't try to claim I'm an AMD fanboy when I don't even have any AMD stuff at the moment. It's all Intel/Nvidia.)
FSR is pretty bad; it's not even close to DLSS, and no one likes FSR. Saying there is no difference is wrong, just play a game with FSR 2.1 and DLSS 2 or 3, please.
I have a 4070. FSR and DLSS on the Quality preset look the same. It's only noticeable in Forza Horizon. If you notice it in a non-racing game then you're looking for the differences.
Yes, but you need ROCm, which mostly only runs on AMD's professional cards and requires using the proprietary driver rather than the wonderfully stable open-source one.
ROCm only officially supports a handful of server or workstation cards, but it works on quite a few others.
I've enabled nearly all GFX9 and GFX10 GPUs as I have packaged the libraries for Debian. I haven't tested every library with every GPU, but my experience has been that they pretty much all work. I suspect that will also be true of GFX11 once we move rocm-hipamd to LLVM 16.
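(For anyone wanting to check whether their own card is actually picked up: here's a rough sketch, assuming a ROCm build of PyTorch. The HSA_OVERRIDE_GFX_VERSION value is just an example people commonly use for RDNA2 consumer cards, not official guidance, and may not apply to your GPU.)

```python
# Rough check that a ROCm build of PyTorch can see the GPU.
# For consumer cards not on the official support list, people often export
# HSA_OVERRIDE_GFX_VERSION before importing torch; "10.3.0" is a common
# value for RDNA2 parts, but that's an assumption about your card.
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

import torch

# ROCm builds of PyTorch reuse the torch.cuda API, so the usual calls work unchanged.
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
print("HIP version:", torch.version.hip)  # None on CUDA builds, a version string on ROCm builds
```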
Intel is very much putting its money where its mouth is with SYCL/oneAPI. They are spending a lot of money and advancing a lot faster than AMD, and in many ways it's a better approach (a CUDA-style DSL that's portable across hardware) rather than just another ecosystem.
(To their credit, AMD is also getting serious lately: they put out listings for something like 30 ROCm developers a few weeks after geohot's meltdown, and at the time they were also in the process of doing a Windows release of ROCm (previously Linux-only) with support for consumer gaming GPUs. The message seems to have finally been received; it's a perennial topic here and elsewhere, and with the obvious shower of money happening, maybe management was finally receptive to the idea that they needed to step it up.)
[1] links to https://github.com/RadeonOpenCompute/ROCm/issues/2198 which has all the context (driver bugs, vowing to stop using AMD, Lisa Su's response that they're committed to fixing this stuff, a comment that it's fixed)
Here's a list of possible "monopoly breakers" I'm going to write about in another post - some of these are things people are using today, some are available but don't have much user adoption, some are technically available but very hard to purchase or rent/use, and some aren't yet available:
* Software: OpenAI's Triton (you might've noticed it mentioned in some of "TheBloke" model releases and as an option in the oobabooga text-generation-webui; a minimal Triton sketch follows this list), Modular's Mojo (on top of MLIR), OctoML (from the creators of TVM), geohot's tiny corp, CUDA porting efforts, PyTorch as a way of reducing reliance on CUDA
* Hardware: TPUs, Amazon Inferentia, cloud companies working on chips (Microsoft Project Athena, AWS Trainium, TPU v5), chip startups (Cerebras, Tenstorrent), AMD's MI300A and MI300X, Tesla Dojo and D1, Meta's MTIA, Habana Gaudi, LLM ASICs, [+ Moore Threads]
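(To give a flavor of the software side, here's a minimal sketch of what a Triton kernel looks like, roughly following the vector-add example from Triton's own tutorials; the names add_kernel and add are mine, not from any of the projects above.)

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The point being that it's Python-level code that compiles down to GPU kernels without writing CUDA C++ directly, which is exactly why it shows up on a "monopoly breakers" list.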
The A100/H100 with InfiniBand are still the most common request from startups doing LLM training, though.
The current angle I'm thinking about for the post would be to actually use them all. Take Llama 2, and see which software and hardware approaches we can get inference working on (would leave training to a follow-up post), write about how much of a hassle it is (to get access/to purchase/to rent, and to get running), and what the inference speed is like. That might be too ambitious though, I could see it taking a while. If any freelancers want to help me research and write this, email is in my profile. No points for companies that talk a big game but don't have a product that can actually be purchased/used, I think - they'd be relegated to a "things to watch for in future" section.
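(As a concrete starting point, the baseline I'd compare everything against is probably just stock Hugging Face Transformers on an Nvidia card. A rough sketch below, assuming the gated meta-llama/Llama-2-7b-hf repo and that you've accepted Meta's license and logged in with `huggingface-cli login`.)

```python
# Baseline Llama 2 inference via Hugging Face Transformers.
# The model id is an assumption; the repo is gated behind Meta's license.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "The Nvidia monopoly will be broken when"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```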