Not sure that would have mattered once LLMs emerged. NVidia's success is mostly a matter of being the only major advanced-GPU player at the moment LLMs exploded onto the scene. That won't be the case for long. The big players are already investing heavily in high-performance AI chips to wean themselves off a dependence on NVidia.
Nvidia’s success is due to a decade+ of investment into their software stack (CUDA and friends), built on a world view that more software can and should be massively parallel.
They were a key cause for LLMs being a thing in the first place.
This. I worked in HPC between 2010 and 2013, and people were trying to compete with NVidia in the GPGPU space even back then, seeing the way the wind was blowing. Compute bandwidth and FLOPS/watt have steadily become more and more important over the last 15 years, even before AI.
NVidia has continued to stay ahead because every alternative to CUDA is half-baked trash, even when the silicon makes sense. As a tech company, trading $$ for time not spent dealing with compatibility bugs and broken drivers pretty much always makes business sense.
Nvidia did a lot of things right. But it sure didn't hurt that the crypto bubble gave way to LLMs with almost perfect timing. Luck favors the prepared and all that but Nvidia also would have had trouble timing market changes better.
Yeah, exactly, it has never been clear that we would find a use for parallel computation outside of niches like graphics and HPC. And in fact most of the history of computing has been evidence of the opposite: useful computation has been predominantly serial in nature. But here we are now.
It’s pretty bad. Just when you think you can order AMD chips (no shortage there), use a translation layer, and have a cheap AI datacenter, it turns out AMD is fumbling the ball at every step of the way.
It’s interesting. They have had plenty of time and resources available to mount solid competition. Why haven’t they? Is it a talent-hiring problem or something more fundamental in their engineering processes? The writing has been on the wall for GPGPU for more than 10 years. Definitely enough time to catch up.
It's a commitment problem IMO. NVidia stuck with CUDA for a long time before it paid back what it cost. AMD and Intel have both launched and killed initiatives a couple of times each, abandoning them within a few years because adoption didn't happen overnight.
If you need people to abandon an ecosystem that's been developed steadily over nearly 20 years for your shiny new thing in order to keep it around, you'll never compete.
To be fair, Cuda has improved a lot since 2014 or so. I messed up my Linux box multiple times trying to install Cuda, but the last time it was just apt install and maybe setting LD_LIBRARY_PATH, and it all just worked.
I can't speak to modern HPC, since I've been out of the game for a few years, but icc was absolutely the preferred compiler for almost any workload, and Intel procs were the desired hardware to run on. The joke was that AMD was the only reason Intel wasn't considered a monopoly, and that Intel would happily take wheelbarrows of cash over to AMD to prop them up, just to ensure they didn't even appear to be one. But no one in the field was buying AMD procs for their workloads.
Most supercomputers were still Intel/IBM, but around that time is when the shift to GPU clusters started. The #1 supercomputer spot was taken by an Nvidia-accelerated cluster in 2012, but I remember other big projects being done before then as well.
Are you sure that isn’t what @mullingitover meant?
> Number of SMs is a more appropriate equivalent to CPU core count.
What do you mean by this? Why should an SM be considered equivalent to a CPU core? An SM can do 128 simultaneous adds and/or multiplies in a single cycle, where a CPU core can do, what, 2 or maybe 4? Obviously it depends on the CPU / core / hyperthreading / # math pipelines / etc., but the SM to CPU-core ratio of simultaneous calculations is in the double digits. It’s a tradeoff where the GPU accepts some restrictions in return for being able to do many multiples more at the same time.
If you consider an SM and a CPU core equivalent, then the SM’s perf can exceed the CPU core’s by ~2 orders of magnitude — is that the comparison you want? If you consider a GPU thread lane and a CPU thread lane equivalent, then the GPU thread lane is slower and more restricted. Neither comparison is apples to apples; CPUs and GPUs are made for different workloads. But arguing that an SM is equivalent to a CPU core seems equally or more “misleading” when you’re leaving out the tradeoff.
I’d argue that comparing SMs to cores is misleading, and that it makes more sense to compare chips by their thread counts. Or don’t compare cores at all and just look at performance in, say, FLOPS.
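To make the "just look at FLOPS" option concrete, here's a rough sketch; every figure in it (SM count, lane counts, clocks) is an illustrative assumption, not any real chip's spec:

```python
# Illustrative peak-FP32 calculation: FLOPS = FMA lanes * 2 * clock.
# (An FMA counts as two floating-point operations per cycle.)
# All numbers below are made-up assumptions for the example.

def peak_fp32_flops(fma_lanes, clock_ghz):
    return fma_lanes * 2 * clock_ghz * 1e9

gpu = peak_fp32_flops(fma_lanes=128 * 80, clock_ghz=1.5)  # 80 SMs x 128 lanes
cpu = peak_fp32_flops(fma_lanes=32 * 16, clock_ghz=4.0)   # 16 cores x 32 lanes

print(f"GPU: {gpu / 1e12:.1f} TFLOPS, CPU: {cpu / 1e12:.1f} TFLOPS")
```

Counting it this way sidesteps the "what is a core" argument entirely; the ratio between the two totals is the only number that matters.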
An SM is split into four identical blocks, and I would say each block is roughly equivalent to a CPU core. It has a scheduler, registers, 32 ALUs or FPUs, and some other stuff.
A CPU core with two AVX-512 units can do several integer operations plus 32 single-precision operations (including FMA) per cycle. Not 2 or 4. An older CPU with 2-3 AVX2 units could fall slightly behind, but it's pretty close.
That doesn't factor in the tensor units, but they're less general purpose, and CPUs usually put such things outside the cores.
I would say an SM is roughly equivalent to four CPU cores.
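The arithmetic behind that four-to-one equivalence can be sketched quickly; treat each figure as a rough assumption from the comments above, not a spec-sheet number:

```python
# Peak single-precision FLOPs per cycle, FMA counted as 2 FLOPs.
# Assumptions (from the discussion, not any official spec):
# - one SM block: 32 FP32 lanes, each doing one FMA per cycle
# - one CPU core: 2 AVX-512 units x 16 fp32 lanes, FMA each cycle

sm_block_flops = 32 * 2        # 64 FLOPs/cycle per SM quarter
cpu_core_flops = 2 * 16 * 2    # 64 FLOPs/cycle per AVX-512 core
full_sm_flops = 4 * sm_block_flops  # an SM has four such blocks

print(sm_block_flops, cpu_core_flops, full_sm_flops // cpu_core_flops)
```

Under those assumptions one SM quarter and one wide-SIMD CPU core land on the same per-cycle throughput, which is why "an SM is roughly four cores" lines up.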
Yeah I totally forgot to consider CPU SIMD. Brain fart. The other comment corrected me too, you’re both right. When I said 2 or 4, I was thinking of the SISD math pipe and not AVX instructions.
Yes, considering CPU SIMD, maybe comparing a CPU core to a CUDA warp makes some sense in some situations. The peak FLOPS rate is still so much higher on Nvidia though, that the comparison hardly makes sense. So yeah like I and the other commenter mentioned, it depends entirely on what comparison is being made.
A single Zen5 core can do 32 single precision FMAs per clock.
That's using SIMD, but so is Nvidia for all intents and purposes. Those "cuda cores" aren't truly independent: when their execution diverges, masking is used pretty much like you'd do in CPU SIMD.
A lot of the control logic is per-SM or perhaps per-SIMD unit -- there are multiple of those per SM. You could perhaps make a case that it's the individual SIMDs which correspond to CPU cores (that makes the flops line up even more closely). It depends on what the goal of the comparison is.
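A toy sketch of that masking behavior in plain Python (the 8-lane "warp" and both code paths are made up for illustration): when lanes disagree on a branch, both paths execute and a per-lane mask selects which result each lane keeps, much like predicated CPU SIMD.

```python
# Toy model of branch divergence in an 8-lane "warp": the hardware runs
# both sides of the branch and uses a per-lane mask to keep the right
# result -- inactive lanes' work is simply discarded.

def diverged_kernel(xs):
    mask = [x % 2 == 0 for x in xs]       # per-lane predicate
    then_path = [x // 2 for x in xs]      # executed for ALL lanes
    else_path = [3 * x + 1 for x in xs]   # also executed for ALL lanes
    return [t if m else e for m, t, e in zip(mask, then_path, else_path)]

print(diverged_kernel([0, 1, 2, 3, 4, 5, 6, 7]))
```

Both paths cost execution time even though each lane keeps only one result, which is exactly the penalty divergence carries on real SIMT hardware.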
100%. If you ever tried to build pytorch (or tensorflow) GPU-accelerated using NVidia vs AMD, it was absolutely chalk and cheese for the longest time.
Nvidia/CUDA process: Download package. Run the build. It works. Run your thing: it's GPU-accelerated. Go get a beer/coffee/whatever while your net runs.
AMD process: Download package. Run the build. Debug the failure. Read lots of articles about which patches to apply. Apply the patches. Run the build. It fails again. Shit. OK, ok, now I know what to do: I need a special fork of the package. Go get that. Find it doesn't actually have the same API that the latest pytorch/tf relies on. OK, downgrade those to an earlier version. OK, now we're good. Run the build again. Aw shit, that failed again. More web searches. Oh, OK, now I know: there are some other patches you need to apply to this branch. OK, cool. Now it compiles. Install the package. Run the thing. Huh. That's weird. GPU acceleration isn't on... sigh...
I think this undersells the story. NVidia's success is built on innovating and scaling for 20 years. Vastly oversimplifying it:
- CUDA's conception in 2006 to build supercomputers for scientific computing.
- CUDA influencing GPU designs to be dual purpose, with the major distinction being RAM amounts (scientific compute needs a lot more RAM than gaming).
- The crypto craze driving extreme consumer GPU demand, which enabled them to invest heavily into R&D and scale up production.
- AI workload explosion arriving right as the crypto demand was dying down.
- Consistently great execution, or at least not making any major blunders, during all of the above.
It’s more “no major blunders + no real major competition.” They have not consistently “executed great”; it’s dumb luck + no extremely stupid decisions.
That doesn’t mean they didn’t make a bunch of mistakes; it’s that when they did, there was no competition to realistically turn towards, and they fixed a lot of them.
As someone who's intimately familiar with both cultures, I'm convinced Intel would have killed any innovation Nvidia had going for it before it had a chance to really take off.
Management at the two could not be more opposite if they tried.
Intel's third-wave management-by-MBA would have ruined Nvidia way back then, forcing them into something dumb and blocking all of the R&D they've done since, because that's what they also did at Intel.
Other companies are capable of making big GPUs; they aren't the only TSMC customer. Intel themselves have perfectly fine GPUs. Their issue is that their management never allocates enough space in their chips to let them perform well.
Nvidia's advantage is that they have by far the most complete programming ecosystem for them. (Also honestly… they're a meme stock.)
LLMs absolutely did not emerge because of NVIDIA. You’re the one, imho, who is mistaking correlation with causality.
The first transformer models were developed at Google. NVIDIA has been the card du jour for accelerating them in the years since, and has contributed research too, but your statement goes way too far.
I was around the space before Alexnet came out. Without NVidia the last 15 years of AI would not have happened.
Deep learning was only possible because you could do it on NVidia cards with Cuda without having to use the big machines in the basement.
Trying to convince anyone that neural networks could be useful in 2009 was impossible - I got a grant declined and my PhD supervisor told me to drop the useless tech and focus on something better like support vector machines.
Just like how ROCm is supposed to be competitive today and it isn't unless you have an army of grad students to look after your data center cards.
I tried using AMD Stream, and it lacked documentation, debugging information, and most of the tools needed to get anything done without a large team of experts. NVidia, by comparison, could be (and was) used by single grad students on the franken-stations we built out of gaming GPUs.
The less we talk about the disaster that was the move to OpenCL, the better.
Not with Rocm since I’ve moved my personal stack to NVIDIA (for rendering) and Macs for day to day use.
I did write quite a bit of OpenCL prior to that on Intel/AMD/NVIDIA, both for training and for general rendering, and did some work with Stream before then.
Was it OpenCL 1? That's the only one I hadn't tried out for AMD GPUs. Everything else I have, and I can say with absolute certainty that you spent more time fighting the hardware than writing code.
Both 1 and 2. I haven’t done much with 3 as OpenCL is effectively a dead api at this point.
1 was definitely a lot easier to work with than 2. CUDA is easier than both but I don’t think I hit anything I could do in CUDA that I couldn’t do in OpenCL, though CUDA of course had a larger ecosystem of existing libraries.
I was not a PhD, but I studied both around that period in university and can confirm that neural networks were only seen as a kind of theoretical plaything. All the libraries were pushing SVMs.
I think their point is that easily accessible hardware capable of supporting the research is the reason the research has come as far as it has, and I would tend to agree. At the very least, GPUs keeping the PCI ecosystem going has played a major role in allowing the specialized accelerator market to flourish.
But that could apply to any of the GPU manufacturers. CUDA made for an easier ecosystem but if it didn’t exist it would have been any of the other APIs.
The first transformer models didn’t even use CUDA and CUDA didn’t have mass ecosystem inroads till years later.
I’m not trying to downplay NVIDIA but they specifically mentioned cause and effect, and then said it was because of NVIDIA.
First transformer != LLM. Imagine a world where you had to use AMD or CPUs: there would be no AlexNet, there would be no LLMs. Nvidia seeded universities with gifted hardware accelerators for over a decade. Nvidia built the foundations for modern ML, on which the transformer lies; it's just one step on a long road to LLMs.
AMD had GPU compute APIs at the time as well. They also used to contribute GPUs (albeit in much smaller quantities). They just ended up killing them in favor of OpenCL which then withered on the vine.
NVIDIA absolutely contributed to the foundation but they are not the foundation alone.
Alexnet was great research but they could have done the same on other vendors at the time too. The hardware didn’t exist in a vacuum.
OpenCL itself is kinda fine, but the libraries never existed, because every academic from postdoc up could get a free NVidia card. You literally filled out a form and they sent you one.
People will say Vulkan, but it has the same level of adoption as OpenCL, and the same issue that it competes against vendor-specific APIs (DX and Metal) that are just better to use. It’s still used as a translation target, of course, but imho that doesn’t qualify it as a success.
OpenCL was and is a failure of grand magnitude. As was colada.
Please, this is just revisionism. There’s nothing inherent to AlexNet that relied on NVIDIA hardware or even software. It was just what happened to be available and most straightforward at the time.
To say it wouldn’t have been possible on AMD is ludicrous, and there is a pattern to your comments where you dismiss any other company's efforts or capabilities but are quite happy to lay all the laurels on NVIDIA.
The reality is that multiple companies and individuals got us to where we are, and multiple products could have done the same. That's not to take away from NVIDIA's success, it's well earned, but if you took them out of the equation, there's nothing that would have prevented the tech existing.
> It was just what happened to be available and most straightforward at the time
AMD made better hardware for a while and people wanted OpenCL to succeed. The reason why nvidia became dominant was because their competitors simply weren’t good enough for general purpose parallel compute.
Would AI still have happened without CUDA? Almost certainly. However nvidia still had a massive role in shaping what it looks like today.
That’s my takeaway too, even though I’m invested myself. The discussion around them has transcended into mythology. Imho part of it is because GPUs themselves are so foreign to many people as a development target that they’re not really aware of the landscape.
> The first transformer models didn’t even use CUDA and CUDA didn’t have mass ecosystem inroads till years later.
I graduated college in 2010 and I took a class taught in CUDA before graduating. CUDA was a primary driver of NN research at the time. Sure, other tools were available, but CUDA allowed people to build and distribute actually useful software which further encouraged the space.
Could things have happened without it? Yeah, for sure, but it would have taken a good deal longer.
BTW, when GPUs were used for Bitcoin mining (until around 2013, when they were obsoleted by specialized chips), AMD chips were used exclusively, because they had much better integer math performance than Nvidia cards, which focused on floating point performance.
Nvidia will make general-purpose GPUs while the AI players make ASICs. I guess most data centers will prefer GPUs so customers can run whatever AI model they need.
Indeed. A friend of a friend of a friend was in the room when the decision was made. The reason for not buying: Huang won't be a team player and is hard to work with.
So yes, they would have tried to integrate it into the rest of Intel.
Just look at what happened to Altera under Intel. Pre-acquisition it was similar in size to Xilinx; now it's just a shadow of its former self, and the latest few quarters show it in the red. The person they put in charge of it, Sandra Rivera, was formerly the head of HR.
> From 2019 to 2021, Rivera was Intel’s chief people officer, leading the company’s Human Resources organization worldwide.
I heard a rumor that Jensen wouldn't agree to the acquisition unless he became CEO of the combined entity.
Intel of that era would have also required Nvidia to only fab their chips with Intel. That would have been fine initially when Intel was a process leader. But it'd have killed Nvidia for the last 10 years. AMD would have enjoyed a consistent process advantage over Nvidia.