
CUDA and ROCm both work under PyTorch. If ROCm does not work well, PyTorch does not work well.
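
For context, ROCm builds of PyTorch reuse the same torch.cuda API as CUDA builds, which is why the same PyTorch code can target either vendor. A minimal sketch of what device-portable code looks like:

  import torch

  # ROCm builds of PyTorch reuse the torch.cuda namespace, so the same
  # device-selection code runs on Nvidia (CUDA) and AMD (ROCm/HIP) alike.
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

  # torch.version.hip is set on ROCm builds and is None on CUDA builds.
  backend = "ROCm/HIP" if torch.version.hip else "CUDA"
  print(f"running on {device} via {backend}")

  x = torch.randn(1024, 1024, device=device)
  y = x @ x  # same call site; the installed backend dispatches the kernel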

Nvidia has multiple advantages.

1. The same software written in PyTorch works across all relatively modern Nvidia chips. With AMD and ROCm, that's not the case.

2. M̶I̶3̶0̶0̶x̶ h̶a̶s̶ n̶o̶ F̶P̶8̶ s̶u̶p̶p̶o̶r̶t̶.

3. Comparisons against H100 are always like this:

  8x AMD MI300X (192GB, 750W) GPU  
  8x H100 SXM5 (80GB, 700W) GPU
The fair comparison would be against

  8x H100 NVL (188GB, <800W) GPU 
And that they never do.

4. H100 is a 21-month-old architecture; MI300X is 7 months old. Nvidia has moved to a new-architecture-every-year cadence. AMD is a generation behind and must step up the pace. B100 comes out this year.

AMD is getting closer, but don't expect them to catch Nvidia in no time.



It has FP8 support. Not sure whether FP8 on MI300X is supported by vLLM yet.
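
As a quick sanity check (just a sketch: dtype availability alone doesn't tell you whether a given GPU has fast FP8 kernels), recent PyTorch builds do expose FP8 dtypes:

  import torch

  # PyTorch 2.1+ ships FP8 dtypes; whether matmuls in them are
  # hardware-accelerated depends on the GPU and the backend build.
  for dt in (torch.float8_e4m3fn, torch.float8_e5m2):
      t = torch.randn(4, 4).to(dt)
      print(dt, t.element_size(), "byte(s)/element")  # 1 byte each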

Also, many of these comparisons use vLLM for both setups, but for Nvidia you can and should use TensorRT-LLM, which tends to do quite a bit better than vLLM at high loads.
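
For reference, a minimal sketch of the vLLM side of such a benchmark (the model name is a placeholder, and quantization="fp8" requires a vLLM version and GPU that actually support it):

  from vllm import LLM, SamplingParams

  # Hypothetical 8-GPU serving setup; flags are illustrative only.
  llm = LLM(
      model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder model
      tensor_parallel_size=8,   # shard the model across 8 GPUs
      quantization="fp8",       # only if the build/hardware support FP8
  )
  outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
  print(outputs[0].outputs[0].text)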


Elio, the person who did the testing, confirmed to me that he has FP8 working.


I've never seen a server with 8x H100 NVL 188GB. The H100 NVL has 94GB of VRAM, but they sell them in pairs connected with NVLink, so I guess they sometimes market them as 188GB. In fact it's two cards, and a server usually has 4 pairs.


> MI300X is 7 months.

Less than that: we paid for ours in January and received them in March. The first batch had problems and we had to send them back, which took another 3 weeks. So let's consider the start date closer to April.

~3 months.


I thought the H100 NVL had 96GB.



