Getting ROCm working was... an adventure. We documented the entire (painful) journey in a detailed blog post because honestly, nothing went according to plan. If you've ever wrestled with ROCm setup for ML, you'll probably relate to our struggles.
The good news? Everything works smoothly now! We'd love for you to try it out and see what you think.
Reading your post now, half the article is just installing PyTorch. Next time, use the pre-built Docker containers; they're the recommended way and much easier.
Additionally, our MI300x VMs and machines already come with ROCm installed and configured, and we apply all the recommended default BIOS settings.
Hey,
I think the major issue was getting PyTorch 2.7 working on a bare-metal installation, since the app needed it and the pre-built Docker containers aren't out yet for PyTorch 2.7. Transformer Lab is also much more than just the PyTorch setup, as it provides several plugins that can be used for training, evaluation, dataset generation (and much more!).
Also, the pre-built Docker containers unfortunately do not work on WSL, which caused the majority of the issues.
I'd love to hear if you had a different experience or if I'm mistaken in any of this!
No, this is great feedback. I don't think you're mistaken. It's actually curious to me that PyTorch 2.7 isn't available yet; it should be! I'll pass that feedback up to AMD.
As for WSL, that kind of makes sense, since they just added Windows support to ROCm and that is probably a work in progress.
No need to build your own box, we've got 1xMI300x VMs, for FREE (thanks to AMD), for development exactly like this. Reach out and we can get you set up.
Someone left a comment accusing me of advertising my business, then deleted it. If that’s how it came across, I apologize, but my intention was to offer something genuinely useful, for free, and directly relevant to helping the OP. Those credits weren’t easy to get; it required going all the way up to Lisa. I’m committed to making supercompute accessible to developers. Yes, it’s free like GitHub is free. But this isn’t a sales pitch.
One of the maintainers here. Yes, this is a big part of our plans. In addition to our plugin system, which allows arbitrary Python scripts, we will soon publish how to add decorators to any existing script so it can run externally but log into Transformer Lab. So you could do training anywhere but trigger evals in the app, for example.
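As a rough sketch of the decorator pattern described above (the names `log_to_lab` and `report` are my own placeholders, not the real Transformer Lab API), the idea is to wrap an existing training function unchanged and forward its results to the app:

```python
import functools

captured = []  # stand-in for a running Transformer Lab instance

def report(payload):
    """Placeholder for an HTTP call that logs results into the app."""
    captured.append(payload)

def log_to_lab(experiment):
    """Hypothetical decorator: run the wrapped function as-is, then log its result."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            report({"experiment": experiment, "fn": fn.__name__, "result": result})
            return result
        return wrapper
    return decorator

@log_to_lab(experiment="demo")
def train(epochs):
    # the existing script body stays untouched
    return {"steps": epochs * 10}

print(train(3))  # prints {'steps': 30}; the same result also lands in `captured`
```

The point is that the original script keeps running wherever it already runs; only the one-line decorator is added.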
Apple MLX is a game changer for what is possible in Local LLM development for everyone. Getting this to work as a single application that "just works" across platforms has been one of the hardest engineering problems we've ever worked on, but we're determined to get it right.
The functionality in Transformer Lab comes from plugins. Plugins are just Python scripts behind the scenes. So anything that can be done in Python can be done as a plugin.
Right now we have export plugins for GGUF, MLX, and LlamaFile, but if you know a good library for exporting to TensorRT, let's make a plugin for it! (Feel free to join our Discord if you want help.)
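To illustrate the "plugins are just Python scripts" point (the flags and the `convert` step here are illustrative assumptions, not Transformer Lab's actual plugin interface), an export plugin is essentially a script that takes a model path and produces a converted artifact:

```python
# Hypothetical export-plugin sketch: a plain Python script that receives
# its configuration as CLI arguments and performs a conversion.
import argparse
import json

def convert(model_path: str, output_path: str) -> dict:
    """Placeholder for the real conversion step (e.g. a GGUF or TensorRT export)."""
    return {"source": model_path, "output": output_path, "format": "demo"}

def main(argv=None):
    parser = argparse.ArgumentParser(description="Example export plugin")
    parser.add_argument("--model", required=True)
    parser.add_argument("--out", required=True)
    args = parser.parse_args(argv)
    manifest = convert(args.model, args.out)
    print(json.dumps(manifest))  # report the result back to the caller
    return manifest

if __name__ == "__main__":
    main()
```

Swapping the body of `convert` for a call into an export library is all a new format would need.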