
As opposed to inference (like generating text and images), training requires more compute and higher-precision math (fp16 or bf16), and a single CPU generally won't cut it.

The prepare/train/generate instructions in the GitHub repo linked are pretty much it for the 'how' of training a model. You give it a task and it does it for a billion trillion epochs, saving the changes incrementally (or not).
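To make the prepare/train/generate shape concrete, here's a toy sketch (not the linked repo's actual code): fit y = w*x by gradient descent, saving a "checkpoint" after each epoch. Real training is the same loop, just with billions of parameters and GPU math.

```python
# "prepare": a toy dataset where the true weight is 3
data = [(x, 3.0 * x) for x in range(1, 6)]
w = 0.0            # single model parameter, zero-initialized
lr = 0.01
checkpoints = []

for epoch in range(50):              # many passes over the data
    for x, y in data:                # "train": one gradient step per example
        grad = 2 * (w * x - y) * x   # d/dw of squared error (w*x - y)^2
        w -= lr * grad
    checkpoints.append(w)            # save changes incrementally

print(round(w, 2))                   # "generate": w*x now approximates 3*x
```

The incremental checkpoints are what let you resume training or roll back to an earlier state.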

Training a LoRA for an image model may be more approachable, since there are more blog posts on it, and the process is largely similar, except you're training a small set of low-rank adapter weights instead of the whole network.
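A rough sketch of the LoRA idea (my understanding of the general technique, not any specific library's API): freeze the big pretrained weight matrix W and learn a low-rank update A @ B, so the effective weight is W + A @ B. Only A and B, a tiny fraction of the parameters, get trained.

```python
def matmul(A, B):
    # plain-Python matrix multiply, just for the toy example
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

d, r = 4, 1                        # full dim 4, adapter rank 1
W = [[0.0] * d for _ in range(d)]  # frozen pretrained weight (toy: zeros)
A = [[1.0] for _ in range(d)]      # d x r matrix, trainable
B = [[0.5] * d]                    # r x d matrix, trainable

delta = matmul(A, B)               # low-rank update, d x d
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

# trainable params: d*r + r*d = 8, vs d*d = 16 for full fine-tuning
print(len(A) * len(A[0]) + len(B) * len(B[0]), d * d)
```

At real model sizes (d in the thousands, r around 4-64) the savings are far more dramatic, which is why a LoRA fits on consumer hardware when full fine-tuning doesn't.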

[edit] I'm also learning so correct me if I'm off, hn!



> You give it a task and it does it for 1 billion trillion epochs and saves the changes incrementally (or not).

Somewhat confusingly, big LLMs are mostly just trained for 1 epoch, afaik.


I've seen 3 epochs in some of the R1 fine-tuning blog posts. It's not my field, so I'm not sure how valid that is.


Yeah, fine-tuning is different from pretraining



