I'm also curious about the potential speed gains in automatic differentiation, since there are far fewer branches to 'go up'. Or am I wrong here?
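
For intuition, here's a toy sketch (JAX; my own illustration, not from the paper) of why the reverse pass through a hard-routed tree is cheap: the gradient with respect to the leaf values is one-hot, so autodiff only has to walk back up the single active root-to-leaf path rather than all of the branches.

    import jax
    import jax.numpy as jnp

    def tree(leaves, x):
        # toy depth-2 tree with hard splits: pick one of four leaves
        i = jnp.where(x > 0.5, 2, 0) + jnp.where((x % 0.5) > 0.25, 1, 0)
        return leaves[i]

    leaves = jnp.array([1.0, 2.0, 3.0, 4.0])
    # gradient w.r.t. the leaves is one-hot: only the active path matters
    print(jax.grad(tree)(leaves, 0.6))  # [0. 0. 1. 0.]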


They actually use a ReLU to represent the model weights. I'm not convinced this is unavoidable, though; we do gradient boosted decision tree training without this trick.
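
I may be misreading what the ReLU is used for there, but for hard splits specifically, here's one ReLU construction (my own sketch, not necessarily the paper's): two ReLUs form a ramp of width eps that approaches the step function x > t as eps shrinks, which keeps the split differentiable during training.

    import jax.numpy as jnp

    def relu(z):
        return jnp.maximum(z, 0.0)

    # my own construction, not necessarily the paper's: equals 0 for
    # x < t and 1 for x > t + eps, approaching a hard split as eps -> 0
    def relu_split(x, t, eps=1e-2):
        u = (x - t) / eps
        return relu(u) - relu(u - 1.0)

    print(relu_split(0.4, 0.5))  # 0.0 -> left branch
    print(relu_split(0.7, 0.5))  # 1.0 -> right branch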



