I'm also curious about the potential speed gains in automatic differentiation, since there are far fewer branches to 'go up'. Or am I wrong here?
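
For intuition, here's a toy sketch (JAX; my own illustration, not from the paper) of why the reverse pass through a hard-routed tree is cheap: the gradient with respect to the leaf values is one-hot, so autodiff only has to walk back up the single active root-to-leaf path rather than all of the branches.

    import jax
    import jax.numpy as jnp

    def tree(leaves, x):
        # toy depth-2 tree with hard splits: pick one of four leaves
        i = jnp.where(x > 0.5, 2, 0) + jnp.where((x % 0.5) > 0.25, 1, 0)
        return leaves[i]

    leaves = jnp.array([1.0, 2.0, 3.0, 4.0])
    # gradient w.r.t. the leaves is one-hot: only the active path matters
    print(jax.grad(tree)(leaves, 0.6))  # [0. 0. 1. 0.]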


They actually use a ReLU to represent the model weights. I'm not convinced this is unavoidable, though; we do gradient boosted decision tree training without this trick.
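
I may be misreading what the ReLU is used for there, but for hard splits specifically, here's one ReLU construction (my own sketch, not necessarily the paper's): two ReLUs form a ramp of width eps that approaches the step function x > t as eps shrinks, which keeps the split differentiable during training.

    import jax.numpy as jnp

    def relu(z):
        return jnp.maximum(z, 0.0)

    # my own construction, not necessarily the paper's: equals 0 for
    # x < t and 1 for x > t + eps, approaching a hard split as eps -> 0
    def relu_split(x, t, eps=1e-2):
        u = (x - t) / eps
        return relu(u) - relu(u - 1.0)

    print(relu_split(0.4, 0.5))  # 0.0 -> left branch
    print(relu_split(0.7, 0.5))  # 1.0 -> right branch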



