Hacker News | neduma's comments

Awesome.


Interesting.


More details and implementation notes please?


It's on the page, if you click the little info icon in the upper-right. Here's the text, but there are some nice graphics there too:

  Snake Game, training entirely in the browser. Built on tinygrad: the rollout / targets / train graphs are TinyJits authored in Python, then compiled once to WGSL and replayed here under WebGPU.

  Observation: flat 10×10 board (100) + 4-dim prev-action one-hot = 104 dims. fc_pi.weight is zero-init so the opening policy is uniform over the legal actions; fc_v uses tinygrad's default Kaiming init.

  Per rollout: T=24 × N=384 parallel snakes (9,216 transitions), then K=3 epochs × 4 mini-batches of PPO updates. GAE γ=0.99, λ=0.95; AdamW wd=0.01; ratio clip ε=0.1; grad-norm 0.5; Huber value β=1, val_coef=1; entropy bonus 0.008333333333333333.

  Action mask + value clip + KL early stop. The 4-dim prev_a obs tail lets fc_pi zero the U-turn logit (the env silently overrides same-axis reversals anyway). Value loss is max(huber(v_new−td), huber(v_clip−td)) at ε=0.2. Approx-KL is sampled after each epoch and breaks the loop at 1.5·kl_target.
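For anyone who wants the update math spelled out, here's a rough pure-Python sketch of a few pieces from that description: the rollout arithmetic, masked action probabilities, GAE, the clipped Huber value loss, and the KL early-stop check. This is not the tinygrad code behind the demo. The function names are made up for illustration, building v_clip from the old value estimate is the usual PPO convention rather than something the page states, and kl_target is left as a parameter because its value isn't given.

```python
import math

# Hyperparameters copied from the quoted description; everything else is assumed.
T, N = 24, 384                   # rollout length x parallel snakes
EPOCHS, MINIBATCHES = 3, 4       # K=3 epochs x 4 mini-batches
GAMMA, LAM = 0.99, 0.95          # GAE discount and lambda
VALUE_CLIP_EPS = 0.2             # value-clip epsilon
HUBER_BETA = 1.0                 # Huber beta

transitions = T * N                          # transitions per rollout
minibatch_size = transitions // MINIBATCHES  # transitions per PPO mini-batch
max_updates = EPOCHS * MINIBATCHES           # optimizer steps if KL never trips


def masked_softmax(logits, legal):
    """Softmax over legal actions only; illegal actions get probability 0."""
    m = max(l for l, ok in zip(logits, legal) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


def gae(rewards, values, dones, last_value, gamma=GAMMA, lam=LAM):
    """Generalized Advantage Estimation for a single snake's trajectory."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def huber(x, beta=HUBER_BETA):
    """Smooth-L1 form; at beta=1 it matches the classic Huber loss."""
    ax = abs(x)
    return 0.5 * x * x / beta if ax < beta else ax - 0.5 * beta


def clipped_value_loss(v_new, v_old, target, eps=VALUE_CLIP_EPS):
    """max(huber(v_new - target), huber(v_clip - target)).

    Assumption: v_clip keeps v_new within +/- eps of the old estimate.
    """
    v_clip = v_old + max(-eps, min(eps, v_new - v_old))
    return max(huber(v_new - target), huber(v_clip - target))


def should_stop(approx_kl, kl_target):
    """Early-stop check run after each epoch: break at 1.5 * kl_target."""
    return approx_kl > 1.5 * kl_target
```

With all-zero policy logits (what a zero-init fc_pi.weight gives you, assuming the bias is zero too), masked_softmax spreads probability evenly over whichever actions are legal, which matches the "uniform opening policy" note.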


> got it printed onto a canvas to hang

How? Care to share the steps? I'm thinking of doing the same thing.



Add a lot of caching. It's kind of slow.


I was listening to the story of Qualcomm on the Acquired podcast. This is highly related.


The whole AI paradigm is shifting the tide from unknown unknowns towards known unknowns. At least, it feels like it.


The question is how many of those "knowns" are hallucinated.


Funny, it also keeps all compliance and standards folks in business.


Check out Kali Linux for more prepackaged IDS/IPS tools.

