> To learn, agents must experience high-value states, which are hard (or impossi...

algo_trader · 2025-12-31T17:56:56 1767203816

This is less about masked modelling and more about reverse-curriculum.

e.g. the DeepCubeA paper from 2019 (!), which solved the Rubik's Cube.

Start from the solved state and train the network on successively harder states. This is so "obvious" and "unhelpful in real domains" that perhaps they haven't heard of this paper.
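To make the idea concrete, here is a minimal Python sketch of reverse-curriculum start-state generation on a toy puzzle. It is not DeepCubeA's code. The adjacent-swap move and the helpers `apply_move`, `scramble`, and `reverse_curriculum` are hypothetical stand-ins for cube twists and a real training loop. Each stage walks k random moves away from the solved state, and k grows from stage to stage. As far as I know, DeepCubeA itself samples scramble depths at random up to a maximum and trains a cost-to-go network with approximate value iteration, so the staged loop here is a simplification.

```python
import random
from typing import Iterator, List, Tuple

State = Tuple[int, ...]


def apply_move(state: State, i: int) -> State:
    """Hypothetical toy move: swap positions i and i+1 (stand-in for a cube twist)."""
    s = list(state)
    s[i], s[i + 1] = s[i + 1], s[i]
    return tuple(s)


def scramble(goal: State, k: int, rng: random.Random) -> State:
    """Apply k random moves starting from the solved (goal) state."""
    state = goal
    for _ in range(k):
        state = apply_move(state, rng.randrange(len(goal) - 1))
    return state


def reverse_curriculum(
    goal: State, max_depth: int, per_stage: int, seed: int = 0
) -> Iterator[Tuple[int, List[State]]]:
    """Yield (depth, start states) stages, easiest (closest to the goal) first."""
    rng = random.Random(seed)
    for depth in range(1, max_depth + 1):
        yield depth, [scramble(goal, depth, rng) for _ in range(per_stage)]


if __name__ == "__main__":
    goal = (0, 1, 2, 3, 4)
    for depth, batch in reverse_curriculum(goal, max_depth=3, per_stage=2):
        print(depth, batch)
```

A learner would train on the depth-1 batch first, where the goal is a single move away, and would move on to deeper scrambles only once it can solve the shallow ones. Random moves can cancel each other out, so a depth-k scramble may land fewer than k moves from the goal.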

larrydag · 2025-12-31T17:02:42 1767200562

Perhaps I'm missing something. Why not start the learning at a later state?

LatencyKills · 2025-12-31T17:15:07 1767201307

If the goal is to achieve end-to-end learning, that would be cheating.

If you sat down to solve a problem you’ve never seen before, you wouldn’t even know what a valid “later state” looks like.

taeric · 2026-01-01T04:04:12 1767240252

Why is it cheating? We literally teach sports this way. Oftentimes you teach a sport by practicing scaled-down scenarios. I see no reason this should be different.

LatencyKills · 2026-01-03T09:52:44 1767433964

If the goal is to learn how to solve a Rubik's Cube when you've never seen a Rubik's Cube before, you have no idea what "halfway solved" even looks like.

This is precisely how RL worked for learning Atari games: you don't start with the game halfway solved and then claim the AI solved the end-to-end problem on its own.

The goal in these scenarios is for the machine to solve the problem with no prior information.

taeric · 2026-01-03T21:17:15 1767475035

This isn't accurate, though. In most teaching methods, "halfway solved" means having the first layer solved.

Indeed, this is key to teaching people how to make progress: don't focus on a single side; learn to complete a layer.

bob1029 · 2025-12-31T17:12:40 1767201160

That's effectively what you get in either case. With masked language modelling (MLM), on the first learning iteration you might mask exactly one token per sequence. This is equivalent to starting learning at a later state. The curriculum then progresses toward masking more and more tokens over time, which is equivalent to starting from earlier and earlier states. Eventually, you mask 100% of the sequence and you are starting from zero.
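Here is a minimal Python sketch of that masking curriculum, assuming a simple linear ramp. The helpers `num_masked` and `mask_sequence` are hypothetical, and `[MASK]` is a placeholder token. Standard BERT-style MLM masks a fixed fraction of tokens (about 15%). The growing schedule here is the curriculum variant described above, not what BERT does.

```python
import random
from typing import List, Sequence

MASK = "[MASK]"  # placeholder mask token (assumption)


def num_masked(step: int, total_steps: int, seq_len: int) -> int:
    """Linearly ramp the masked-token count from 1 (first step) to seq_len (last step)."""
    if total_steps <= 1:
        return seq_len
    return 1 + (step * (seq_len - 1)) // (total_steps - 1)


def mask_sequence(tokens: Sequence[str], k: int, rng: random.Random) -> List[str]:
    """Replace k distinct, randomly chosen positions with MASK."""
    positions = set(rng.sample(range(len(tokens)), k))
    return [MASK if i in positions else t for i, t in enumerate(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the cat sat on the mat".split()
    total = 4
    for step in range(total):
        k = num_masked(step, total, len(tokens))
        print(step, k, mask_sequence(tokens, k, rng))
```

At step 0 the model sees each sequence with a single token missing, which is the "later state". By the last step every token is masked, so the model has to reconstruct the sequence from nothing, which corresponds to starting from zero.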