Interesting.
Instead of running the model once (flash) or multiple times (thinking/pro) in its entirety, this approach seems to apply the same principle within one run, looping back internally.
Instead of big models that “brute force” the right answer by knowing a lot of possible outcomes, this model seems to come to results with less knowledge but more wisdom.
Kind of like having a database of most possible frames in a video game and blending between them instead of rendering the scene.
Isn’t this in a sense an RNN built out of a slice of an LLM? If so, it might have the same drawbacks, namely slowness to train, but also benefits such as an endless context window (in theory).
It's sort of an RNN, but it's also basically a transformer with shared layer weights. Each step is equivalent to one transformer layer, so the computation for n steps is the same as the computation for a transformer with n layers.
The notion of a context window applies to the sequence length, and looping doesn't change it: each iteration sees and attends over the whole sequence.
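To make the "shared layer weights" point concrete, here is a toy sketch in plain Python. It is not the model from the article: `attention_layer`, `looped_model`, `stacked_model`, and the single scalar weight `w` are all made up for illustration. It only shows that looping one layer n times does the same work as an n-layer stack whose layers happen to share weights, and that every step attends over the full sequence.

```python
import math

def attention_layer(seq, w):
    """One toy layer (hypothetical): each position attends over the
    whole sequence via a softmax, then a residual update scaled by w."""
    out = []
    for q in seq:
        scores = [q * k for k in seq]              # dot-product scores (scalars here)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
        total = sum(exps)
        mixed = sum(e / total * v for e, v in zip(exps, seq))
        out.append(math.tanh(w * mixed + q))       # residual connection + nonlinearity
    return out

def looped_model(seq, w, n_steps):
    """Apply the same layer (same weight) n_steps times."""
    for _ in range(n_steps):
        seq = attention_layer(seq, w)
    return seq

def stacked_model(seq, layer_weights):
    """Ordinary stack: one weight per layer."""
    for w in layer_weights:
        seq = attention_layer(seq, w)
    return seq

x = [0.1, -0.3, 0.7]
# Four loop steps match a four-layer stack with tied weights.
same = looped_model(x, 0.5, 4) == stacked_model(x, [0.5] * 4)
```

The looped version stores one set of weights instead of n, but the compute per forward pass is the same, and the sequence length never changes between steps.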
Thanks, this was helpful! Reading the seminal paper[0] on Universal Transformers also gave some insights:
> UTs combine the parallelizability and global receptive field of feed-forward sequence models like the Transformer with the recurrent inductive bias of RNNs.
Very interesting, it seems to be an “old” architecture that is only now being leveraged to a promising extent. Curious what made it an active area (with the works of Samsung and Sapient and now this one), perhaps diminishing returns on regular transformers?
I never really looked at it that way, but I think you're right.
Although, non-European-owned companies aren't necessarily incentivized to look towards European companies.
Looking towards your European neighbors mostly comes down to logistical situations. In those sectors, multilingual services are more common.
My experience with Gemini is the sole reason I am convinced that there's an AI hype going on. It consistently hallucinates key information which has led me to spend countless hours tracking down which information the output was based on, only to find that it dreamt up the facts that it gave to me.
The way I have come to perceive AI is that it's better at reassuring/reaffirming people's beliefs and ideas than at being an actual source of truth.
That would not be an issue if it were actually marketed as such, but seeing the "guided learning" function fail time and again makes me think we should be a lot more critical of what we're being told by tech enthusiasts/companies about AI.