Hacker News | Moosdijk's comments

Interesting. Instead of running the model once (flash) or multiple times (thinking/pro) in its entirety, this approach seems to apply the same principle within one run, looping back internally.

Instead of big models that “brute force” the right answer by knowing a lot of possible outcomes, this model seems to come to results with less knowledge but more wisdom.

Kind of like having a database of most possible frames in a video game and blending between them instead of rendering the scene.


Isn’t this in a sense an RNN built out of a slice of an LLM? If so, it might share the same drawbacks, namely slow training, but also the same benefits, such as an endless context window (in theory).


It's sort of an RNN, but it's also basically a transformer with shared layer weights. Each step is equivalent to one transformer layer, so the computation for n steps is the same as the computation for a transformer with n layers.

The notion of a context window applies to the sequence, and the looping doesn't really affect that: each iteration sees and attends over the whole sequence.
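
To make the "shared layer weights" point concrete, here is a minimal pure-Python sketch. It is a toy, not the architecture from the linked work: `layer`, `looped`, `stacked`, and the single weight matrix `W` are hypothetical names, and the layer is just unprojected self-attention followed by a residual linear map. It only illustrates that looping one layer n times does the same computation as an n-layer stack whose layers all hold the same weights, and that every step attends over the whole sequence.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layer(seq, W):
    # One toy transformer-style step (assumption: no Q/K/V projections,
    # no layer norm). Every position attends over the WHOLE sequence.
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)]
        hidden = [sum(w * x for w, x in zip(row, mixed)) for row in W]
        # Residual connection around a tanh-squashed linear map.
        out.append([qi + math.tanh(hi) for qi, hi in zip(q, hidden)])
    return out


def looped(seq, W, n_steps):
    # "Recurrent" view: apply the same layer n_steps times.
    for _ in range(n_steps):
        seq = layer(seq, W)
    return seq


def stacked(seq, weights_per_layer):
    # "Deep" view: an ordinary stack with one weight matrix per layer.
    for W in weights_per_layer:
        seq = layer(seq, W)
    return seq


rng = random.Random(0)
d, length, n = 4, 5, 3
seq = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(length)]
W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]

looped_out = looped(seq, W, n)
stacked_out = stacked(seq, [W] * n)  # n layers that all share W

shared_params = d * d        # weights stored by the looped model
untied_params = n * d * d    # weights an untied n-layer stack would store
```

In this sketch `looped_out` and `stacked_out` are identical, while the looped model stores n times fewer weights than an untied stack of the same depth. The sequence length stays fixed throughout, which is why looping by itself does not change the context window.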


Thanks, this was helpful! Reading the seminal paper[0] on Universal Transformers also gave some insights:

> UTs combine the parallelizability and global receptive field of feed-forward sequence models like the Transformer with the recurrent inductive bias of RNNs.

Very interesting, it seems to be an “old” architecture that is only now being leveraged to a promising extent. Curious what made it an active area (with the works of Samsung and Sapient and now this one), perhaps diminishing returns on regular transformers?

0: https://arxiv.org/abs/1807.03819


> Instead of running the model once (flash) or multiple times (thinking/pro) in its entirety

I'm not sure what you mean here, but there isn't a difference in the number of times a model runs during inference.


I meant going to the likeliest output (flash) or (iteratively) generating multiple outputs and (iteratively) choosing the best one (thinking/pro)


That's not how these models work.

Thinking models produce thinking tokens to reason out the answer.


RMS = Richard Stallman, responsible for the GNU project and the free software foundation.

He had a page dedicated to his housing situation:

https://stallman.org/seeking-housing.html



Do you have a log available somewhere?


I keep everything in my self hosted gitea. Just made it public.

https://gitter.swolereport.com/robviren/cspace


Thanks, I’ll check it out

Edit: timed out


Reminds me of https://github.com/RobViren/kvoicewalk where people take voice clips and train a text to speech using random walks.

Not related, misguided methods :D


Well, it’s the same author so it is kind of related.


I’m in this one because it was at the top of the front page.



I never really looked at it that way, but I think you're right. Although, non-European-owned companies aren't necessarily incentivized to look towards European companies. Looking towards your European neighbors mostly comes down to logistical situations. In those sectors, multilingual services are more common.


I'm hoping you'll open the API some time in the future. This would be great for DIY installations with an ESP32 hub.


No issues here on iPhone 12 running iOS 18.6.2 and Firefox 143.2 (62218)


The orbiting sensitivity is a bit high when zoomed in a lot, which can lead to the model spinning out of control, as the other user mentioned.

Still manageable though, just very sensitive.


>Here's a hot take: Name and Shame.

That's easier said than done, hence why Stefano probably didn't.


My experience with Gemini is the sole reason I am convinced that there's an AI hype going on. It consistently hallucinates key information which has led me to spend countless hours tracking down which information the output was based on, only to find that it dreamt up the facts that it gave to me.

The way I have come to perceive AI is that it's better at reassuring/reaffirming people's beliefs and ideas than at being an actual source of truth.

That would not be an issue if it was actually marketed as such, but seeing the "guided learning" function fail time and again makes me think we should be a lot more critical of what we're being told by tech enthusiasts/companies about AI.

