Hacker News | alyxya's comments

This looks extremely impressive, really deserves more attention here.

Are the inverse dynamics and forward dynamics models trained separately? If the inverse dynamics model is meant to extrapolate more training data, then perhaps that just means it takes very little data for the forward dynamics model to generalize directly, assuming the right architecture.


thanks! the inverse dynamics model is trained first on 40k hours of data and then frozen to label all 11 million hours. yup! the idea is that it should take a small amount of data to generalize environment dynamics, then you can use a lot of data to understand actions.

The interesting thing to me about their world models is that it's like a static point-cloud model, as opposed to DeepMind's Genie 3, which feels more like a video generation model at its base. I see the video-generation base as generally superior but far more expensive. World Labs' current approach is likely based on the founders' expertise, but I don't see how it can scale and match what Genie 3 does, even if it does offer state persistence as an advantage.

Yes, Marble (from World Labs) feels like it's generating Gaussian splats or something similar. I guess that's more compatible and easier to use for 3D asset generation and reuse in other software. Very exciting times ahead!

I read that they only have 30 people maintaining the whole app. It must be difficult holding everything together with so few people, and probably insufficient redundancy with oncall.

Yes, the head of product dev recently bragged about only having 30 engineers.

Sometimes more people in the kitchen is worse.

From what I understand, they effectively rewrote all Scala/JVM/Python services to go through the xAI engine for their feed:

https://www.youtube.com/watch?v=-8JOlCvA4Qs

I am not sure how that works at scale or how much money it burns, or if this is even true, but a Rust monolith of this kind, where everything is a black box, could very well be maintained by 30 people.

That said, X is a complete dumpster fire where I want to pour Clorox into my eyes after 30 seconds.

I miss being plugged into a 24/7 feed of professional reporters, and at the same time - thank you, Elon, for fixing my addiction.


How many engineers does GitHub have? It's down far more than x.com.

The best and proven linear attention is Gated DeltaNet or a variation of it, used by Kimi and Qwen. Anyone who thinks linear attention can't work is forgetting that models are a fixed size, so attention should always be compressible to linear. Another way to see the feasibility of linear attention: the standard attention mechanism can be made linear simply by removing the softmax, so that the KV cache stores the KV product as a constant-size matrix instead. Softmax just normalizes attention; it's not theoretically required.
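A toy NumPy sketch of that last point (illustrative only, not Gated DeltaNet itself): once the softmax is gone, matrix associativity lets you regroup `(Q Kᵀ) V` as `Q (Kᵀ V)`, and the d×d matrix `Kᵀ V` plays the role of a constant-size KV cache.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: O(n^2) in sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    """Attention with the softmax removed.

    By associativity, (Q @ K.T) @ V == Q @ (K.T @ V). The d x d
    matrix K.T @ V acts as the KV cache and stays constant size
    regardless of sequence length.
    """
    state = K.T @ V      # (d, d) "cache", constant size
    return Q @ state     # O(n * d^2) instead of O(n^2 * d)

rng = np.random.default_rng(0)
n, d = 128, 16
Q, K, V = rng.standard_normal((3, n, d))

# Both groupings of the un-normalized attention agree.
assert np.allclose((Q @ K.T) @ V, linear_attention(Q, K, V))
```

In a causal/streaming setting, the same `state` can be accumulated one token at a time, which is where the linear-time recurrence comes from.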


I think the standard way to convert repeating decimals (or decimals that appear to have a repeating pattern) to fractions is to take the first repeating period and divide it by 0.999..., with the number of 9s matching the period length. For example: 0.163272727... = 0.163 + 0.00027/0.99 = 163/1000 + 27/99000 = 449/2750.
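The arithmetic above can be checked exactly with Python's `fractions` module:

```python
from fractions import Fraction

# 0.163272727... splits into a non-repeating head and a repeating tail:
# 0.163 + 0.000272727..., where the tail is (27/99) shifted by 10^-3.
x = Fraction(163, 1000) + Fraction(27, 99) / 1000
assert x == Fraction(449, 2750)

# The fraction reproduces the decimal expansion.
assert abs(float(x) - 0.163272727272727) < 1e-12
```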


(This works because x/9 = 0.xxx..., xy/99 = 0.xyxyxy..., and so on.) Intuitively, that's true because when you long divide, getting a repeating pattern requires the remainder to return to what you started with. I.e., if you long divide

       0.n
      -----
    a| b.0
you need 10b - an = b, which implies 9b = an. If a = 9 (i.e. your divisor is of the form 10^n - 1), then b = n, and you not only get a repeating pattern but the quotient repeats the digits you started with.

Or, going the other way, if d = 10^n - 1 then 10^n a = a (mod d), so your remainders repeat every n steps. And then note that

  a * 10^n = a * (10^n - 1) + a
so your quotient is just `a` as well.
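That long-division argument can be sketched directly (the helper name is mine, and it assumes a < d so every quotient digit is a single digit):

```python
def repeating_block(a, d, length):
    """Long-divide a/d and return the first `length` quotient digits
    after the decimal point. Assumes a < d, so each digit is 0-9."""
    digits = []
    r = a
    for _ in range(length):
        r *= 10
        digits.append(r // d)
        r %= d
    return "".join(map(str, digits))

# With d = 10**n - 1, a * 10**n = a * (10**n - 1) + a, so the remainder
# returns to a after n steps and the quotient repeats the digits of a.
assert repeating_block(1, 9, 4) == "1111"
assert repeating_block(27, 99, 6) == "272727"
assert repeating_block(123, 999, 6) == "123123"
```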


Unlike most improvements to LLMs, which modify the architecture, the optimizer, or something else about the model, this paper describes a novel technique that relies on an external lookup table in the forward pass, with the lookup happening in parallel with some of the compute. It's a really interesting idea with a lot of cool engineering behind it, but it looks too convoluted, without gains that would justify the complexity.


Given the decrease in the benchmark score after the correction, I don't think you can assume they didn't check a single output. Clearly the model is still very capable, and its cheating didn't affect most of the benchmark.


I don't get the anti-LLM sentiment, because plenty of trends continue to show steady progress in LLMs over time. Sure, you can point at some dumb things LLMs do as evidence of a fundamental issue, but the frontier capabilities continue to amaze people. I suspect the anti-LLM sentiment comes from people who haven't seriously tried to see for themselves what LLMs are capable of. I used to be skeptical, but I've changed my mind quite a bit over the past year, and many others have changed their stance toward LLMs as well.


Or, people who've actually trained and used models in domains where "stuff on the internet" is of no relevance to what you are actually doing realize the profound limitations to what these LLMs actually do. They are amazing, don't get me wrong, but not so amazing in many specific contexts.


People who think that "steady progress" will continue forever have no basis for their assumption.

You have an ad-hominem attack and your own personal anecdote, which are not an argument for LLMs.


It'll steadily continue the same way Moore's law continued for a long while. I don't think people question the general trend of Moore's law, aside from the point where it nears the limits of physics. A universal claim that LLMs don't work is much harder to defend, whereas claiming something is possible for LLMs only needs some evidence.


Yes, LLMs will continue to progress until they hit the limits of LLMs.

The idea that LLMs will reach AGI is entirely speculative, not least because AGI is undefined and speculative.


LeCun has already been proven wrong countless times over the years in his predictions of what LLMs can or cannot do. While LLMs continue to improve, he has yet to produce anything of practical value from his research. The salt is palpable, and he's memed for a reason.


I think the most neutral solution right now is having multiple competing models as different perspectives. We already see this effect in social media algorithms amplifying certain biases and perspectives depending on the platform.


I don’t think the two kinds of vibe coding are entirely separate. There’s a spectrum of how much context you care to understand yourself, and it’s feasible to ask a lot of questions to gain more understanding or let loose and give more discretion to the LLM.

