Hacker News | nothrowaways's comments

> Principal component analysis of 200 GPT2, 500 Vision Transformers, 50 LLaMA-8B, and 8 Flan-T5 models reveals consistent sharp spectral decay - strong evidence that a small number of weight directions capture dominant variance despite vast differences in training data, objectives, and initialization.

Isn't it obvious?


Well intuitively it makes sense that within each independent model, a small number of weights / parameters are very dominant, but it’s still super interesting that these can be swapped between all the models without loss of performance.

It isn’t obvious that these parameters are universal across all models.


This general idea shows up all over the place though. If you do 3D scans on thousands of mammal skulls, you'll find that a few PCs account for the vast majority of the variance. If you do frequency domain analysis of various physiological signals...same thing. Ditto for many, many other natural phenomena in the world. Interesting (maybe not surprising?) to see it in artificial phenomena as well

It's almost an artifact of PCA. You'll find "important" principal components everywhere you look. It takes real effort to construct a dataset where you don't. That doesn't mean though, for instance, that throwing away the less important principal components of an image is the best way to compress an image.

Not really. If the models are trained on different datasets - like one ViT trained on satellite images and another on medical X-rays - one would expect their parameters, which were randomly initialized, to be completely different or even orthogonal.
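
A minimal numpy sketch, purely illustrative and not from the paper, of the contrast being discussed here: i.i.d. random data gives a flat PCA spectrum, while data sharing low-rank structure shows the sharp spectral decay the abstract describes.

  import numpy as np

  rng = np.random.default_rng(0)

  def explained_variance_ratio(X):
      # PCA spectrum via singular values of the centered data matrix.
      X = X - X.mean(axis=0)
      s = np.linalg.svd(X, compute_uv=False)
      var = s ** 2
      return var / var.sum()

  n, d = 1000, 100
  noise = rng.normal(size=(n, d))                      # i.i.d., "orthogonal" case
  latent = rng.normal(size=(n, 3)) @ rng.normal(size=(3, d))
  structured = latent + 0.1 * rng.normal(size=(n, d))  # shared low-rank structure

  print("i.i.d. noise, top 5 PCs:", explained_variance_ratio(noise)[:5].round(3))
  print("structured,  top 5 PCs:", explained_variance_ratio(structured)[:5].round(3))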

Now I wonder how much this "Universal Subspace" corresponds to the same set of scraped Reddit posts and pirated books that apparently all the bigcorps used for model training. Is it 'universal' because it's universal, or because the same book-pirating torrents got reused all over?

Every vision task needs edge/contrast/color detectors and these should be mostly the same across ViTs, needing only a rotation and scaling in the subspace. Likewise with language tasks and encoding the basic rules of language which are the same regardless of application. So it is no surprise to see intra-modality shared variation.

The surprising thing is inter-modality shared variation. I wouldn't have bet against it but I also wouldn't have guessed it.

I would like to see model interpretability work into whether these subspace vectors can be interpreted as low level or high level abstractions. Are they picking up low level "edge detectors" that are somehow invariant to modality (if so, why?) or are they picking up higher level concepts like distance vs. closeness?


It hints there may be common higher-level abstraction and compression processes in human consciousness.

The "human" part of that matters. This is all human-made data, collected from human technology, which was created to assist human thinking and experience.

So I wonder if this isn't so much about universals or Platonic ideals. More that we're starting to see the outlines of the shapes that define - perhaps constrict - our own minds.


What if all models are secretly just fine tunes of llama?

Where do they get the video training data?

From the paper:

> Datasets. We construct a diverse and high-quality collection of video datasets to train STARFlow-V. Specifically, we leverage the high-quality subset of Panda (Chen et al., 2024b) mixed with an in-house stock video dataset, with a total number of 70M text-video pairs.


> in-house stock video dataset

Wonder if "iCloud backups" would be counted as "stock video" there? ;)


I have to delete as many videos as humanly possible before backing up to avoid blowing through my iCloud storage quota so I guess I’m safe

More likely AppleTV shows


Turn on advanced data protection so they don't train on yours.

That has nothing to do with it, and Apple wouldn't train on user content; they're not Google. If they ever did, there would be an opt-in at best. There's a reason they're walking and observing, not running and trying to be the forefront cloud AI leader, like some others.

Why should I buy this "ethical Apple" argument?

They shared audio Siri recordings with contractors in 2019. It became opt-in only after backlash, similar to other privacy controversies.

This shows that they clearly prioritize not being sued or caught, which is slightly different from prioritizing user choices.


It is interesting to see the consensus that nobody is enthusiastic about the Meta Ray-Bans except Zuckerberg.

It's creepy.


The only real usage I've seen is on Instagram reels etc. where people are using them in red light districts like in Amsterdam to film the women.


I've seen a plumber use it to document a repair that he was doing. Being able to record in tight spaces seems to be a good use case for this tech

I've also seen a home inspector use them to document issues with a new construction

There's also a ton of people using it for cooking videos


I have them and like them. I don't wear them constantly, but on days when I'm doing something interesting, they help me document much more than I otherwise would.


3 similar apps already! Apple and big tech UI designers should read this thread.


"Invest in my startup"


Before the music stops


Python is quickly turning into a crowded keyword junkyard


Python has about 40 keywords. I'd say I regularly use about 30, and irregularly use maybe another 5. Hardly seems like a "junkyard".

Further, this lack of first class support for lazy importing has spawned multiple CPython forks that implement their own lazy importing or a modified version of the prior rejected PEP 690. Reducing the real world need for forks seems worth the price of one keyword.


For those curious here are the actual keywords (from https://docs.python.org/3/reference/lexical_analysis.html?ut... )

Hard Keywords:

  False      await      else       import     pass
  None       break      except     in         raise
  True       class      finally    is         return
  and        continue   for        lambda     try
  as         def        from       nonlocal   while
  assert     del        global     not        with
  async      elif       if         or         yield

Soft Keywords:

  match      case       _          type

I think nonlocal/global are the only hard keywords I now barely use; for the soft ones, I rarely use pattern matching, so 5 seems like a good estimate.
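
For what it's worth, the same lists are available programmatically from the stdlib keyword module (softkwlist was added in 3.9), which is an easy way to check the counts yourself:

  import keyword

  print(len(keyword.kwlist), keyword.kwlist)          # hard keywords
  print(len(keyword.softkwlist), keyword.softkwlist)  # soft keywords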


I recall when they added "async" and it broke a whole lot of libraries. I hope they never again introduce new "hard" keywords.


Removing "print" in 3.0 helped their case significantly, as well.


From the PEP (https://peps.python.org/pep-0810/):

> The choice to introduce a new `lazy` keyword reflects the need for explicit syntax. Lazy imports have different semantics from normal imports: errors and side effects occur at first use rather than at the import statement. This semantic difference makes it critical that laziness is visible at the import site itself, not hidden in global configuration or distant module-level declarations. The lazy keyword provides local reasoning about import behavior, avoiding the need to search elsewhere in the code to understand whether an import is deferred. The rest of the import semantics remain unchanged: the same import machinery, module finding, and loading mechanisms are used.

This functionality is highly desired, and it does appear to actually need a new (soft) keyword. Sorry you don't like it.
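
For anyone who wants the behavior today, a rough approximation of the deferred semantics is already possible with the importlib LazyLoader recipe; the helper name below is just illustrative, and PEP 810's lazy keyword would replace this boilerplate with a one-liner at the import site.

  import importlib.util
  import sys

  def lazy_import(name):
      # Defer executing a module's body until an attribute is first accessed.
      spec = importlib.util.find_spec(name)
      loader = importlib.util.LazyLoader(spec.loader)
      spec.loader = loader
      module = importlib.util.module_from_spec(spec)
      sys.modules[name] = module
      loader.exec_module(module)    # sets up deferral; real execution happens later
      return module

  json = lazy_import("json")        # cheap: module body has not run yet
  print(json.dumps({"ok": True}))   # first attribute access triggers the real import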


The PEP didn’t mention considering reusing `async` instead of `lazy`. That would’ve conveyed the same thing to me without a new keyword, and would have been similar to HTML’s usage of `async`.


I personally would have preferred "defer import os" instead of "lazy import os". It might be the non-native speaker in me showing, but lazy import feels unserious.



"Lazy" is standard language for this kind of behavior.


It is a 'soft keyword' as the PEP explains. I would not think that this has any major impact on anyone who just chooses to ignore this feature. Assuming that you want this behavior, I wonder how this could have been done in a better fashion without now having 'lazy' in the specific context of an import statement.


soft keyword for anyone not familiar like I was ...

"A new soft keyword lazy is added. A soft keyword is a context-sensitive keyword that only has special meaning in specific grammatical contexts; elsewhere it can be used as a regular identifier (e.g., as a variable name). The lazy keyword only has special meaning when it appears before import statements..."


> Python is quickly turning into a crowded keyword junkyard

* Javascript (ECMAScript) has 63 keywords.
* Rust has 50 keywords.
* Java has 51 keywords + 17 contextually reserved words, for a total of 68.
* Python now has 36 keywords + 4 'soft' keywords, for a total of 40.
* Go has 25 keywords.


They left out X because he will bitch about it to his 600 million followers lol.


Interesting times


Did you read the article? X was first to be accused by the EU Commission.


Hi X


Does speed really matter during python installation?


Speed matters everywhere. How much compute is spent on things that could easily be 100x faster than they are? Compare running a battery of unit tests using VMware with pip versus Firecracker plus uv: it's orders of magnitude quicker, and avoids a whole suite of issues related to persistent state on the machine.


Possibly for some workflows, though personally I find the emphasis on speed baffling, and it's a big part of the reason I don't find most of these uv testimonials credible. I'm a regular Python user across multiple environments, and I've never considered waiting for pip to be a material part of my time; it's trivial to the point of being irrelevant. The fact that so many people come out of the woodwork to talk about how fast it is means either there's some big group somewhere with a niche use case that gets them bogged down in pip dependency resolution (or whatever it is that gets sped up; obviously the actual downloading can't be faster), or it's just a talking point that (presumably) Rust zealots who don't actually use Python arrive with en masse. Either way, it's an extremely ineffective way of promoting the product to most Python users, who don't have speed of package installation as anything close to a pain point.


Yes. Technical excellence is a virtue in and of itself.


This! I'm tired of the constant calls to be as mediocre as we can get away with, in the name of getting things done faster and cheaper.


It's fast enough that sometimes dependencies can be checked and resolved and installed at program runtime rather than it needing to be a separate step.

You can go from no virtual environment, and just "uv run myfile.py" and it does everything that's needed, nearly instantly.
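
For example, with PEP 723 inline script metadata (which uv understands), a single file like this can be launched with "uv run myfile.py" and the declared dependency gets resolved and installed on the fly; the requests dependency here is just for illustration:

  # /// script
  # requires-python = ">=3.9"
  # dependencies = ["requests"]
  # ///
  import requests

  print(requests.get("https://example.com").status_code)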


On my system, Pip takes noticeable time just to start up without ultimately doing anything of importance:

  $ time pip install
  ERROR: You must give at least one requirement to install (see "pip help install")

  real 0m0.356s
  user 0m0.322s
  sys 0m0.036s
(Huh, that's a slight improvement from before; I guess pip 25.3 is a bit better streamlined.)


lol who is using pip so much that 0.36s of startup time matters to them? This, assuming uv can "do nothing" slightly faster, is an absolutely meaningless benefit.


>who is using pip so much that .36s of startup time matters to them?

https://danluu.com/productivity-velocity

https://danluu.com/input-lag/


In general, whenever you introduce a cache to make software faster (along any dimension), you have to think about cache invalidation and eviction. If your software is fast enough to not need caching, this problem goes away.


It's funny because superior caching is also highly relevant to uv's outperformance. (But invalidation/eviction isn't generally a real problem for a cache of installed packages; the cache can be cleaned up whenever and just rebuilt, and the cache has a separate entry per version of a library, where each version is immutable.)
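
A toy sketch of why immutability makes the invalidation question mostly disappear: if entries are keyed by (name, version) and a version's contents never change, a hit is always safe to reuse and the only remaining policy is eviction for disk space. The layout below is illustrative, not uv's actual cache format.

  from pathlib import Path

  CACHE = Path("/tmp/toy-package-cache")

  def get_or_build(name: str, version: str, build) -> Path:
      # Immutable entry per (name, version): build once on a miss, reuse forever.
      entry = CACHE / f"{name}-{version}"
      if not entry.exists():
          entry.mkdir(parents=True)
          build(entry)              # e.g. download and unpack the wheel here
      return entry                  # a hit never needs invalidation, only eviction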


Agreed -- the dream of caching is immutability.

