It has more to do with Microsoft deciding to emulate Google and Facebook's surveillance capitalism business model.
If you combine mandatory online user accounts with telemetry and Windows Recall, you have a system for building out advertising profiles linked to known individuals.
Wasn't streaming models from storage into limited memory a case where it was impressive that you could make the elephant dance at all?
If you want to get usable speeds from very large models that haven't been quantized to death on local machines, RDMA over Thunderbolt enables that use case.
Consumer PC GPUs don't have enough VRAM; enterprise GPUs that can handle the load well are obscenely expensive; and Strix Halo tops out at 128 GB of RAM and is limited in Thunderbolt ports.
The bad performance you saw was with very limited memory and very large models, so streaming weights from storage was a huge bottleneck. If you gradually increase RAM, more and more of the weights are cached and the speed improves quite a bit, at least until you're running huge contexts and most of the RAM ends up being devoted to that. Is the overall speed "usable"? That's highly subjective, but with local inference it's convenient to run 24x7 and rely on non-interactive use. Of course scaling out via RDMA on Thunderbolt is still there as an option, it's just not the first approach you'd try.
> If you gradually increase RAM, more and more of the weights are cached and the speed improves quite a bit
It'll increase a lot relative to the zero-RAM baseline. But it's still complete garbage compared to fitting the model entirely in RAM. Even if you fit most of it in RAM, you're still probably an order of magnitude slower than fitting all of it, with most of your time spent waiting on the SSD.
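The order-of-magnitude claim is easy to sanity-check with a back-of-envelope calculation: for a dense model, every generated token touches every weight, so time per token is roughly bytes-of-weights divided by the bandwidth of wherever those weights live. The model size and bandwidth figures below are illustrative assumptions, not measurements:

```python
# Rough time-per-token estimate for a dense model: each token reads
# all the weights, so throughput is bandwidth-bound.
# All numbers here are illustrative assumptions, not benchmarks.

def seconds_per_token(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on decode latency when weights stream at this bandwidth."""
    return weight_bytes / bandwidth_bytes_per_s

GB = 1e9
weights = 35 * GB  # e.g. a ~70B-parameter model quantized to ~4 bits/weight

ssd = seconds_per_token(weights, 7 * GB)    # fast NVMe SSD, ~7 GB/s
ram = seconds_per_token(weights, 100 * GB)  # unified memory, ~100 GB/s

print(f"SSD streaming: {ssd:.2f} s/token")
print(f"RAM resident:  {ram:.2f} s/token")
print(f"speedup:       {ssd / ram:.1f}x")
```

Under these assumptions the SSD path lands around 5 s/token versus ~0.35 s/token from RAM, roughly a 14x gap, which is the "order of magnitude" in question. Fitting only most of the weights helps less than you'd hope, because the residual SSD reads still dominate the per-token time.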
The Ultra variants of the M series chips had previously consisted of two of the Max chips bonded together.
The M5 generation Pro and Max chips have moved to a chiplet-based architecture, with all the CPU cores on one chiplet and all the GPU cores on another.
> Apple been tightening that control over time. For a long time on MacOS X you could simply run apps. Then came notarisation, but you could still disable it. Now, even with a certificate, it still shows a dialog.
Notarisation is just proof that the app went through an automated malware scan.
Windows, Mac, and Android have all adopted measures intended to warn users about malware and protect them from it.
As far as age verification goes, this is a restriction being forced on companies by governments.
Apple previously allowed parents to set age restrictions on their children, or not, as they saw fit.
You have to pay Apple $150 annually for the pleasure of notarisation, even if you make open-source apps. Yet you cannot distribute apps outside the store on mobile (except in the EU, but not really; that's a topic of its own…).
yeah, they're moving in the wrong direction as well. not to mention that notarisation only works after the fact anyway: malware still slips through (historically true!). it's just supposed to shrink the blast radius AFTER apple knows a binary is malware.
what does the scare modal of "are you really sure you wanna run this? could be bad dude..." do?
the only purpose i can see it serving is to push devs toward the App Store on mac, which is highly restricted in what you can do and, of course, takes 30% of your revenue
People already have the choice between an ecosystem that offers the safety of a walled garden and one that allows the freedom to do anything you like, including shooting yourself in the foot.
Google's decision to walk back the supposed freedom to run anything you like removes user choice from the marketplace and harms consumers.
I'm not sure how future advancements can overcome that issue.