I noted how, when they first started, they talked mostly about developing 'AI accelerators', and it felt like they were describing big, GPGPU-style chips to go head to head with Nvidia: thousands of small SIMD cores doing matrix multiplies, with fast memory and PCIe. Maybe something halfway between Cerebras size and Nvidia Hopper. A tall order, but something really needed.
Then at some point it feels like Jim got hooked on the idea of RISC-V everything, and they pivoted their messaging toward these more CPU-like chips with a main RV64 core, 8-wide decode, state-of-the-art OoO execution, etc. That sounds more like a RISC-V competitor to AMD Zen than a competitor to an Nvidia GPU.
And then they talk about that just being the interface to the AI chip later, but... it really feels like they saw 'hey, we can get all this RISC-V stuff essentially for free, and really take over the development of the spec, and that is easier than figuring out how to develop a general-purpose AI chip and a stack to go with it that competes with CUDA, so that is the easier place to start...'
I'm totally a non-expert though and the preceding is just what I've picked up from watching interviews with Jim (who I just find awesome to listen to).
In the interview he runs down the issues they encountered going down the pure AI accelerator path. It sounds like they've decided the opportunity wasn't there (i.e. too hard) so they've pivoted.
It makes more business sense to have more general purpose hardware that can be pivoted to other applications. Lots of AI ASIC vendors are going to go belly up in the coming years as their platforms fail to attract customers. Carving out a tiny niche with limited demand and no IP moat is very risky in the IC world.
Fast CPU performance is necessary for AI workloads too: you need a fast CPU combined with lots of vector or tensor processing. Lots of applications need both, and they have done both for a while.
The logic stays simple, in line with reducing power draw through a simpler instruction set.
Move into a space where we have rapid manufacturing of specialized chips, combine that with the concept inherent in Nvidia's DPU, and you have something very interesting.
I think you are oversimplifying; you've chosen an arbitrary abstraction layer. It's like saying it's just transistors again, or it's just machine learning. What matters is what they can deliver, and Jim's track record is the best in the whole industry. I think they have a pretty good understanding of the best approach to bring outstanding results with the technology available, and I think right now they want to deliver something that gets them connected with decent clients, and then optimize for actual real-world use cases.
I'd love to see chips made out of millions of small, neuron-like computing blocks, without a common clock, with local memory, maybe even with some analog electronics involved. But I'm pretty sure the people actually working on this kind of stuff could give me a list of reasons why it's silly (at least given current technology limitations).
Sounds like a variation of that old MIT startup called Tilera: very lightweight CPUs with a high-performance interconnect fabric. At the time I remember thinking it was a solution looking for a problem.
I don't know what's wrong with that architecture for AI/ML, but I feel there's too much overhead in full-on CPUs; I guess that's where lightweight RISC cores come in. Personally, what I'd like to see is a clever grid architecture with a stack-based process for communicating between local nodes, using a bare-minimal language like Forth, so the nodes are extremely light and do matrix math and nothing else.
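Roughly what I have in mind, sketched as a toy in Go (the opcodes, the node struct, and the 'send to a neighbour' channel are all invented for illustration, not any real ISA):

    package main

    import "fmt"

    // Hypothetical opcodes for a Forth-like node language (names made up here).
    const (
        opPush = iota // push the next word of the program onto the stack
        opMul         // pop two values, push their product
        opAdd         // pop two values, push their sum
        opSend        // pop a value and send it to a neighbouring node
    )

    // node is a minimal stack machine with a link to one neighbour in the grid.
    type node struct {
        stack []float64
        out   chan float64
    }

    func (n *node) push(v float64) { n.stack = append(n.stack, v) }

    func (n *node) pop() float64 {
        v := n.stack[len(n.stack)-1]
        n.stack = n.stack[:len(n.stack)-1]
        return v
    }

    // run interprets a tiny program encoded as a flat list of words.
    func (n *node) run(prog []float64) {
        for pc := 0; pc < len(prog); pc++ {
            switch int(prog[pc]) {
            case opPush:
                pc++
                n.push(prog[pc])
            case opMul:
                n.push(n.pop() * n.pop())
            case opAdd:
                n.push(n.pop() + n.pop())
            case opSend:
                n.out <- n.pop()
            }
        }
    }

    func main() {
        out := make(chan float64, 1)
        nd := &node{out: out}
        // One multiply-accumulate step: compute 2*3 + 4 and send it onward.
        nd.run([]float64{opPush, 2, opPush, 3, opMul, opPush, 4, opAdd, opSend})
        fmt.Println(<-out) // 10
    }

A real design would need routing, local memory, and wider words, but the point is how little machinery each node needs.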
I think boring can be good. I'm not an expert, but 'extra units copy-and-pasted in a grid' is exactly how I imagine a hardware AI accelerator to be.
I agree, and I've wanted a grid of 1000+ cores for 25 years now, ever since I realized back in the late 90s that the real bottleneck in computing is the bus between the CPU and memory. The only chip that comes even close to what I want for a reasonable price is Apple's M1 line, but they added GPU and AI cores, which defeats the purpose.
The churn with GPUs and now AI cores is too much for me. I just can't waste time manually annotating the code I want to run concurrently, on the GPU, on an AI core, whatever. To me, it looks like everyone is putting 10-100 times more work into their code than should be needed. I see the same pattern repeated in web development and the explosion of walled-garden platforms for mobile and smart devices. So much work, for so long, for so few results.
Just give me a big dumb boring grid of cores and a self-parallelizing language to program them. Stuff like Julia/Clojure/MATLAB/Erlang/Go comes close, but each has poison pills that make reaching the mainstream untenable. Someday I want to write a language that does what I need, but that day will likely never come, because every day is Quantum Leap for me: I just go to work to make rent, pushing the rock up the hill like Sisyphus, only to watch it roll back down and have to start all over again. Inadequate tooling has come to dominate every aspect of my work, but there may never be time to write better stuff.
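To make the annotation complaint concrete, this is the kind of helper you end up hand-writing in Go today (parallelMap is my own made-up name, not anything standard); a genuinely self-parallelizing language would make this boilerplate disappear:

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    // parallelMap splits the index range [0, n) across all available cores
    // and calls f(i) for every index. The caller still has to know to use it.
    func parallelMap(n int, f func(i int)) {
        workers := runtime.NumCPU()
        chunk := (n + workers - 1) / workers
        var wg sync.WaitGroup
        for w := 0; w < workers; w++ {
            lo, hi := w*chunk, (w+1)*chunk
            if hi > n {
                hi = n
            }
            if lo >= hi {
                break
            }
            wg.Add(1)
            go func(lo, hi int) {
                defer wg.Done()
                for i := lo; i < hi; i++ {
                    f(i)
                }
            }(lo, hi)
        }
        wg.Wait()
    }

    func main() {
        xs := make([]int, 1_000_000)
        parallelMap(len(xs), func(i int) { xs[i] = i * i })
        fmt.Println(xs[12345]) // 152399025
    }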
We need operating systems designed to make the resources easily accessible across a network. What we are running today are mainframe operating systems where one computer does all the work for concurrent users.
Using Plan 9 has taught me that we are far from done designing computers and operating systems: we're trying to build the future on obsolete machines running obsolete operating systems, and it's not going well, given all the ugly, mutually incompatible nonsense taped and bolted on to hide their age.
Loosely, pretty much all languages today have some form of poison pill: mutability, async, special annotations needed for code that should run concurrently or on the GPU, manual memory management, friction around binding to other languages, ugly or unnecessary syntax... the list is endless.
I've barely learned portions of Julia, but it does have mutable variables. Once there's a mutable variable, the rest of the language is basically equivalent to transpiling an imperative language like JavaScript. As in, there's little advantage over just using higher-order methods within an imperative language, because unexpected behavior can't be prevented and code can't be statically analyzed beyond the point where mutable state might change.
Clojure uses Lisp's prefix syntax with parentheses, with no infix or postfix form available, forcing everyone from a C-style background, or even a math background, to manually convert their notation to prefix. MATLAB uses 1-indexed arrays, and I don't think there's a config option to make them 0-indexed. Erlang is too esoteric in a number of its design choices to ever be mainstream, although its Actor model is great. Go has trouble with exceptional behavior and doesn't isolate variables between threads, which negates much of its usefulness for concurrency.
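On the Go point, a quick illustration: nothing in the language stops goroutines from sharing and clobbering the same variable; you have to reach for a mutex, channel, or atomic by hand. A toy example (running it with go run -race flags the first loop):

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var wg sync.WaitGroup

        // Every goroutine scribbles on the same variable: a data race.
        racy := 0
        for i := 1; i <= 1000; i++ {
            wg.Add(1)
            go func(n int) {
                defer wg.Done()
                racy += n // unsynchronized read-modify-write on shared state
            }(i)
        }
        wg.Wait()
        fmt.Println("racy total:", racy) // updates can be lost; not reliably 500500

        // Safety is opt-in: guard the shared variable with a mutex yourself.
        var mu sync.Mutex
        safe := 0
        for i := 1; i <= 1000; i++ {
            wg.Add(1)
            go func(n int) {
                defer wg.Done()
                mu.Lock()
                safe += n
                mu.Unlock()
            }(i)
        }
        wg.Wait()
        fmt.Println("locked total:", safe) // always 500500
    }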
Basically what I'm looking for is more like a spreadsheet in code, with no imperative behavior whatsoever. I believe ClojureScript comes closest: it executes functionally, without side effects, then allows imperative behavior while suspended to get/send IO, and then executes again. That most closely matches the way the real world works. Unfortunately, languages like Haskell go to great lengths to hide the suspended side of things, making them somewhat of a write-only language like Perl (unreadable to outsiders) and too opaque for students to learn in a reasonable amount of time.
The closest language I've encountered that got it almost right was PHP before version 5, when it passed everything by value via copy-on-write and avoided references. Most of the mental load of imperative programming is in reasoning about mutable state. Once classes passed by reference arrived, it just became another variation of JavaScript, with more syntax warts and inconsistent naming. But since it's conceptually good and mainly ugly in practice, it's still probably my favorite language so far. React and Redux go to great lengths to encapsulate mutable state, but end up with such ugly syntax or syntactic sugar that little is gained over sticking to HTML with classic JavaScript handlers sprinkled in. In which case it's better to go with htmx and stay as declarative as possible.
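Not PHP, but the value-vs-reference distinction I mean is easy to sketch in Go (the cart struct is just an invented example). A struct passed by value can't surprise the caller, which is roughly what pre-5 PHP's copy-on-write gave you for free; a pointer brings the action at a distance right back:

    package main

    import "fmt"

    type cart struct {
        total int
    }

    // byValue receives a copy: nothing it does to c is visible to the caller.
    func byValue(c cart) {
        c.total += 100
    }

    // byReference shares state with the caller, so every call site now has to
    // reason about who else might be mutating the cart.
    func byReference(c *cart) {
        c.total += 100
    }

    func main() {
        c := cart{total: 10}

        byValue(c)
        fmt.Println(c.total) // still 10: the caller's copy is untouched

        byReference(&c)
        fmt.Println(c.total) // 110: mutation at a distance is back
    }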
I tried to keep this brief, but I could blabber on about it forever hah!
It's a regular chip with a few extra units bolted on the end, that's been copied and pasted in a grid.
We've seen the same stuff from folks like Meta, even Tesla.