The hardware is interesting, but it also looks like they've put a lot of effort into programming it. You can approach it from C, or they have a way for you to specify a dataflow and get it mapped onto the hardware. They also appear to have lots of performance analysis tools in their developer kit (which has no price, so I'll just assume 5 figures or vapor).
It isn't aimed at GPU work and it is strange enough not to be a general purpose computer. The obvious hacker markets are password cracking and bitcoin mining; the coins per joule might be a game changer. I'm sure someone on Wall Street can use them to harvest fractional pennies at astounding speed. Low power, high throughput, and big compute sounds about right for an NSA network sniffing appliance. The application will have to be big enough to get someone over the learning curve to justify the strangeness. "Hello World!" won't be hard, but optimizing will be. Something on the Google indexing or Siri scale would be an obvious market if the computation per watt justifies it.
• 16 clusters of 16 processors all tied together with a fast on chip network.
• 32MB of local memory for each cluster.
• Two DDR3 interfaces
• Two 40Gbps Ethernet interfaces, or eight 10Gbps (though I only see two on the diagram).
• Two eight-lane PCI Express interfaces.
This little chip could be a significant boon to signal and image processing. A number of those applications, particularly surveillance radar, seismic/hydrographic survey, and sensor fusion[1], have significant data parallelism. Many have multiple independent data streams that must be processed and integrated in real time. With the proper algorithms, an architecture like this is much more efficient than a general-purpose multi-core or multi-machine system.
One of the key limits to these sensors is the data pipe between the various pieces that run the signal processing. This is especially true when one of the hops is over a relatively slow link, such as with an ROV/UAV. The size and weight of this chip would allow more of the signal processing to be embedded on the platform or in the sensor, thus reducing the volume of data passed over the slow link and allowing more information to be sent.
It does, of course, require a different programming paradigm to use effectively. Special applications like the ones I mentioned justify the extra effort and reduced portability, and so will see value in this device. No doubt some of the other applications mentioned so far (crypto, finance, etc.) will see value in it as well.
[1] I have a strong background in and knowledge of military sensors, and significant exposure to seismic and hydrographic survey in the oil and gas industry.
1. I don't see a price anywhere in the article. Adapteva plans to sell the 64-core version for $199, and that includes the board.
2. Adapteva plans to open-source the hardware and software; does KALRAY?
3. Adapteva has actually built chips. The Kickstarter project is to scale up manufacturing to bring down the cost. I think $99 for 16 cores is pretty good.
Still, if the price is right, more is better, so it's worth keeping an eye on.
Has there been any research into running neural networks on DSP architectures? I'd love to try my hand at programming a DSP. Are there any low-cost options for hobbyists?
Keep an eye out for TI Tech Days. They have them in most major cities. It's a series of classes layered with tons of TI advertising, but I usually walk out of them having learned something.
The reason I bring them up is they usually give out coupons for 95% off anything under $500 in their eStore. I picked up a nice DSP dev board with one of them not too long ago.
It's a nice science experiment, but parallelization has its limits. There are very few examples of (non-mathematical) algorithms that can be parallelized across 256 cores...