Hacker News

<many years ago> when Intel acquired Altera, and announced Xeon CPUs with on-chip FPGAs, I was optimistic that eventually they would add FPGAs to more low-end desktop CPUs (or at least Xeons in the sub-$1000 zone). But it never materialized. I'm slightly optimistic this time around too, but I suspect that the fact that Intel didn't do it hints at some fundamental difficulty.


Nokia designed their ReefShark 5G SoC chipset with a significant FPGA component and used Intel as their supplier. Intel couldn't deliver what they promised. It was a complete disaster.

They had to redesign ReefShark and cancel dividends. It was a huge setback.


So it's Nokia that Charlie @ SemiAccurate was talking about back in 2018, when he said that Intel had crushed a USD 20bn market cap company.

https://semiaccurate.com/2018/07/02/intel-custom-foundrys-10...


This is utter bullshit. Nokia f*cked up because they over-engineered their FPGA solution for 5G. They took the largest FPGA on the market and couldn't squeeze their design into it.

It was not a Nokia SoC, just a plain Stratix 10. They moved to their own SoC after that glorious project.


I wonder how much of the delay in FPGA tech adoption is due to the utterly hilarious disaster that is the toolchains. They look like huge, brittle, proprietary monstrosities, incompatible with modern development methodologies.


I did FPGA development for a few years a little over a decade ago. I recently came back to it for a project after doing software, and wow: the tooling is still absolutely awful. Possibly worse than before. Vivado in particular seems almost designed to foil version control systems. Which files actually contain user input and are necessary to rebuild a project? Why would you want to keep source and configuration files separate from derived objects? Entire swaths of documentation and examples become immediately obsolete with each new tool version. Not to mention infuriating bugs at every turn.


Version control aside, Vivado is very good at what it's intended to do: take RTL and synthesize, place and route, run STA, and simulate it all in one tool, with plenty of higher-level abstractions like IPI (IP Integrator). It's really good at visualisation and cross-probing. I use it to check my ASIC RTL designs, as it's better than the (way) more expensive ASIC tools. All sources needed to rebuild a project are referred to in the .xpr project file. Project rebuilds are completely scriptable; it's really not that opaque.
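To make the "completely scriptable" point concrete, here is a minimal sketch of a Vivado non-project batch build driven from Python, so that only the HDL sources, the constraints, and one generated Tcl script need to live in version control. The Tcl commands (`read_verilog -sv`, `read_xdc`, `synth_design`, `opt_design`, `place_design`, `route_design`, `report_timing_summary`, `write_bitstream`) and the `vivado -mode batch -source` invocation are standard Vivado usage; the helper names, file paths, and the part number `xc7a35tcpg236-1` are assumptions for illustration, and paths containing spaces are not quoted.

```python
import subprocess
from pathlib import Path
from typing import List


def make_build_tcl(top: str, part: str, hdl_files: List[str],
                   xdc_files: List[str], out_dir: str = "build") -> str:
    """Return a non-project-mode Vivado Tcl script (hypothetical helper)."""
    lines = [f"file mkdir {out_dir}"]                   # plain Tcl: create output dir
    lines += [f"read_verilog -sv {f}" for f in hdl_files]  # -sv: parse as SystemVerilog
    lines += [f"read_xdc {f}" for f in xdc_files]          # timing/pin constraints
    lines += [
        f"synth_design -top {top} -part {part}",
        "opt_design",
        "place_design",
        "route_design",
        f"report_timing_summary -file {out_dir}/timing.rpt",
        f"write_bitstream -force {out_dir}/{top}.bit",
    ]
    return "\n".join(lines) + "\n"


def run_vivado(tcl_text: str, script_path: str = "build.tcl") -> None:
    """Write the script and run Vivado in batch mode (requires vivado on PATH)."""
    Path(script_path).write_text(tcl_text)
    subprocess.run(["vivado", "-mode", "batch", "-source", script_path], check=True)


if __name__ == "__main__":
    # Assumed project layout and part number, for illustration only.
    print(make_build_tcl("top", "xc7a35tcpg236-1", ["src/top.sv"], ["constr/top.xdc"]))
```

Committing the sources, the XDC files, and this script, rather than the `.xpr` and the generated run directories, is one way around the version-control pain described above.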


And they are based on SystemVerilog and TCL, two of the worst programming languages in serious use.

Those toolchain disasters are not quite as hilarious when you have to use them daily...


Oy. I'm a Python guy, but Tcl is NOT that bad. Do not blame the horrible software engineering at Altera and Xilinx on Tcl. Those companies make more than enough money that they could sit down with Tcl and Tk, spend some time on the code, and have a quite decent tool. Instead, they keep their bitstream completely closed to lock out competitors and saddle the world with shitty tools.

I'm really surprised that Lattice hasn't tried to go around Xilinx and Altera by doing exactly that. You would think that an open bitstream format and a couple million dollars thrown at academic researchers (Lattice makes about $200 million per quarter in gross profit) would produce some real progress, but I digress ...

SystemVerilog, on the other hand, was specifically created because Verilog and SystemC got loose to the end users and the EDA companies were not going to make that mistake again. So, yeah, SystemVerilog is pretty bad.


Open source tooling would not materialize if bitstream formats were opened, at least not competitive tooling. Why? There are already open source versions for synthesis and PnR, and while functional they are very far off the 'terrible' EDA tools everyone rags on Xilinx for. The reality is that SystemVerilog is a huge language, and already an open standard, yet no open source project supports it fully, so I don't believe for a second that if bitstreams were opened we'd see a load of top-class tooling appear for synthesis and PnR. The reason is that if it has not happened for the first (and arguably easiest) step in the chain, i.e. SystemVerilog, then why would it happen for the others?


> There are already open source versions for synthesis and PnR, and while functional they are very far off the 'terrible' EDA tools everyone rags on Xilinx for.

These "tools" have no target so no incentive to improve. To use them you have to basically push their results back into a Cadence/Synopsys/Mentor toolchain anyway, so you might as well stick to the supported toolchain.

> The reality is SystemVerilog is a huge language, and already an open standard yet no open source project supports it fully

Most commercial systems don't support it fully. And it's not clear that SystemVerilog is that superior to VHDL. And, for quite a while, SystemVerilog wasn't open and had some fairly obnoxious patents surrounding it. I don't know when/if that has changed, as I have been out of semiconductors for about 20 years now.

Icarus Verilog has been slowly supporting features from SystemVerilog but doesn't have a lot of manpower.

In general, the consolidation of the semiconductor industry and EDA has hurt open-source EDA improvements. There's not very much money coming from companies to fund EDA research. EDA startups can't really get venture funding, since VCs all want to fund the next pile of viral social trashware. And anyone with good software skills left the semiconductor industry eons ago because the pay differential is ridiculous.


The commercial FPGA tools have tremendous technological advantages, but the free part is inherently what many FOSS users value, not the other stuff. You're trying to talk about technical QoR between tools but the difference for anyone who really cares is ideological, not technical.

> The reason is if it has not happened for the first (and arguably easiest) step in the chain i.e. System Verilog, then why would it happen for the others?

Ehhhhh, I don't think I buy this at all. There are dozens of alt-HDLs out there, many of which are quite powerful, designed by solo users. People had working, simple-but-practical PnR for real devices in a ~7k C++ LOC codebase written by an individual (arachne-pnr) and many individuals have independently reverse engineered small-ish scale device families for packing utilities. nextpnr was written by a very small group (solo?) in a year or something. I don't think you could fit an equivalent parser for SV2017 in ~7k LOC, much less elaboration, type checking, a netlist database, to all go along with it. SystemVerilog might actually be the most difficult part of the whole equation because it simply has so much surface area. PnR tools are limited by their target: only targeting small iCE40 devices? Your PNR algorithms don't need to be cutting edge. Targeting SV2017? Your job is hard no matter what device you synthesize for. And I can't think of even a single commercial tool I know from any vendor that supports all of it, up-to-date with SV2017.

All that said, I use SystemVerilog as my "normal" RTL when using commercial tools for stitching together IP, wiring up top modules, etc.


My point about SV was that the two major open source simulation tools (Icarus and Verilator) both support only a subset of SV: not SV 2017, and a lot of SV 2009 is still not supported. Vivado has a free (not open) SV simulator that supports much more of the language. I agree not all of SV is needed for PnR, but what I'm saying is: if we don't have the gcc or clang of SV simulation yet (vs. MSVC or ICC), then what makes you think we'd get a near-commercial-grade synth/PnR tool? If Xilinx opened up their bitstream format, academics would rejoice, but it would not suddenly spur a huge improvement in open source PnR tooling. In terms of improving the usability of what is there, given Vivado is scriptable, if you want to make a better open one (like an IDE) you can; just call synth_design, etc. in batch mode. This is what HierDesign were doing, and what turned into Vivado after they were acquired by Xilinx. So my point is that lots of open source tooling could exist without opening the bitstream format, and given that it largely does not, I am of the opinion that opening the bitstream format would not change much.


> but the free part is inherently what many FOSS users value

The free part is valuable not in that it's cheap, but in that it saves you from having to deal with licensing.

DevOps pioneers hailed from the likes of Google, Amazon and Facebook, who are not exactly short on cash, but you simply couldn't do what they did if you had been nickeled and dimed at every VM and container.


I have not benchmarked the open source PnR tools, but I expect their QoR is orders of magnitude worse than what a commercial one can do. I don't know of a LOC comparison between SV and PnR, but I'd say both are huge undertakings at a commercial feature set.


> already an open standard yet no open source project supports it fully

Bitstreams are closed. There's little to no point in doing an open source compiler if the target is not just proprietary, but deliberately opaque.

Overall your comment strikes me as what a proprietary compiler advocate would say in the 90s. "GCC? Lol"

Since then, Microsoft had to include Linux in Windows just because they absolutely needed Docker. DevOps was invented based on free/open source, it just couldn't be done proprietary style by a company as large as Microsoft.


I disagree. By far the largest part of SystemVerilog deals with verification, both simulation and formal property proving. These parts have nothing to do with the bitstream formats, and the tooling in that area is just as lacking as the synthesis and PnR tools.

The limitation here is writing the SystemVerilog parser and compiler.


What's the incentive for free software hackers and startups to even begin to work on this if the rest of the stack is not just proprietary, but held by actively hostile entities?


There are other places to start working on the stack that are not as actively hostile as place and route, e.g. simulation.

As for the incentive, I'm fairly pessimistic. There is definitely no money to be made for a start-up in this space; it is way too conservative. Maybe the hobbyist intellectual challenge of working on some hard problems like constraint solving or formal property proving? There is a massive task of writing a SystemVerilog parser before you get there, though, and the SAT solving and property proving problems are present elsewhere with lower barriers to entry.


Challenges can be stimulating, but there are diminishing returns. It's not like, say, lockpicking or DRM cracking: the subject matter is super hard to begin with, even without the proprietary sabotage.

Having said that, there has been some promising F/OSS work on the small Lattice devices. It allows for a decent, modern workflow, and it's possible because the devices are approachable, but also because Lattice hasn't been hostile. Why they haven't been more supportive is a mystery to me however.


TCL itself is not that bad for the purpose IMO; it's more the stuff around it, the proprietary binary formats, the gooey crap, and the non-open nature thereof.


Tcl is the de facto EDA tool scripting language. It's standard in the HW design world. Of course, that does not stop there being a second, alternative scripting language, but not having Tcl would alienate much of the HW design community, so it must be there. As for the HDL, Vivado supports SystemVerilog, VHDL, and C via HLS. I happen to like SystemVerilog; what about it makes it terrible?


TCL is the only language I have ever worked with where a comment would affect the next line. Might have been an interpreter issue, but it was enough for me never to want to touch it again.

SystemVerilog is a good example of an organically grown language with no 'benevolent dictator'. A few pet peeves:

* Why is the simulation delta cycle split into 17 regions? Exactly when does the Pre-Re-NBA region happen and what assignments take place there?

* Why can't a function return a dynamic/associative array or a queue? This is clearly possible, since the array find functions return a queue, but it's not possible to define a user function with this return type.

* It has way too much cruft. E.g. what problem does the forkjoin keyword solve? Who thought that was necessary and why? Not a fork-join block, the forkjoin keyword.

* Why can't you have a modport inside a modport? This would be great for e.g. register interfaces, but modports are not composable.

* What is the difference between a const variable and a localparam and why does the language need both constructs?

* Is a covergroup a class or what? It behaves very much like it is, it has a constructor, some class local information and at least one class local function (the sample() function), but you can't extend it.

* Why are begin-end used for scope delimitation everywhere except in constraints where curly brackets are used? I know it was a Cadence donation, but why wasn't the syntax changed before it was merged? Backwards compatibility can only justify so much...

//rant off

edit: formatting


You're right about Tcl: a comment can mess stuff up, as the comment is a command that says do nothing. It's a terrible language, and that may be its worst flaw, but it's still in every EDA tool. It's kind of like how C is still around despite its foot-shooting ability costing billions every year in security holes and bugs caused by buffer overflows, etc. If an EDA tool wanted to break the mold and use, say, Python for scripting, they would still likely need to offer a Tcl option. It's very ingrained in the industry.
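One well-known way a Tcl comment can swallow the next line: a backslash-newline sequence is replaced by a single space in a pre-pass before the command (or comment) is parsed, so a comment line ending in `\` continues onto the following line. The Python sketch below models only that rule plus "a `#` where a command is expected starts a comment"; it ignores `;`, braces, quoting, and everything else a real Tcl parser does, and `tcl_like_commands` is a hypothetical name. (Unbalanced braces inside a comment within a braced body, such as a `proc`, are the other classic trap, since braces are matched before comments are recognized.)

```python
import re
from typing import List


def tcl_like_commands(script: str) -> List[str]:
    """Return the commands a simplified Tcl-like parser would execute."""
    # Pre-pass: backslash + newline + the next line's leading blanks -> one space.
    # This happens before comments are recognised.
    joined = re.sub(r"\\\n[ \t]*", " ", script)
    commands = []
    for line in joined.split("\n"):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # '#' where a command would start: ignore everything up to the newline.
            continue
        commands.append(stripped)
    return commands


script = "puts a\n# configure the clock \\\nputs b\nputs c\n"
print(tcl_like_commands(script))  # ['puts a', 'puts c']  ('puts b' became part of the comment)
```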

As for SV, a lot of your gripes are Verilog issues, and SV has tried to fix some of them. I agree the blocking/nonblocking split is a mess, but most folks just learn the rules to avoid issues; delta cycles can be a pain, though. The syntax limitations/quirks you point out are interesting, though not enough to say the language is terrible: it's extremely powerful, with very good composability of types; constrained random is very powerful, the coverage is extensive, and assertions again are very powerful. In a way it's like a few separate languages bolted together, so sure, there is some duplication, but it works surprisingly well as a whole.
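As a rough illustration of why the blocking/nonblocking rules matter, here is a Python model of two registers updated on one clock edge. It collapses the SystemVerilog scheduler to just two phases (evaluate right-hand sides in the Active region, apply `<=` updates in the NBA region) and ignores all the other regions; the function names are made up for the example.

```python
from typing import Dict


def blocking(state: Dict[str, int]) -> Dict[str, int]:
    """a = b; b = a;  each assignment takes effect immediately."""
    s = dict(state)
    s["a"] = s["b"]
    s["b"] = s["a"]  # reads the value of a written one line earlier
    return s


def nonblocking(state: Dict[str, int]) -> Dict[str, int]:
    """a <= b; b <= a;  RHS sampled first, updates applied together later."""
    s = dict(state)
    nba_queue = [("a", s["b"]), ("b", s["a"])]  # Active region: evaluate every RHS
    for name, value in nba_queue:              # NBA region: apply the updates
        s[name] = value
    return s


before = {"a": 0, "b": 1}
print(blocking(before))     # {'a': 1, 'b': 1}  -> the swap fails
print(nonblocking(before))  # {'a': 1, 'b': 0}  -> the registers swap
```

This is the reasoning behind the usual rule of thumb: `<=` in clocked `always_ff` blocks, `=` in combinational `always_comb` blocks.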


I think pricing is also an issue. Anyone with 5 dollars in their pocket can buy an arduino clone and go to town. And many people do as can be seen by the huge hobbyist scene. You want to try FPGA development and do anything that is not blinking a LED? Good luck shelling out hundreds to thousands of dollars for the shittiest software known to this planet.


A MAX 10 T-Core board from Terasic is $55 at the academic price, and the tools are free for the MAX 10 class.

You only start paying for FPGA tools when you need the really big FPGAs.

And, I'll go out on a limb, but, at this point, I think Arduino causes more harm to beginning embedded developers than good. Yeah, the ecosystem is wonderful if you aren't a developer.

However, Arduino is now weird compared to mainstream embedded development. Most things have converged to 32-bit instead of 8-bit. Arm Cortex-M is now mainstream so your architectural understanding is useless. 5V causes a lot of grief given that everybody else in the world is at 3V/3.3V.

A developer basically has to unlearn a bunch of things to move up from an Arduino. I still recommend Arduino to non-developers or somebody just trying to throw together a project, but I no longer recommend them to someone actually trying to learn embedded development.


Just to clarify, there are many Cortex-M*-based Arduino or Arduino-compatible boards. There's official Arduino-SAMD BSP support, though it lacks depth in features like timers and such. Though it seems 8-bit processors are still common for super cheap MCUs.


The issue is not whether the end user has to pay; the issue is that this kills incentives for free software tools. gcc and BSD were initially developed on machines costing hundreds of thousands of dollars, and that didn't stop them.


Arduino gets the job done for most hobbyists; it's also easy to move forward from Arduino to the ESP32, which is 32-bit and FreeRTOS-based.


What do USB 3, Gigabit Ethernet, or PCIe IP cores cost? Are they free when using Intel?


I'm optimistic... not so much because of the merits of the acquisition but more so because of AMD's history with strategic actions. ATI kept them afloat through a CPU performance drought, and divesting GlobalFoundries secured necessary liquidity. These two alone essentially saved AMD, so I've got faith in leadership being able to make the appropriate strategic maneuvers.

But maybe I'm being overly optimistic. (Probably because—disclosure—I'm long AMD. Been long for years.)


I'm hoping/expecting a chip that goes into the Epyc/SP3 socket and has the memory & PCIe & socket crossconnect as hard IP but the CPU cores replaced with programmable logic. If you have a use case for FPGAs, it's more likely you want it in a concentrated form like this... not on low-end or desktop systems :/

If I remember correctly, there was something similar back in the early HyperTransport days...


Are you envisioning retaining at least a few cores? It seems like you'd probably still want an OS running on native silicon.


I assume they are thinking about a design for multi-socket systems.


Yeah I think it's both more effective and cheaper to have dual/quad socket systems with 1 "normal" CPU and the rest filled with FPGAs without CPU cores, just to max out on the raw crunching ability. The PCIe block on the FPGA chips could be flexible enough to (re-?)wire directly into the programmable logic, maybe even reconfigurable to other protocols (e.g. 100GE). Also in "normal" NUMA fashion each FPGA would have the memory channels associated with that socket (presumably through the interconnect as if it were a CPU, so the CPU can access it too.)

I'm just looking at this from a logical chain of "who needs FPGAs in their computers?" => "cases with loooots of specific data crunching" => "want a controlling/driving CPU for the complicated parts, but then just concentrate as much FPGA in as possible." => Multi-socket with 1 CPU & rest FPGAs.

(There currently is no commodity Quad-socket SP3 mainboard, not sure if this is a design limitation or just no one made one yet? I'd still say the approach works great with only 2 sockets.)


I wouldn't expect to see anything like this on SP3 anyway, since it would take some time to do the work and by then the current generation would likely be whatever they replace SP3 with in order to support DDR5.


Well, you want something more low-end for developers' and hobbyists' machines, I would guess.


As much as I agree with you and want one for myself too, I doubt that this market segment is interesting to AMD at all. The kinds of workloads that warrant going FPGA are the kind of workloads where you just give your devs a bunch of high-priced development systems. Those would likely be close to identical to the production boxes, just with more debug pieces plugged in.


This opinion is unlikely to be popular, and it's been decades since I was a full participant in the hardware business, but...I just have never seen the use case for FPGAs beyond niche prototyping / small run applications, which by definition make no money. I suppose there are also scenarios where you want to keep your design secret from the fab and/or change it every week, but those seem very niche too (NSA, GCHQ, ..?).


Couple things--

1) You underestimate how critical prototyping has become, again likely since you say it's been a couple decades. Time to market has become more important, and verification has become harder as CPUs have gotten even more complex. FPGAs enable cosimulation and emulation, leading to faster iteration of both design and verification efforts and thus better TTM.

FPGAs are so important in the hardware development process that I would even say you're not a serious hardware company if you don't have any FPGA frameworks to design silicon.

2) As others have mentioned, FPGAs are also critical for low-latency workloads that require constant tweaks-- high frequency trading (ugh...) comes to mind. The need for "constant tweaks" could also be satisfied with just "normal" software, but that has higher latency as opposed to an FPGA, and FPGAs can get some crazy performance if you're willing to pay the price (south of 7 figures).

Overall sure, usage of FPGAs might be niche compared to, idk, Javascript; but it's commonplace/practically essential in hardware.


Whenever discussion of FPGA comes up on HN, someone inevitably points to low latency workflows but nobody ever mentions video capture and play-out boards using FPGAs. Companies like Blackmagic, Elgato, Matrox, etc.


Hardware like this often uses FPGAs because there's a need for highly parallel processing that is difficult or even impossible to do on an off-the-shelf CPU, but the volumes are too low to justify a custom ASIC. Being able to fix bugs or add features after shipping is a big bonus too.


It is very likely that the packets of this comment traveled through several FPGAs to get from your computer to my screen. Yes, they are definitely more niche than CPUs. But niche products have really high margins and people willing to pay for them.

FPGAs are already incredibly popular. They're just mostly in things you are unlikely to personally own or know about. You're going to find at minimum one, but probably several, FPGAs in things like big routers and other telecom equipment, e.g. cell towers, firewalls, load balancers, enterprise Wi-Fi controllers, video conferencing hardware, test equipment like oscilloscopes, sensor buoys, scientific instruments, MRI machines, LIDARs, high-end radio equipment, or even just glue logic tying together other components, like in the iPhone.


Yes, and in addition to that, FPGAs also appear under the hood of automobiles and electric vehicles.


I am not sure whether you are serious or trolling but I will bite ;-)

FPGAs are being used in many types of applications where real-time operation is necessary and non-recurring engineering (NRE) costs need to be minimized, for example here [1].

One classic example: if you poke under the hood of any signal generator, such as an AWG, you will probably find an FPGA inside. As you are probably aware, having been in the hardware business, AWGs are probably among the most common equipment in any electronics or electrical lab or company.

[1]https://www.electronicdesign.com/technologies/fpgas/article/...


For those who are not familiar, AWG stands for Arbitrary Waveform Generator.


Would that be used as part of a synthesizer?


> I just have never seen the use case for FPGAs beyond niche prototyping / small run applications, which by definition make no money.

You are precisely correct. FPGAs are useful when your volume doesn't reach the point where the up-front cost of an ASIC would get amortized.
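The amortization argument boils down to a break-even volume: an ASIC pays a large one-time NRE cost but has a lower unit cost. A minimal sketch, assuming the FPGA design's own NRE is negligible by comparison; the dollar figures in the example are invented purely for illustration, not real quotes.

```python
import math


def total_cost(nre: float, unit_cost: float, volume: int) -> float:
    return nre + unit_cost * volume


def crossover_volume(asic_nre: float, asic_unit: float, fpga_unit: float) -> float:
    """Smallest volume at which the ASIC's total cost is no higher than the FPGA's.

    Assumes the FPGA route has negligible NRE by comparison.
    """
    saving_per_unit = fpga_unit - asic_unit
    if saving_per_unit <= 0:
        return math.inf  # the ASIC never pays for itself
    return math.ceil(asic_nre / saving_per_unit)


# Hypothetical numbers: $2M ASIC NRE, $5 per ASIC, $45 per FPGA.
n = crossover_volume(2_000_000, 5, 45)
print(n)                                                   # 50000
print(total_cost(2_000_000, 5, n) <= total_cost(0, 45, n))  # True
```

Below the crossover, the FPGA is cheaper overall, which is the low-volume niche the replies below describe.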

Networking companies (Cisco, Juniper, etc.) are classically big consumers of FPGAs.

Tektronix seems to make quite a bit of money and there is at least one FPGA in practically every test instrument they make. This holds true for practically all test instrument manufacturers.

I know a LOT of industrial automation and testing companies that generally have FPGAs in their systems. Both for latency and for legacy support (Yeah, GPIB still exists ...).

Yes, they aren't "Arm in a cell phone" type volumes, but that doesn't mean they aren't quite profitable if you can aggregate them.


Easily changing the design and being the cheaper option to ASICs for small production runs are the two main uses for FPGAs. You may be designing a box that can be configured to do different things, so you may want to support multiple FPGA images to switch back and forth depending on the mission. You may just want to be able to easily upgrade firmware for a complex design in the future. For space DSP applications, the FPGA is king and will probably remain so for a long time, simply due to the ability to cram a lot of functionality into a small space (DSP, microcontroller, combinational logic circuits, and massive I/O banks all in one chip).


For low volume (sub 100k units?) they're often the only good way to do configurable* SERDES in any environment that is latency sensitive.

Configurable as in one SKU is in several products, but not necessarily reconfigurable by the end user.


Not long ago there was an FPGA inside the iPhone (an iCE40). Hardly niche.


Really? What function did it serve the iPhone?


Likely just simple glue logic. Things like converting one protocol into another, doing some multiplexing or some simple pre-processing or filtering on some sensor data. They're incredibly tiny (2x2mm) and use little power, so they pop up in designs pretty regularly.


I wonder if they are reprogrammable, so if what's running on these could ever be updated.


They never ended up shipping the high end ones either.


It's the usual "fundamental difficulty" with FPGAs -- CPUs and GPUs are faster and more power efficient for compute-intensive tasks. An algorithm on FPGA needs to overcome the 20x worse architectural efficiency just to break even with a CPU or GPU.

The big benefit of having FPGA closely attached to CPU is that you can access the memory and internal buses quickly. Transferring stuff over PCIe hurts a lot. So you could make an argument for jobs using small work units requiring fast turnaround; CUDA kernels take milliseconds to launch.
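A back-of-the-envelope model of that trade-off: offloading a work unit only helps if fixed per-call overhead plus transfer time plus accelerated compute beats just doing it on the CPU. All numbers below (work-unit size, link bandwidths, overheads, the 10x speedup) are illustrative assumptions, not measurements of any particular part or link.

```python
def offload_time_s(nbytes: int, link_bytes_per_s: float, overhead_s: float,
                   cpu_time_s: float, speedup: float) -> float:
    """Time to ship a work unit to an accelerator and compute it there."""
    transfer_s = nbytes / link_bytes_per_s
    return overhead_s + transfer_s + cpu_time_s / speedup


def offload_wins(nbytes: int, link_bytes_per_s: float, overhead_s: float,
                 cpu_time_s: float, speedup: float) -> bool:
    return offload_time_s(nbytes, link_bytes_per_s, overhead_s,
                          cpu_time_s, speedup) < cpu_time_s


work = dict(nbytes=4096, cpu_time_s=2e-6, speedup=10.0)  # assumed 4 KiB unit, 2 us on CPU

# Assumed: discrete card over a ~16 GB/s link with ~10 us of per-call overhead.
print(offload_wins(link_bytes_per_s=16e9, overhead_s=10e-6, **work))   # False
# Assumed: on-package fabric, ~64 GB/s and ~0.5 us of per-call overhead.
print(offload_wins(link_bytes_per_s=64e9, overhead_s=0.5e-6, **work))  # True
```

With a small work unit, the fixed per-call overhead dominates and the accelerator's raw speedup barely matters, which is why close coupling to the CPU is the attraction.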

I worked with some of the early Xeon+FPGA parts and there just wasn't that much we could do with them. There wasn't enough fabric to build anything meaningful and we had an abundance of CPU cores, so the best we could do was specialized I/O accelerators.


I think the more relevant comparison here would be ASICs. Softcores on FPGAs are indeed terrible, but if you're implementing some algorithm directly at the gate level for cryptography or signal processing or whatever, then being able to arrange inputs and outputs into dataflows is a big win, with no round trips to general-purpose registers or bypass networks. Not having to fetch instructions and not being limited in parallelism are also big wins. And generally, if you're doing something like mining bitcoin, you should expect an FPGA to perform somewhere between an ASIC and a GPU.

The problem is that if a task is common, then someone is just going to make an ASIC to do it. And if it's uncommon, then the terrible FPGA software ecosystem and the low prevalence of general-purpose FPGAs in the wild mean that people will just do it on a CPU or GPU.


> if you're implementing some algorithm directly at the gate level for cryptography or signal processing or whatever then being able to arrange inputs outputs into dataflows is a big win with no round trips to general purpose registers or bypass networks

This is true, but keep in mind that that sort of algorithm runs insanely well on any CPU or GPU because they, too, do not want to touch main memory. You would be blown away by how much work a CPU can do if you can keep the working set within L1 cache.

Re. ASICs, it's a continuum:

- "flexible, low performance, cheap in small quantities" (CPUs)

- "reasonably flexible, better performance, cheap-ish in small quantities" (GPUs)

- "inflexible, best performance, expensive in small quantities" (ASICs)

FPGAs fit somewhere between GPUs and ASICs -- poor flexibility, maybe great performance, moderate small-quantity price.

If your problem is too big for GPUs, as you say, sometimes it's easiest to jump straight to an ASIC. But it's such a narrow window in the HPC landscape. The vast majority of customers, even with large problems, are just buying a lot of GPUs. They're using off-the-shelf frameworks even though a custom CUDA kernel would give them 10x the performance at 10% of the cost. The cost to go to an FPGA is too great and the performance gain simply isn't there.


I'm skeptical as well. The primary reason IMO is the software. How do you easily reconfigure your FPGA to efficiently run whatever computationally intensive and/or specialized algorithm you have?


It is doable. I've seen it during my Computer Engineering courses 14 years ago.

Basically you analyze the code for candidates, select a candidate, upload your custom hardware design, run your operation on the hardware, and repeat.

The difficult part is that uploading your hardware to the FPGA takes on the order of tenths of a second, which is ages compared to the nanoseconds and microseconds your CPU works in. So your specific operation must be worthwhile enough to justify the upload.
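The "worthwhile to upload" condition can be written down directly: with reconfiguration time R, per-call CPU time c, and per-call FPGA time f, offloading n calls pays off once n(c - f) > R. A small sketch; the 0.2 s, 1 ms, and 0.1 ms figures are assumptions chosen to match the "tenths of a second" order of magnitude, not measured values.

```python
import math
from typing import Optional


def min_calls_to_amortize(reconfig_s: float, cpu_call_s: float,
                          fpga_call_s: float) -> Optional[int]:
    """Smallest n with n * (cpu_call_s - fpga_call_s) > reconfig_s, or None."""
    saving = cpu_call_s - fpga_call_s
    if saving <= 0:
        return None  # the FPGA version is not faster, so it never pays off
    return math.floor(reconfig_s / saving) + 1


# Assumed: 0.2 s to load the bitstream, 1 ms per call on the CPU, 0.1 ms on the FPGA.
print(min_calls_to_amortize(0.2, 1e-3, 1e-4))  # 223
```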

A bit of FPGA on your CPU makes it more flexible; for example, you could set a profile such as 'crypto' or 'video' to add some specific hardware acceleration to your general-purpose CPU.

Imagine your CPU being able to switch your embedded GPU into another CPU core.


Codecs are a great example.

Let's say the current Zen 2 had an FPGA onboard. AMD could sell you an upgraded design with AV1 support for a few dollars. Most people aren't going to buy a new CPU on the basis of a video decoder, but they'll buy an upgrade to the chip that auto "installs" itself. That's a sale AMD otherwise wouldn't have made.


Except the new codec won't fit into the FPGA they put on that chip that's in the field.


The codec is gonna get nowhere near to filling a "CPU-class" FPGA, so if anything you get fewer parallel instances of it.


Also, for the way most modern CPUs are used: how do you task switch? If the hardware is large enough, you can deploy multiple configurations at a time, but does software support that? Is it possible to have relocatable configurations?

In theory, you could even page out code, but I guess the speed of that will be slow. Also, paging in probably would be challenging because the logical units aren’t uniform (if only because not all of them will be connected to external wires)


This can be used with a client-server model, that is if there are enough free cells and I/O available on FPGA it could let it install the configuration and then any application could communicate with it concurrently, maybe with some basic auth.


But from what I understand of FPGAs, fragmentation would be a serious issue. You may have the free cells and I/O you need to implement some circuit, but if they’re dispersed over your FPGA or even connected, but in the wrong shape for the circuit you’re building, that’s useless.

An enormous crossbar could solve that, but I would think that would be way too costly, if practically possible at all.
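A toy model of the fragmentation problem: treat the fabric as a uniform grid and ask for a free rectangular region. This is an optimistic simplification (real fabrics have heterogeneous columns of logic, DSP, and block RAM, and vendor partial-reconfiguration flows use fixed regions defined at design time), and `find_free_region` and `free_cells` are hypothetical names.

```python
from typing import List, Optional, Tuple

Grid = List[List[bool]]  # True = cell already used by some configuration


def free_cells(grid: Grid) -> int:
    return sum(not used for row in grid for used in row)


def find_free_region(grid: Grid, h: int, w: int) -> Optional[Tuple[int, int]]:
    """First-fit search for an h x w block of unused cells; returns its top-left corner."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all(not grid[r + i][c + j] for i in range(h) for j in range(w)):
                return (r, c)
    return None


# Checkerboard occupancy: half of the 4x4 fabric is free...
checker = [[(r + c) % 2 == 0 for c in range(4)] for r in range(4)]
print(free_cells(checker))              # 8
# ...yet a circuit needing a 2x2 block (only 4 cells) cannot be placed.
print(find_free_region(checker, 2, 2))  # None
```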


You can reconfigure just part of the FPGA, it isn't used all that often though.


I would see it being used more like a GPU than a CPU.


Even GPUs multitask all the time, even though it's less obvious. Cooperative multitasking in this context means setting up and executing different shaders/kernels. The overhead involved in this is quite manageable.

Repurposing FPGAs for different tasks means loading a new bitstream into the device every time. So it is much more efficient to grant each user of the device exclusive access for long stretches of time. The proper pattern for that is more like a job queue.
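A minimal sketch of that job-queue idea: since every bitstream change is expensive, a scheduler can reorder pending jobs so that jobs needing the same bitstream run back to back. The job tuples and bitstream names are made up, and grouping like this trades fairness for fewer reloads (a late job for an already-queued bitstream jumps ahead of earlier jobs for other bitstreams).

```python
from typing import Dict, List, Optional, Tuple

Job = Tuple[int, str]  # (job id, bitstream it needs)


def count_reloads(jobs: List[Job]) -> int:
    """Number of bitstream loads needed to run the jobs in the given order."""
    reloads, loaded = 0, None  # type: int, Optional[str]
    for _, bitstream in jobs:
        if bitstream != loaded:
            reloads += 1
            loaded = bitstream
    return reloads


def group_by_bitstream(jobs: List[Job]) -> List[Job]:
    """Keep first-arrival order between bitstreams, batch jobs within each."""
    groups: Dict[str, List[Job]] = {}
    for job in jobs:
        groups.setdefault(job[1], []).append(job)  # dicts keep insertion order
    return [job for batch in groups.values() for job in batch]


fifo = [(1, "aes"), (2, "fft"), (3, "aes"), (4, "fft"), (5, "aes")]
batched = group_by_bitstream(fifo)
print([j for j, _ in batched])                      # [1, 3, 5, 2, 4]
print(count_reloads(fifo), count_reloads(batched))  # 5 2
```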


An actual GPU or CPU will always run circles around an FPGA CPU or FPGA GPU.

Where FPGAs win are new architectures, like Systolic engines. Entirely different computer designs from the ground up.
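For readers who haven't met the term: a systolic array is a grid of small processing elements (PEs) that pass operands only to their neighbours each clock cycle. Below is a cycle-by-cycle Python simulation of an output-stationary array computing C = AB: rows of A enter from the left and columns of B from the top, each skewed by one cycle per row/column, and PE (i, j) accumulates C[i][j]. It is a behavioural sketch rather than HDL, and the function name is invented for the example.

```python
from typing import List

Matrix = List[List[int]]


def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    n, depth, m = len(A), len(B), len(B[0])   # A is n x depth, B is depth x m
    acc = [[0] * m for _ in range(n)]         # one accumulator per PE
    a_reg = [[0] * m for _ in range(n)]       # operand each PE forwards to the right
    b_reg = [[0] * m for _ in range(n)]       # operand each PE forwards downwards
    for t in range(n + m + depth - 2):        # cycles until the last PE finishes
        a_in = [[0] * m for _ in range(n)]
        b_in = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:                    # left edge: row i of A, delayed by i cycles
                    k = t - i
                    a_in[i][j] = A[i][k] if 0 <= k < depth else 0
                else:                         # otherwise: left neighbour's previous value
                    a_in[i][j] = a_reg[i][j - 1]
                if i == 0:                    # top edge: column j of B, delayed by j cycles
                    k = t - j
                    b_in[i][j] = B[k][j] if 0 <= k < depth else 0
                else:                         # otherwise: upper neighbour's previous value
                    b_in[i][j] = b_reg[i - 1][j]
                acc[i][j] += a_in[i][j] * b_in[i][j]  # multiply-accumulate in place
        a_reg, b_reg = a_in, b_in             # clock edge: latch all PE registers
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Operands only ever move between neighbouring PEs' registers, with no shared register file or instruction fetch, which is the kind of structure a CPU or GPU can't mimic directly.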


I believe there is some amount of support in OpenCL for FPGAs. If only we could get companies to properly support OpenCL, we'd have a nice software interface to pretty much any kind of compute resource on a machine.


My armchair amateur brain immediately thought about something CUDA-like.


FPGA code takes hours to compile, yet the result is product/model specific.


You're not wrong but I expect they'd make it so that the various models would be similar enough (at least within a given CPU generation) so that you could use mostly precompiled artifacts instead of rerouting everything from scratch.

I've always been pretty skeptical of their approach though, in order to be usable they'd need excellent tooling to support the feature, and if there's one thing that existing FPGA software isn't it's "excellent".

Getting FPGAs to perform well is often an art more than a science ("hey guys, let's try a different seed to see if we get better timings") so the idea that non-hardware people would start to routinely generate FPGA bitstreams for their projects is so implausible that it's almost comical to me.
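The seed remark makes more sense once you recall that placement often relies on randomized heuristics such as simulated annealing. As a toy illustration (not how any vendor's placer actually works), here is simulated-annealing placement of a 12-cell chain onto 12 slots in a line, minimizing total wirelength, run with several seeds; different seeds can settle in different local minima, which is why people sweep them. All names, the cooling schedule, and the step count are assumptions.

```python
import math
import random
from typing import Dict, Iterable, List, Tuple

Net = Tuple[str, str]


def wirelength(order: List[str], nets: List[Net]) -> int:
    pos = {cell: slot for slot, cell in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def anneal_place(cells: List[str], nets: List[Net], seed: int,
                 steps: int = 3000, t0: float = 5.0) -> int:
    """Toy simulated-annealing placer; returns the best wirelength it reached."""
    rng = random.Random(seed)                    # the seed fixes every random choice
    order = list(cells)
    rng.shuffle(order)                           # random initial placement
    cost = best = wirelength(order, nets)
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-3    # linear cooling schedule
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]  # propose swapping two cells
        new_cost = wirelength(order, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost                      # accept, occasionally uphill
            best = min(best, cost)
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return best


def seed_sweep(cells: List[str], nets: List[Net],
               seeds: Iterable[int]) -> Tuple[int, Dict[int, int]]:
    results = {s: anneal_place(cells, nets, s) for s in seeds}
    return min(results, key=lambda s: results[s]), results


cells = [f"c{i}" for i in range(12)]
chain = [(f"c{i}", f"c{i + 1}") for i in range(11)]  # optimum wirelength is 11
best_seed, results = seed_sweep(cells, chain, range(8))
print(best_seed, results)
```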

Maybe one day we'll have a GCC/LLVM for FPGAs and it'll be a different story.


Beyond the GCC/LLVM, you also really need a standard library. Nobody is talking about that. Today, if you want a std::map on an FPGA, you have to either pay $100k or build it yourself. That's untenable.


You would use precompiled modules or compositions of these modules (pipeline or parallel).

This can be a relatively fast operation. Seconds or less depending on complexity.


Apparently, after the Altera acquisition, they sought "synergies" in all the different divisions. My friend was an intern who was tasked with porting some of the network protocol stack to SystemVerilog. Apparently it did work, and SystemVerilog was the right HDL to use because of its support for structs that can map to packet headers. I'm not sure it's being used in production.
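The "structs that map to packet headers" point is easiest to see in software terms. Python's `struct` module does the byte-level version of what a SystemVerilog packed struct does at the bit level: one declaration defines both the field layout and the field names. A sketch using the standard 20-byte IPv4 header (no options) as an assumed example; unlike a packed struct, `struct` can't split a byte into the 4-bit version and IHL fields, so that takes a shift and a mask.

```python
import struct
from typing import NamedTuple


class IPv4Header(NamedTuple):
    version_ihl: int      # version (high 4 bits) and IHL (low 4 bits)
    tos: int
    total_length: int
    identification: int
    flags_fragment: int   # 3 flag bits + 13-bit fragment offset
    ttl: int
    protocol: int
    checksum: int
    src: bytes
    dst: bytes


IPV4_FORMAT = "!BBHHHBBH4s4s"   # network byte order, 20 bytes in total


def parse_ipv4(packet: bytes) -> IPv4Header:
    size = struct.calcsize(IPV4_FORMAT)
    return IPv4Header(*struct.unpack(IPV4_FORMAT, packet[:size]))


# Example header; the checksum is left as 0 for brevity.
raw = struct.pack(IPV4_FORMAT, 0x45, 0, 20, 1, 0, 64, 17, 0,
                  bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
hdr = parse_ipv4(raw)
print(hdr.version_ihl >> 4, hdr.version_ihl & 0xF, hdr.ttl, hdr.protocol)  # 4 5 64 17
```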

It'd be interesting to see how AMD will execute and integrate this acquisition, considering they are less of a madhouse company than Intel.


It absolutely seems like there are some incredible opportunities in the high end. But as far as I know, FPGAs are quite area hungry which makes them inherently expensive. It's hard to think you'd find FPGAs of meaningful size included in $60 desktop CPUs, unless the harvesting opportunity is significant.


On-package, not on-chip




