Are you one of the pioneering early buyers of an electric vehicle? In Q3 2022, 6% of new car sales in the U.S. were all-electric models. Yet despite all the hype and hefty purchase subsidies for battery-powered cars, only about 1% of the vehicles in operation in the U.S. today are purely plug-in electric. That is not very many. One reason electric vehicles have not overtaken conventional cars is the lack of widely available fast-charging infrastructure. Consumer research highlights the fear of running the battery low without a fast-charging station nearby as the main barrier to adoption. Auto market analysts call this concern “range anxiety.” Oddly enough, the semiconductor market has an analogous worry.
Designers of advanced semiconductors for nearly every end market are adding machine learning (ML) processing capabilities to new silicon designs. System-on-chip (SoC) designers can deploy a combination of programmable cores (CPUs, GPUs, DSPs) and dedicated machine learning inference accelerators (NPUs) to build highly efficient solutions that run modern, state-of-the-art ML inference models.
However, the world of machine learning is changing rapidly. Innovative data scientists continue to discover and invent new techniques that improve the state of the art (SOTA). The benchmark ML networks used in 2023 to select intellectual property (IP) building blocks for new SoC designs did not exist three or four years ago. Silicon being designed today will be in production in 2025 and 2026, by which point the SOTA models may look nothing like today’s state-of-the-art networks. ML models change in two ways: new topologies that rearrange known operators into deeper and more complex networks, and the invention of entirely new basic ML operators. The former (rearranging known building blocks) is not a difficult problem for SoC designers. But the latter (new operators) is a serious worry for chip designers. What if the ML accelerator you choose in 2023 cannot support a new operator invented in 2026? That worry, that today’s choice may come back to haunt the chip designer, deserves its own name: call it operator anxiety.
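To make “new operator” concrete, consider an activation function that entered mainstream use with transformer models after many ReLU-era accelerators had already been architected. The scalar C++ reference below is purely illustrative and not tied to any particular NPU; it shows GELU, an operator that a fixed-function activation unit hard-wired for ReLU variants may have no native way to execute.

```cpp
#include <cmath>

// Scalar reference implementation of GELU, an activation operator that became
// widespread with transformer networks. An accelerator whose activation stage
// was hard-wired for ReLU-style functions may be unable to run it natively.
float gelu(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}
```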
A typical architecture for today’s silicon combines an NPU “accelerator” for machine learning with a fully programmable but much lower performing CPU, GPU, or DSP. The accelerator is typically hard-wired to perform the most common ML operators (convolution, activation, pooling) as efficiently as possible. Some accelerators have no ability to add new functionality once the silicon is built. Others offer only limited flexibility in the form of microcoded command streams handwritten by the IP vendor. In both cases, the NPU vendor promises the SoC designer that, in the worst case, the companion CPU or DSP can run a newly emerging operator in what the IP vendor calls “fallback” mode. But a programmable CPU or DSP can be orders of magnitude slower than an NPU, often two orders of magnitude slower. (After all, if CPUs were nearly as fast as NPUs, why would we need NPUs in the first place?) This is the source of the anxiety: falling back to a slow processor for even a few operators degrades overall system performance.
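A quick back-of-the-envelope calculation shows why fallback hurts so much. The numbers below are assumed for illustration, not measurements from any real chip: even a small fraction of the network running on a processor that is 100x slower can dominate end-to-end inference time.

```cpp
#include <cstdio>

// Illustrative estimate (assumed numbers, not measured): a small share of
// operators falling back to a CPU/DSP that is ~100x slower than the NPU
// dominates end-to-end inference latency.
int main() {
    const double npu_time_ms      = 10.0;  // time if 100% of the ops ran on the NPU (assumed)
    const double fallback_penalty = 100.0; // fallback processor is ~100x slower per op
    const double fallback_share   = 0.05;  // 5% of the work uses unsupported operators (assumed)

    const double total_ms = npu_time_ms * (1.0 - fallback_share)
                          + npu_time_ms * fallback_share * fallback_penalty;

    // 9.5 ms on the NPU + 50 ms on the fallback processor = 59.5 ms,
    // roughly 6x slower than an all-NPU run.
    std::printf("End-to-end latency: %.1f ms (%.1fx slower than all-NPU)\n",
                total_ms, total_ms / npu_time_ms);
    return 0;
}
```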
Running a new ML operator on a slow DSP or CPU is like charging an EV from an extension cord plugged into a 110-volt wall outlet. What the EV driver really wants is an 800-volt fast charger that fills the battery in 20 minutes, not a low-voltage, low-amperage outlet that takes 18 hours to charge the car.
The answer to EV range anxiety is ubiquitous, readily available fast chargers. The SoC parallel: operator anxiety can be eliminated by a processor that can run any operator, old or new, with the performance and efficiency of a dedicated NPU.
There is a cure for operator anxiety!
Quadric’s Chimera GPNPU – available in 1 TOPS, 4 TOPS, and 16 TOPS variants – is the anxiety-relieving solution you’ve been looking for. The Chimera GPNPU delivers the matrix-optimized performance expected of an ML-optimized compute engine while remaining fully programmable in C++ by software developers. New ML operators can be written and run as fast as the “native” operators created by Quadric engineers. With the Chimera core, whatever new operators and graphs the future brings, there is no fallback and no operator anxiety, only fast execution.
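As an illustration of what “programmable in C++” can mean in practice, here is a minimal sketch of a new elementwise operator written as an ordinary C++ kernel. This is hypothetical code for illustration only; it is not taken from Quadric’s Chimera SDK, and the function name and signature are assumptions rather than a real API.

```cpp
// Hypothetical sketch only: NOT Quadric's actual Chimera SDK API.
// It illustrates the general idea of expressing a new ML operator as plain
// C++ that a compiler for a programmable NPU could target directly.
#include <cstddef>
#include <cmath>

// A "new" elementwise operator (here, the SiLU/Swish activation, x * sigmoid(x))
// written as an ordinary C++ kernel over a tensor buffer.
void silu_kernel(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] / (1.0f + std::exp(-in[i]));
    }
}
```

The point is simply that when an operator is expressible as standard C++, a programmable engine can run it directly instead of waiting for the IP vendor to microcode it or punting it to a slow fallback processor.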
Steve Roddy
Steve Roddy is the Chief Marketing Officer of Quadric.io. Previously, he was Vice President of Arm’s Machine Learning Group, and before that he was Vice President of the IP Licensing business at Tensilica (acquired by Cadence) and Amphion Semiconductor. He has also held product management roles at Synopsys, LSI Logic, and AMCC.