Hardware evolution - from CPU to GPU
The current move to blockchain technology and deep learning could shift the focus of the microprocessor industry from general application performance to neural network performance. Computer processors originally were not designed for such complex tasks.
x86 processors, designed for general workloads such as internet browsing, mobile apps, and video streaming, rely on integer operations and tend to be sequential in nature. In contrast, deep learning workloads rely on floating point, or decimal, operations and are parallel in nature.
Hence deep learning demands different processor designs, specifically those with high floating point performance. The highest performing processor for floating point operations is the Graphics Processing Unit (GPU).
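This sequential-versus-parallel distinction can be made concrete with a minimal Python sketch (purely illustrative; the function names and data are hypothetical, not tied to any real workload). Both functions compute the same floating point dot product, but the second expresses it as an independent elementwise map followed by a reduction, which is the shape of computation a GPU can spread across thousands of cores:

```python
def dot_sequential(a, b):
    """Sequential floating point dot product: one multiply-add at a time,
    each step depending on the running total - the pattern CPUs execute."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_parallel_style(a, b):
    """The same arithmetic recast as a data-parallel map plus a reduction.
    Every product is independent of the others, so a GPU could compute
    them all simultaneously before summing."""
    products = [x * y for x, y in zip(a, b)]  # map stage: fully parallelizable
    return sum(products)                      # reduction stage

# Hypothetical example vectors
a = [0.5, 1.5, 2.5, 3.5]
b = [2.0, 2.0, 2.0, 2.0]
print(dot_sequential(a, b))      # 16.0
print(dot_parallel_style(a, b))  # 16.0 - same result, parallel-friendly form
```

Deep learning workloads consist largely of matrix multiplications built from exactly this kind of independent floating point arithmetic, which is why hardware that parallelizes it wins.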
As shown below, since the mid-2000s GPUs have outstripped Central Processing Units (CPUs) in transistor count, a measure of chip complexity and performance. NVIDIA's Pascal-based P100 GPU, for example, incorporates 15 billion transistors, almost double the count of Intel's "Knights Landing" Xeon Phi processor.
And because GPUs are designed for floating-point-intensive graphics workloads, most of their transistors are devoted to floating point operations. CPUs, by contrast, must serve a variety of workloads and can devote only a portion of their transistor budget to floating point operations.
Deep learning is unique enough as a workload to justify a new architecture and to support the daunting cost of ongoing chip development.
Moving forward, deep learning could eventually reside on-chip in a dedicated acceleration block, much as graphics processing is handled today.