HPC on Convolutional Neural Networks
Convolutional neural networks (CNNs) have been widely employed for image recognition because they achieve high accuracy by emulating the behavior of optic nerves in living creatures. In particular, various accelerators for deep CNNs have been proposed on FPGA platforms, which offer high performance and reconfigurability, among other advantages. Our research covers FPGA-based acceleration of both the forward recognition process and the backward parameter training process.
Accelerating Convolutional Neural Network on FPGA
Based on the characteristics of convolutional neural networks (CNNs), an FPGA-based accelerator design using a deep-pipeline architecture was proposed for the MNIST data set. In theory, this design completes the whole computation and produces the CNN output in 28 * 28 clock cycles. For the propagation stage of the training process, with the same network structure and the same data set, the FPGA design running at 50 MHz achieves a speedup of nearly five times over the GPU version (Caffe) and eight times over 12 CPU cores, while consuming only 26.7% of the power of the GPU version.
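The 28 * 28 cycle figure can be illustrated with a small software model. The sketch below is not the actual hardware design: it assumes the pipeline accepts one input pixel per clock cycle and uses a line buffer to form convolution windows, which is a common streaming technique but is not confirmed by the description above. The function names and the 3 x 3 kernel are hypothetical, and pipeline fill latency is ignored.

```python
from collections import deque

CLOCK_HZ = 50_000_000  # 50 MHz, the frequency quoted in the text


def stream_conv2d(image, kernel):
    """Valid 2D convolution (no kernel flip) that consumes one pixel per cycle.

    Returns (output_rows, cycles). A line buffer keeps the last (k - 1) rows
    plus k pixels, so each k x k window is available as soon as its
    bottom-right pixel arrives.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    buf = deque(maxlen=(k - 1) * w + k)
    flat_out = []
    cycles = 0
    for r in range(h):
        for c in range(w):
            buf.append(image[r][c])  # one new pixel enters per cycle
            cycles += 1
            if r >= k - 1 and c >= k - 1:
                # buf[0] is the top-left pixel of the current window
                acc = 0
                for i in range(k):
                    for j in range(k):
                        acc += kernel[i][j] * buf[i * w + j]
                flat_out.append(acc)
    out_w = w - k + 1
    rows = [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]
    return rows, cycles


def direct_conv2d(image, kernel):
    """Reference implementation used only to check the streaming model."""
    h, w, k = len(image), len(image[0]), len(kernel)
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(w - k + 1)]
            for r in range(h - k + 1)]


# A synthetic 28 x 28 "image" and an arbitrary 3 x 3 kernel (illustrative values)
image = [[(r * 28 + c) % 7 for c in range(28)] for r in range(28)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

streamed, cycles = stream_conv2d(image, kernel)
latency_us = cycles / CLOCK_HZ * 1e6
print(f"{cycles} cycles, about {latency_us:.2f} microseconds per image at 50 MHz")
```

Under these assumptions, one MNIST image occupies the pipeline for 784 cycles, so the low clock frequency is offset by the fact that every stage does useful work on every cycle.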
Training CNN on FPGA-based Platform
Training a CNN requires a huge amount of computational resources and consumes considerable time and energy. We are working on an integrated CNN training framework based on a hybrid CPU/FPGA platform. The framework contains FPGA-oriented modules for computation and employs the CPU for task scheduling. The modules are designed to be customizable, with adjustable configurations such as network layer size and data precision, and can be applied to large-scale networks and datasets by utilizing the hardware resources available on a modern FPGA card. A brief overview of the architecture is shown in the figure below.
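As a rough illustration of what an adjustable module configuration might look like, the sketch below models layer size and data precision as parameters and shows how precision affects weight storage and numeric values. The class, field names, and the fixed-point format are hypothetical and are not taken from the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleConfig:
    """Hypothetical per-layer configuration; all field names are illustrative."""
    in_channels: int
    out_channels: int
    kernel_size: int
    total_bits: int = 16  # assumed signed fixed-point word width
    frac_bits: int = 8    # assumed number of fractional bits

    def weight_count(self):
        return self.in_channels * self.out_channels * self.kernel_size ** 2

    def weight_bytes(self):
        # Storage needed for the weights at the chosen precision, rounded up
        return (self.weight_count() * self.total_bits + 7) // 8


def quantize(value, cfg):
    """Round to the nearest representable fixed-point value, with saturation."""
    scale = 1 << cfg.frac_bits
    lowest = -(1 << (cfg.total_bits - 1))
    highest = (1 << (cfg.total_bits - 1)) - 1
    raw = max(lowest, min(highest, round(value * scale)))
    return raw / scale


# Example: a 5 x 5 convolution layer with 1 input and 6 output channels
conv1 = ModuleConfig(in_channels=1, out_channels=6, kernel_size=5)
print(conv1.weight_count(), conv1.weight_bytes(), quantize(0.3, conv1))
```

Changing `total_bits` or the layer dimensions changes the storage estimate directly, which is the kind of trade-off a configurable module has to weigh against the resources available on the FPGA card.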
Wenlai Zhao, 5th year Ph.D. student, Tsinghua University
Jiahe Liu, 2nd year Master's student, Tsinghua University
Zihong Lv, Master, Tsinghua University