Amr Suleiman,  Zhengdong Zhang,  Vivienne Sze

DOI: 10.1109/JSSC.2017.2648820


This paper presents a programmable, energy-efficient, and real-time object detection hardware accelerator for low power and high throughput applications using deformable parts models, with 2x higher detection accuracy than traditional rigid body models. Three methods are used to address the high computational complexity of eight deformable parts detection: classification pruning for 33x fewer part classification, vector quantization for 15x memory size reduction, and feature basis projection for 2x reduction in the cost of each classification. The chip was fabricated in a 65 nm CMOS technology, and can process full high definition 1920 × 1080 videos at 60 frames/s without any OFF-chip storage. The chip has two programmable classification engines (CEs) for multiobject detection. At 30 frames/s, the chip consumes only 58.6 mW (0.94 nJ/pixel, 1168 GOPS/W). At a higher throughput of 60 frames/s, the CEs can be time multiplexed to detect even more than two object classes. This proposed accelerator enables object detection to be as energy-efficient as video compression, which is found in most cameras today.