JSSC 2013, Issue 7 · Digital Circuits · 0.13 μm CMOS
An 86 mW 98GOPS ANN-Searching Processor for Full-HD 30 fps Video Object Recognition With Zeroless Locality-Sensitive Hashing
Gyeonghoon Kim, Jinwook Oh, Seungjin Lee, and Hoi-Jun Yoo
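Proposes a 98 GOPS ANN-searching processor for real-time object recognition in full-HD 30 fps video.
62,720 vectors/s throughput, 1140 GOPS/W power efficiency
Approximate nearest neighbor search · Object recognition · Real-time processing · Full-HD video · Low-power design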
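▸Innovation 1: Interframe cache architecture as a hardware-oriented optimization (system innovation) - Proposes an interframe cache architecture that reduces the required external memory bandwidth by exploiting frame-level data correlation, supporting real-time processing of full-HD 30 fps video.
▸Innovation 2: Zeroless locality-sensitive hashing as a software-oriented optimization (method innovation) - Develops the zeroless-LSH algorithm, which applies vector-level data optimization to minimize data transactions from external memory and improve search efficiency, for a power efficiency of 1140 GOPS/W.
▸Innovation 3: Four-way set-associative on-chip cache (circuit innovation) - Designs a four-way set-associative on-chip cache with a dedicated architecture that optimizes the data access path and raises the cache hit rate, supporting a throughput of 62,720 vectors/s.
▸Innovation 4: Power efficiency and throughput (system innovation) - Combines the hardware and software optimizations to reach 1.45 times the throughput and 1.37 times the power efficiency of the state of the art, meeting the requirements of real-time full-HD video object recognition.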
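As a quick consistency check on the headline figures, the short Python sketch below recomputes the power efficiency from the 86 mW and 98 GOPS quoted in the title, and the per-frame vector budget from the reported throughput. The variable names are illustrative. Dividing throughput by the frame rate assumes that searches are spread evenly across frames, which the summary does not state.

```python
# Figures quoted in the title and summary of the paper.
power_w = 86e-3          # 86 mW
peak_gops = 98.0         # 98 GOPS
vectors_per_s = 62_720   # reported ANN-search throughput
fps = 30                 # full-HD video frame rate

# Power efficiency implied by the title figures.
gops_per_watt = peak_gops / power_w

# Average number of descriptor vectors that can be searched per frame
# (assumption: the workload is spread evenly over the 30 frames).
vectors_per_frame = vectors_per_s / fps

print(f"{gops_per_watt:.0f} GOPS/W")               # prints "1140 GOPS/W"
print(f"{vectors_per_frame:.0f} vectors per frame")
```

The recomputed efficiency rounds to the 1140 GOPS/W quoted above, so the title and summary figures agree with each other.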
Abstract
Approximate nearest neighbor (ANN) searching is an essential task in object recognition. The ANN-searching stage, however, is the main bottleneck in the object recognition process due to increasing database size and massive dimensions of keypoint descriptors. In this paper, a high throughput ANN-searching processor is proposed for high-resolution (full-HD) and real-time (30 fps) video object recognition. The proposed ANN-searching processor adopts an interframe cache architecture as a hardware-oriented approach and a zeroless locality-sensitive-hashing (zeroless-LSH) algorithm as a software-oriented approach to reduce the external memory bandwidth required in nearest neighbor searching. A four-way set associative on-chip cache has a dedicated architecture to exploit data correlation at the frame-level. Zeroless-LSH minimizes data transactions from external memory at the vector-level. The proposed ANN-searching processor is fabricated as part of an object recognition SoC using a 0.13 μm 6-metal CMOS technology. It achieves 62,720 vectors/s throughput and 1140 GOPS/W power efficiency, which are 1.45 and 1.37 times higher than the state-of-the-art, respectively, enabling real-time object recognition for full-HD 30 fps video streams.
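To make the cache idea concrete, here is a minimal software model of a four-way set-associative cache in Python. It is a sketch under stated assumptions, not the paper's design: the abstract specifies only the four-way associativity and the goal of exploiting frame-level correlation. The LRU replacement policy, the modulo set indexing, the set count, the bucket addresses, and the names `SetAssociativeCache` and `fetch_from_external_memory` are all hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache with LRU replacement (assumed policy)."""

    def __init__(self, num_sets=64, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set, mapping tag -> data,
        # ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address, fetch):
        index = address % self.num_sets   # assumed indexing scheme
        tag = address // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            self.hits += 1
            lines.move_to_end(tag)        # mark as most recently used
            return lines[tag]
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)     # evict the least recently used line
        lines[tag] = fetch(address)       # models an external memory read
        return lines[tag]


def fetch_from_external_memory(address):
    # Placeholder for an off-chip read of one hash bucket.
    return f"bucket-{address}"


cache = SetAssociativeCache(num_sets=64, ways=4)
frame_1 = [3, 67, 131, 200, 9]    # hypothetical bucket addresses, frame 1
frame_2 = [3, 67, 131, 200, 42]   # the next frame reuses most of them
for address in frame_1 + frame_2:
    cache.access(address, fetch_from_external_memory)

print(cache.hits, cache.misses)   # prints "4 6"
```

In this toy run the second frame touches mostly the same buckets as the first, so four of its five accesses hit on-chip and only one goes to external memory. That is the kind of frame-to-frame reuse the abstract describes the cache as exploiting.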