Abstract
This paper presents a 3-D integrated circuit (3-D IC) for heterogeneous domain-specific streaming architectures. In such architectures, an array of fine-grained accelerators is provided for executing kernels, and applications are mapped via configuration of the accelerators into a desired computation pipeline. The two-layer 3-D IC addresses architectures for different application domains through a generic routing-and-memory (RM) layer and a separate compute-accelerator (CA) layer, which could ultimately be selected at assembly time for different application domains. The RM layer provides a configurable routing network, as well as memory for pipeline buffering and computation scratch pad. The routing network is based on a 2-D mesh with low-swing signaling. The memory is organized as 32 fine-grained (1-kB) SRAM tiles for increased interface parallelism, reduced access energy, and modularity, to interface with different accelerators in the CA layer. Memory-driver and sensing circuits are reused by the low-swing routing network, both for repeaters and to directly load pipeline data into accelerator input buffers. For the prototype, the CA layer is implemented as an array of multiplexers, providing off-chip interfacing to any memory tile, thereby enabling different accelerators to be emulated by an off-chip field-programmable gate array (FPGA). The 3-D interconnection is achieved by 8-µm-pitch face-to-face (F2F) vias and wafer-level assembly. For the 2.47 × 3.38 mm² two-lay