PhD Viva

Name of the Speaker: Mr. Balaji Vijayakumar (EE19D202)
Guide: Dr. Janakiraman Viraraghavan
Online meeting link: https://meet.google.com/sub-tztv-voe
Date/Time: 14th November 2025 (Friday), 11.30 AM
Title: Circuit Techniques for Input-Conditioned and Full-Range Quantization in Compute-In-Memory

Abstract:

Artificial intelligence (AI) inference at the edge has created demand for energy-efficient, high-throughput computing architectures. Traditional von Neumann systems, which suffer from energy and latency bottlenecks due to data movement between memory and processing units, are increasingly being replaced by alternative architectures such as Compute-In-Memory (CIM). In analog CIM systems, a fundamental operation is the multiply-and-accumulate (MAC) computation, typically executed in parallel across memory columns and subsequently quantized using analog-to-digital converters (ADCs). However, designing high-precision ADCs in CIM macros poses significant challenges, particularly owing to the stringent area constraints imposed by pitch-matching requirements and to energy overheads. This thesis presents a hardware implementation of a previously proposed software idea that extends the precision of a 7-bit on-chip ADC to support up to 10-bit MAC resolution, without incurring the typical design penalty associated with a 10-bit ADC. The proposed method introduces two operational modes: full-range quantization (FRQ) and input-conditioned quantization (ICQ). The FRQ mode enables 2-to-7-bit quantization by operating the 7-bit successive approximation register (SAR) ADC over fewer cycles. For higher precision requirements (>7 bits), the ICQ mode leverages the insight that, at any given time, only a single input vector is presented to the macro. As a result, the effective MAC range, conditioned on that specific input, is significantly smaller than the range spanning all possible input vectors. This reduced range is efficiently mapped to the dynamic range of the 7-bit ADC using a residue amplification (RA) technique, implemented through a subtract-mirror-amplify (SMA) block in each weight column. Modifying the block-level architecture of a conventional CIM macro to implement ICQ in this way enables 7-to-10-bit quantization.
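
The range argument behind ICQ can be illustrated with a small numerical sketch. This is not the thesis's circuit or algorithm: it assumes a simple hypothetical bound (unsigned inputs up to `x_max`, signed weights bounded by `w_max`, `n_rows` rows per column) and does not model the SMA block or the residue amplification. All function and parameter names are illustrative.

```python
import math

ADC_BITS = 7  # on-chip ADC resolution stated in the abstract


def full_range(n_rows, x_max, w_max):
    """Largest |MAC| over all possible input vectors (assumed bound)."""
    return n_rows * x_max * w_max


def conditioned_range(x, w_max):
    """Largest |MAC| once the input vector x is known (weights still unknown)."""
    return sum(x) * w_max


def lsb(range_abs, bits=ADC_BITS):
    """Step size when a `bits`-bit quantizer spans [-range_abs, +range_abs]."""
    return 2 * range_abs / (2 ** bits)


def effective_bits(x, n_rows, x_max, w_max, bits=ADC_BITS):
    """Resolution, referred to the full MAC range, when the ADC spans only
    the input-conditioned range. Requires a non-zero input vector."""
    cond = conditioned_range(x, w_max)
    if cond == 0:
        raise ValueError("all-zero input: MAC is exactly zero, nothing to quantize")
    return bits + math.log2(full_range(n_rows, x_max, w_max) / cond)


# Toy case: 8 rows, 4-bit unsigned inputs, weights bounded by 7 in magnitude.
x = [15, 0, 0, 15, 0, 0, 0, 0]
print(effective_bits(x, n_rows=8, x_max=15, w_max=7))  # 9.0
```

In this toy case the conditioned range is one quarter of the full range, so a 7-bit quantizer spanning it resolves the MAC as finely as a 9-bit quantizer spanning the full range would.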
In addition, this thesis presents techniques that co-optimize circuit design and physical layout by integrating storage using SRAM cells and tightly pitch-matching critical circuits, such as current mirror calibration and input storage for digital-to-analog converters (DACs). To ensure robust operation under device mismatch and environmental variation, the design incorporates correction strategies targeting mismatches in the SMA blocks and accounts for process, voltage, and temperature (PVT) shifts at the compute-cell level. For minor environmental perturbations, a real-time correction mechanism is introduced to mitigate temperature-induced variations during inference.

A 424 Kb SRAM-based CIM macro implementing the proposed techniques was fabricated in TSMC 28 nm HPC+RF CMOS technology. Measurement results demonstrate energy efficiency ranging from 196.6 to 102 tera-operations per second per watt per bit (TOPS/W/b) across precision modes from 2 to 10 bits. The subtract-mirror-amplify circuitry enabled an alternative configuration that achieved higher energy efficiency, with improvements of approximately 24.14% and 32.57% for 8- and 9-bit outputs, respectively, compared to approaches that solely tuned the word-line voltage applied to the compute cell. Additionally, conventional FRQ operations were made more energy-efficient by leveraging the amplification component to handle network data sparsity, yielding energy gains of 8.12% to 11.11% compared to solely tuning the word-line voltage. Overall, the macro maintains <1% inference accuracy degradation on standard AI benchmarks, including MNIST, CIFAR-10, CIFAR-100, and ImageNet. These results highlight the effectiveness of the proposed macro, whose output MAC precision is reconfigurable, for edge AI applications.