Date of Award
10-7-2024
Degree Name
Doctor of Philosophy (PhD)
Degree Type
Dissertation
Abstract
CMOS Image Sensors (CIS) are the most widely used visual devices in the modern sensing industry, playing a crucial role in converting light signals into digital images. In practical applications, the CIS serves as the front end of a vision system that also includes Image Signal Processors (ISPs) and vision algorithms before the final output is delivered to end consumers. The quality of a vision system depends on the metrics the end consumer uses to evaluate the received images. In conventional vision systems, the end consumer is human, and the evaluation metric is image quality based on visual fidelity, measured by factors such as peak signal-to-noise ratio and structural similarity index measure. To achieve higher visual-fidelity-based image quality, these systems allocate a significant portion of their hardware budget to high-resolution analog-to-digital conversion and advanced ISPs, resulting in large data transfers between the CIS, the ISPs, and the end consumer that create energy and latency bottlenecks in the vision system. Critically, these bottlenecks worsen as image resolution increases, and resolutions continue to grow as large Computer Vision (CV) models are deployed in the real-time scenarios required by today’s edge-Artificial Intelligence (AI) applications. The conventional human-centered vision system is therefore unable to efficiently support edge-AI applications.

This dissertation explores a new energy- and latency-efficient vision system based on the observation that, in edge-AI applications, voluminous vision data are generated by intelligent edge CIS and consumed not by humans but by downstream CV algorithms that perform sophisticated tasks such as classification, recognition, and machine perception. The evaluation metric for the quality of the vision system thus becomes the accuracy of downstream tasks rather than visual-fidelity-based image quality. This observation motivates us to forgo reconstructing high-fidelity images and instead compress raw images while preserving “task-specific” information, reducing energy and latency without degrading downstream task accuracy. Such a vision system is termed a data-driven machine vision system, and it is constructed with high efficiency through a co-design paradigm spanning circuits, algorithms, and architecture.

This dissertation analyzes the data-driven machine vision system by targeting different CV algorithms that extract the “task-specific” information and implementing these algorithms in the CIS with circuit and architecture techniques that improve system efficiency. First, a vision system with a classic CV algorithm implemented in the sensor is presented, achieving 7.1× compression on the pedestrian detection task without accuracy loss. Next, two vision systems with deep learning-based CV algorithms implemented in the sensor are presented, achieving up to 8× compression for the image classification task and up to 20× compression for the eye-tracking task, respectively, with minimal loss of task accuracy. Finally, frameworks for exploring computational CIS architectures and facilitating conventional CIS design space exploration are presented, forming the basis for choosing the optimal circuit and architecture for a given algorithm systematically rather than heuristically.
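For context, a minimal sketch (not part of the dissertation) of how the peak signal-to-noise ratio named above is conventionally computed as a visual-fidelity metric, assuming 8-bit images and NumPy:

import numpy as np

def psnr(reference, reconstructed, max_value=255.0):
    # Peak signal-to-noise ratio in decibels: 10 * log10(MAX^2 / MSE).
    # Higher values indicate closer pixel-level fidelity to the reference,
    # the human-centered quality notion that the dissertation replaces with
    # downstream task accuracy for machine-consumed vision data.
    diff = reference.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_value ** 2) / mse)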
Language
English (en)
Chair
Xuan Zhang
Committee Members
Abhinav Jha; Joseph O’Sullivan; Shantanu Chakrabartty; Yuhao Zhu