Date of Award

Winter 12-15-2021

Author's School

McKelvey School of Engineering

Author's Department

Computer Science & Engineering

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Accurate delineation of medical images is crucial for computer-aided diagnosis and treatment. However, current clinical practice relies primarily on segmentations produced manually by physicians, which is time-consuming and labor-intensive. Manual segmentation results also vary with physicians' diverse expertise and experience. Hence, reliable, consistent, and accurate automated segmentation methods offer excellent clinical value. Modern machine learning techniques, particularly deep learning, perform exceptionally well on various computer vision tasks. The most representative models for medical image segmentation are based on convolutional neural networks (CNN). Nevertheless, they remain far from perfect for direct clinical application.

Three research problems are addressed in this dissertation. The first two concern limitations of CNN architectures: 1) CNN models do not explicitly capture spatial contiguity, and 2) CNN models do not explicitly or efficiently model long-range dependencies. The third problem is insufficient training data for developing effective medical image segmentation models.

To address the first problem, we developed an adversarial CNN model with Markov random field (MRF) enhancements, effectively modeling three components of spatial contiguity: unary, pairwise, and high-order relationships among pixels in a medical image. The proposed model achieved state-of-the-art performance on pelvic CT segmentation tasks, and several of its design upgrades can benefit future CNN development.
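The unary and pairwise components of spatial contiguity can be illustrated with a toy MRF energy on a binary probability map. This is only a minimal sketch of the general idea, not the dissertation's actual model; the function name, weights, and data here are hypothetical:

```python
import numpy as np

def mrf_energy(prob, unary_weight=1.0, pairwise_weight=0.5):
    """Toy two-term MRF energy for a 2-class probability map `prob` (H x W).

    Unary term: negative log-likelihood of the thresholded labels.
    Pairwise term: Potts-style penalty on label disagreement between
    4-connected neighbours, which rewards spatially contiguous masks.
    """
    eps = 1e-8
    labels = (prob > 0.5).astype(np.float32)
    # Unary: how well the labels agree with the per-pixel probabilities.
    unary = -np.sum(labels * np.log(prob + eps)
                    + (1 - labels) * np.log(1 - prob + eps))
    # Pairwise: count disagreements with right and bottom neighbours.
    pairwise = (np.abs(labels[:, 1:] - labels[:, :-1]).sum()
                + np.abs(labels[1:, :] - labels[:-1, :]).sum())
    return unary_weight * unary + pairwise_weight * pairwise

# A contiguous blob incurs far less pairwise energy than a scattered
# (checkerboard) prediction with the same per-pixel confidence.
blob = np.full((8, 8), 0.05)
blob[2:6, 2:6] = 0.95
```

A real model would make such terms differentiable and learn them jointly with the CNN; the high-order component has no analogue in this two-term toy.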

To address the second problem, we developed two CNN-attention hybrid models that capture long-range dependencies in segmentation models. The first model, weaving attention U-net, integrated self-attention blocks at multiple levels of the CNN, efficiently modeling long-range dependencies at multiple resolution levels. The second model, pyramid medical transformer, was the first multi-scale transformer for medical image segmentation, advancing the rigid patching strategy used in all existing vision transformers. These two models efficiently combined the merits of CNNs and self-attention, achieving state-of-the-art performance on four public datasets.

To address the third problem, the lack of annotations faced by many medical imaging applications, we took two directions. We started by developing a semi-supervised adversarial segmentation model that can learn from unannotated images, whose availability is ensured by generative adversarial network (GAN) synthesis. Subsequently, we proposed a contrastive self-supervised learning (SSL) method tailored to downstream segmentation tasks. It was the first SSL method to enforce fine-grained, pixel-level local consistency, yielding excellent representation ability and preparing CNN models for dense-prediction downstream tasks. The proposed SSL model also supported semi-supervised learning and few-shot fine-tuning. Together, these two solutions alleviate the lack-of-annotation bottleneck in many medical imaging applications.
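As a rough illustration of why self-attention captures long-range dependencies that a convolution's local receptive field cannot, here is a minimal scaled dot-product attention over a flattened feature map. This is an illustrative NumPy sketch, not the weaving attention U-net's actual block; all names, shapes, and weights are assumptions:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a flattened feature map.

    x: (n, d) array of n spatial positions with d channels. Each output
    position is a weighted sum over *all* positions, so dependencies are
    modeled at any range in a single step.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n, n) pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
n, d = 16 * 16, 8              # a hypothetical 16x16 feature map, 8 channels
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # (256, 8): globally mixed features
```

The cost of the (n, n) affinity matrix is why hybrid designs insert attention only at selected resolution levels rather than everywhere.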


Language

English (en)


Committee Members

Chenyang Lu, Baozhou Sun, Deshan Yang, Brendan Juba

Available for download on Monday, January 04, 2027