Abstract
Segmentation of organs and lesions in medical images is a critical component of clinical workflows, supporting diagnosis, prognosis, and treatment planning. Manual segmentation, however, is labor-intensive, time-consuming, and prone to human error, motivating the development of automated methods. Deep learning approaches have attracted substantial attention in recent years because they learn rich, data-driven representations, with convolutional neural networks (CNNs) and vision transformers (ViTs) being the two most widely used architectures. Despite their success, both face inherent architectural limitations in multi-organ segmentation from volumetric medical images. Their clinical deployment is further hindered by performance degradation on unseen domains or imaging modalities, a problem known as domain shift, which substantially limits the direct transferability of segmentation models across datasets without explicit adaptation.

Recently, Medical Vision Foundation Models (Med-VFMs) have emerged, offering strong performance by leveraging prior knowledge acquired through large-scale self-supervised pretraining. These models can be adapted to segmentation tasks by fine-tuning on a small set of labeled samples from the target domain. However, existing studies have not investigated effective strategies for selecting these samples, leaving open the question of how to adapt Med-VFMs to new domains most efficiently.

To address these challenges, this dissertation introduces four methods in service of two aims: (i) developing automatic multi-organ segmentation models for CT images and (ii) adapting and evaluating these models across domains.

(1) Attention mechanisms and multi-scale convolutional kernels are integrated into CNNs to improve their ability to capture global contextual information and multi-scale features, enhancing the segmentation of organs with substantial inter-patient variability in shape and size.

(2) Large-kernel convolutional layers are incorporated into ViTs to enrich their capacity for capturing fine-grained localization cues, improving the delineation of adjacent organs and the segmentation of small structures by identifying boundary regions more accurately. Because ViTs natively exploit global information, the combined design also preserves anatomical coherence when segmenting organs with considerable shape variability.

(3) An Active Source-free Cross-domain and Cross-modality Adaptation framework is proposed to adapt segmentation models across domains and modalities. It employs active learning (AL) to selectively query informative target-domain samples through a novel Active Test-Time Sample Query strategy that guides model optimization.

(4) An Active Selective Semi-supervised Fine-tuning approach is proposed to adapt Med-VFMs efficiently for volumetric medical image segmentation. It likewise leverages AL to identify the most informative target-domain samples for fine-tuning, without requiring access to the original source data, thereby maximizing performance at minimal annotation cost.
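To make the first contribution concrete, below is a minimal PyTorch sketch of a 3D multi-scale convolution block with channel attention. The module name, the choice of kernel sizes, and the squeeze-and-excitation-style attention are illustrative assumptions, not the dissertation's actual architecture.

    import torch
    import torch.nn as nn

    class MultiScaleAttnBlock(nn.Module):
        """Hypothetical block: parallel multi-scale 3D convolutions
        fused and re-weighted by channel attention."""
        def __init__(self, in_ch: int, out_ch: int):
            super().__init__()
            # Parallel branches with different kernel sizes capture
            # features at multiple receptive-field scales.
            self.branch3 = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
            self.branch5 = nn.Conv3d(in_ch, out_ch, kernel_size=5, padding=2)
            # Squeeze-and-excitation-style attention: globally pooled
            # context produces per-channel weights in (0, 1).
            self.attn = nn.Sequential(
                nn.AdaptiveAvgPool3d(1),
                nn.Conv3d(out_ch, out_ch // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv3d(out_ch // 4, out_ch, kernel_size=1),
                nn.Sigmoid(),
            )
            self.act = nn.ReLU(inplace=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            feats = self.branch3(x) + self.branch5(x)   # fuse scales
            return self.act(feats * self.attn(feats))   # attention-weighted

    x = torch.randn(1, 16, 32, 64, 64)  # (batch, channels, D, H, W) CT patch
    print(MultiScaleAttnBlock(16, 32)(x).shape)  # torch.Size([1, 32, 32, 64, 64])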
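For the second contribution, the following sketch shows one plausible way to pair global self-attention with a large-kernel depthwise 3D convolution over the token grid, so that local boundary cues complement the transformer's global context. The block name, head count, and kernel size are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class LargeKernelViTBlock(nn.Module):
        """Hypothetical ViT block augmented with a large-kernel
        depthwise convolution for fine-grained localization."""
        def __init__(self, dim: int, heads: int = 4, k: int = 7):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            # Depthwise large-kernel conv over the 3D token layout injects
            # the local boundary cues that pure self-attention lacks.
            self.lk_conv = nn.Conv3d(dim, dim, kernel_size=k,
                                     padding=k // 2, groups=dim)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                     nn.Linear(4 * dim, dim))

        def forward(self, x: torch.Tensor, grid: tuple) -> torch.Tensor:
            # x: (batch, tokens, dim); grid: token layout (D, H, W)
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]  # global context
            b, n, c = x.shape
            vol = x.transpose(1, 2).reshape(b, c, *grid)
            x = x + self.lk_conv(vol).flatten(2).transpose(1, 2)  # local cues
            return x + self.mlp(self.norm2(x))

    tokens = torch.randn(1, 4 * 8 * 8, 96)
    print(LargeKernelViTBlock(96)(tokens, (4, 8, 8)).shape)  # (1, 256, 96)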
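For the third contribution, the sketch below gives one plausible instantiation of an active sample query: ranking unlabeled target-domain volumes by mean predictive entropy and sending the most uncertain ones for annotation. The dissertation's Active Test-Time Sample Query strategy may use a different criterion; the function name and scoring rule here are assumptions.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def query_samples(model, volumes, budget: int):
        """Rank unlabeled target volumes by mean predictive entropy and
        return the indices of the `budget` most uncertain ones."""
        scores = []
        for vol in volumes:  # each vol: (1, channels, D, H, W)
            probs = F.softmax(model(vol), dim=1)  # per-voxel class probabilities
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
            scores.append(entropy.mean().item())  # volume-level uncertainty
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        return ranked[:budget]  # indices to send to the annotator

Volume-level entropy is a standard AL acquisition score: high entropy indicates the model is unsure of its predictions, so labeling those volumes yields the most information per annotation.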
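For the fourth contribution, a minimal sketch of a semi-supervised fine-tuning step follows, assuming a FixMatch-style scheme: supervised loss on the actively queried labeled volumes plus a confidence-filtered pseudo-label loss on the remaining unlabeled ones. The function name, confidence threshold, and loss weighting are illustrative assumptions, not the dissertation's method.

    import torch
    import torch.nn.functional as F

    def semi_supervised_step(model, optimizer, labeled, unlabeled,
                             conf_thresh: float = 0.9, w_unsup: float = 0.5):
        """One fine-tuning step mixing labeled and pseudo-labeled losses."""
        model.train()
        x_l, y_l = labeled                       # queried volume + annotation
        loss = F.cross_entropy(model(x_l), y_l)  # supervised term
        with torch.no_grad():
            probs = F.softmax(model(unlabeled), dim=1)
            conf, pseudo = probs.max(dim=1)      # per-voxel pseudo-labels
        mask = conf > conf_thresh                # keep only confident voxels
        if mask.any():
            unsup = F.cross_entropy(model(unlabeled), pseudo, reduction="none")
            loss = loss + w_unsup * unsup[mask].mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()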
Committee Chair
Aristeidis Sotiras
Committee Members
Daniel Marcus
Degree
Doctor of Philosophy (PhD)
Author's Department
Interdisciplinary Programs
Document Type
Dissertation
Date of Award
11-10-2025
Language
English (en)
DOI
https://doi.org/10.7936/58dy-8q33
Recommended Citation
Yang, Jin, "The Development of Deep-Learning-Based Automatic Multi-Organ Segmentation Models from CT Images and their Clinical Evaluation" (2025). McKelvey School of Engineering Theses & Dissertations. 1327.
The definitive version is available at https://doi.org/10.7936/58dy-8q33