Abstract

Segmentation of organs and lesions in medical images is a critical component of clinical workflows, supporting tasks such as diagnosis, prognosis, and treatment planning. However, manual segmentation is labor-intensive, time-consuming, and prone to human error, motivating the development of automated segmentation methods. Among these, deep learning–based approaches have gained substantial attention in recent years due to their ability to learn rich, data-driven representations. Two widely used architectures are convolutional neural networks (CNNs) and vision transformers (ViTs). Despite their success, these models face challenges in multi-organ segmentation from volumetric medical images due to inherent architectural limitations. Furthermore, their clinical deployment is hindered by performance degradation when applied to unseen domains or imaging modalities—a problem known as domain shift. Domain shift substantially limits the direct transferability of segmentation models across datasets without explicit adaptation. Recently, Medical Vision Foundation Models (Med-VFMs) have emerged, offering strong performance by leveraging prior knowledge acquired through large-scale self-supervised pretraining. These models can be adapted to segmentation tasks via fine-tuning on a small set of labeled samples from the target domain. However, existing studies have not investigated effective strategies for selecting these samples, leaving open the question of how to adapt Med-VFMs most efficiently to new domains. To address these challenges, this dissertation introduces four methods designed to achieve two specific aims: (i) developing automatic multi-organ segmentation models for CT images and (ii) enabling cross-domain adaptation and evaluation.
(1) First, attention mechanisms and multi-scale convolutional kernels are integrated into CNNs to improve their ability to capture both global contextual information and multi-scale features, thereby enhancing performance in segmenting organs with substantial inter-patient variability in shape and size.

(2) Second, large-kernel convolutional layers are incorporated into ViTs to enrich their capacity for capturing fine-grained localization cues, which in turn improves the delineation of adjacent organs and the segmentation of small structures by more accurately identifying boundary regions. The inherent architecture of ViTs facilitates the utilization of global information, further enhancing their ability to preserve anatomical coherence when segmenting organs with considerable shape variability.

(3) Third, an Active Source-free Cross-domain and Cross-modality Adaptation framework is proposed to adapt segmentation models across different domains and modalities. This method employs active learning (AL) to selectively query informative target-domain samples using a novel Active Test-Time Sample Query strategy to guide model optimization.

(4) Finally, an Active Selective Semi-supervised Fine-tuning approach is proposed to efficiently adapt Med-VFMs for volumetric medical image segmentation. This method also leverages AL to identify the most informative target-domain samples for fine-tuning without requiring access to the original source data, thereby maximizing performance with minimal annotation cost.

Committee Chair

Aristeidis Sotiras

Committee Members

Daniel Marcus

Degree

Doctor of Philosophy (PhD)

Author's Department

Interdisciplinary Programs

Author's School

McKelvey School of Engineering

Document Type

Dissertation

Date of Award

11-10-2025

Language

English (en)

Available for download on Saturday, November 07, 2026
