Abstract
With the advent of large-scale, readily accessible neuroimaging datasets enabled by advances in imaging technology and rapidly expanding computational resources, there remains a pressing need for analysis methods that scale efficiently and can effectively characterize patterns of covariance and heterogeneity in brain structure, function, and disease. Data-driven, hypothesis-free approaches promote unbiased discovery, support the analysis of large and complex datasets, and can reveal novel structures or patterns that complement and guide future hypothesis-driven research. Among these approaches, nonnegative matrix factorization (NMF) has shown particularly promising results in neuroimaging. NMF identifies patterns of covariance that are spatially localized and intuitively interpretable, making it especially suitable for neuroimaging applications compared to other dimensionality reduction methods. However, these benefits often come with high computational cost and challenges in adapting NMF to highly heterogeneous data. To address limitations in the usability and scalability of orthonormal projective NMF (opNMF) for neuroimaging, we propose methods to reduce memory footprint and computational cost, enabling analyses that were previously infeasible on conventional hardware. In the first part, we introduce techniques that mitigate the large memory requirements of opNMF by leveraging QR decomposition and singular value decomposition, replicating the biologically meaningful outputs of opNMF on 1,000 T1-weighted magnetic resonance imaging (MRI) scans. In the second part, we employ graph coarsening to reproduce opNMF on surface-based neuroimaging data with reduced computational cost. Using cortical thickness data from 5,992 UK Biobank subjects, we demonstrate that large-scale opNMF applications are feasible without sacrificing interpretability, and we investigate the association between gray-matter atrophy and visceral fat.
In the final part, we extend opNMF to study heterogeneity by simultaneously performing NMF on features while clustering subjects. Validation using simulated data shows that this joint approach outperforms conventional k-means clustering in distinguishing two subtypes. Overall, these adaptations improve the suitability of NMF for neuroimaging applications. By reducing computational burden and accommodating heterogeneous data, the proposed methods are particularly valuable for analyzing multimodal or longitudinal datasets in large-scale neuroimaging studies.
Committee Chair
Aristeidis Sotiras
Committee Members
Adam Bauer; Deanna Barch; Matthew Glasser; Nico Dosenbach
Degree
Doctor of Philosophy (PhD)
Author's Department
Interdisciplinary Programs
Document Type
Dissertation
Date of Award
2-2-2026
Language
English (en)
Recommended Citation
Ha, Sung Min, "Interpretable, Scalable, Unsupervised Machine Learning for Large Neuroimaging Data" (2026). McKelvey School of Engineering Theses & Dissertations. 1332.
The definitive version is available at https://doi.org/10.7936/fx28-t541