Abstract

With the advent of large-scale, readily accessible neuroimaging datasets, enabled by advances in imaging technology and rapidly expanding computational resources, there is a pressing need for analysis methods that scale efficiently and can effectively characterize patterns of covariance and heterogeneity in brain structure, function, and disease. Data-driven, hypothesis-free approaches promote unbiased discovery, support the analysis of large and complex datasets, and can reveal novel structures or patterns that complement and guide future hypothesis-driven research. Among these approaches, nonnegative matrix factorization (NMF) has shown particularly promising results in neuroimaging. NMF identifies patterns of covariance that are spatially localized and intuitively interpretable, which makes it better suited to neuroimaging applications than many other dimensionality reduction methods. However, these benefits often come at a high computational cost, and NMF can be difficult to adapt to highly heterogeneous data. To address limitations in the usability and scalability of orthonormal projective NMF (opNMF) for neuroimaging, we propose methods that reduce its memory footprint and computational cost, enabling analyses that were previously infeasible on conventional hardware. In the first part, we introduce techniques that mitigate the large memory requirements of opNMF by leveraging QR decomposition and singular value decomposition (SVD), and we show that they replicate the biologically meaningful outputs of opNMF on 1,000 T1-weighted magnetic resonance imaging (MRI) scans. In the second part, we employ graph coarsening to reproduce opNMF results on surface-based neuroimaging data at reduced computational cost. Using cortical thickness data from 5,992 UK Biobank subjects to investigate the association between gray-matter atrophy and visceral fat, we demonstrate that opNMF can be applied at large scale while preserving interpretability.
In the final part, we extend opNMF to study heterogeneity by simultaneously factorizing features and clustering subjects. Validation on simulated data shows that this joint approach outperforms conventional k-means clustering in distinguishing two subtypes. Overall, these adaptations make NMF better suited to neuroimaging applications. By reducing computational burden and accommodating heterogeneous data, the proposed methods are particularly valuable for analyzing multimodal or longitudinal datasets in large-scale neuroimaging studies.

Committee Chair

Aristeidis Sotiras

Committee Members

Adam Bauer; Deanna Barch; Matthew Glasser; Nico Dosenbach

Degree

Doctor of Philosophy (PhD)

Author's Department

Interdisciplinary Programs

Author's School

McKelvey School of Engineering

Document Type

Dissertation

Date of Award

2-2-2026

Language

English (en)

DOI

https://doi.org/10.7936/fx28-t541

Recommended Citation

Ha, Sung Min, "Interpretable, Scalable, Unsupervised Machine Learning for Large Neuroimaging Data" (2026). McKelvey School of Engineering Graduate Student Theses & Dissertations. 1332.

Download

Available for download on Thursday, January 27, 2028

Included in

Engineering Commons

McKelvey School of Engineering Graduate Student Theses & Dissertations

Interpretable, Scalable, Unsupervised Machine Learning for Large Neuroimaging Data