Abstract

Feature extraction and selection in the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning. We propose using a Gram-Schmidt (GS) type orthogonalization process over function spaces to detect and map out such dependencies. Specifically, by applying the GS process over some family of functions, we construct a series of covariance matrices that can be used either to identify new large-variance directions or to remove those dependencies from known directions. In the former case, we provide information-theoretic guarantees in terms of entropy reduction. In the latter, we provide precise conditions under which the chosen function family eliminates existing redundancy in the data. Each approach provides both a feature extraction and a feature selection algorithm. Our feature extraction methods are linear and can be seen as a natural generalization of principal component analysis (PCA). We provide experimental results on synthetic and real-world benchmark datasets that show superior performance over state-of-the-art (linear) feature extraction and selection algorithms. Surprisingly, our linear feature extraction algorithms are comparable to, and often outperform, several important nonlinear feature extraction methods such as autoencoders, kernel PCA, and UMAP. Furthermore, one of our feature selection algorithms strictly generalizes a recent Fourier-based feature selection mechanism, yet at significantly reduced complexity.

Beyond benchmark evaluations, we also investigated the performance of our methods in demanding real-world scientific settings. In particular, our proposed feature selection algorithm was applied to large-scale genomics datasets to remove irrelevant or weakly informative genes. These datasets are characterized by extremely high dimensionality and pronounced sparsity, conditions under which many existing feature selection algorithms exhibit degraded performance. Despite these challenges, our proposed algorithm demonstrated robust and superior behavior, reliably identifying biologically relevant features. This not only improved downstream data quality and model interpretability, but also led to more stable and biologically meaningful insights in tasks such as clustering and gene ontology analysis. These results highlight the practical utility of our proposed feature selection framework in large-scale scientific domains where sparsity and nonlinear structure are inherent.

Finally, our feature selection framework enables a principled approach to interpreting deep neural networks. Modern high-capacity models achieve impressive predictive performance, yet their lack of transparency remains a critical barrier in domains where reliability and accountability are essential. Existing attribution techniques often rely on gradients, which makes them susceptible to noise and instability, or they involve computationally demanding optimization procedures. Moreover, many do not incorporate systematic mechanisms for identifying informative and non-redundant internal representations. By leveraging the same Gram-Schmidt orthogonalization over function spaces, we construct a set of orthogonalized features that disentangle the model’s learned representations. Extensive experiments on ILSVRC2012 and PASCAL VOC 2007 across multiple network architectures (VGG-16, Inception-v3, ResNeXt-50, and DeiT-Base) demonstrate superior performance compared to other state-of-the-art methods. Our approach offers a principled solution that combines the robustness of gradient-free methods with computational efficiency, while providing unprecedented interpretability through feature analysis.
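As a rough illustration of the idea of removing function-of-known-direction dependencies before looking for the next large-variance direction, the following Python sketch may help. It is not the dissertation's algorithm: the polynomial function family, the empirical inner product, the helper names (`gram_schmidt_functions`, `residual_covariance`, `top_direction`), and the toy data are all assumptions made here for illustration only.

```python
import numpy as np

def gram_schmidt_functions(z, degree):
    """Orthonormalize the centered monomials z, z**2, ..., z**degree under
    the empirical inner product <f, g> = mean(f * g). (Assumed family.)"""
    basis = []
    for k in range(1, degree + 1):
        f = z ** k
        f = f - f.mean()                      # orthogonal to constants
        for q in basis:                       # subtract earlier components
            f = f - np.mean(f * q) * q
        norm = np.sqrt(np.mean(f ** 2))
        if norm > 1e-10:                      # skip numerically dependent terms
            basis.append(f / norm)
    return np.stack(basis, axis=1)            # shape (n_samples, n_functions)

def residual_covariance(X, z, degree=3):
    """Covariance of X after removing its projection onto functions of z."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Q = gram_schmidt_functions(z, degree)
    coeffs = Q.T @ Xc / n                     # projection coefficients
    R = Xc - Q @ coeffs                       # residual data
    return R.T @ R / n

def top_direction(C):
    """Unit eigenvector of the symmetric matrix C with largest eigenvalue."""
    _, vecs = np.linalg.eigh(C)               # eigenvalues in ascending order
    return vecs[:, -1]

# Toy data (hypothetical): the second coordinate is a nonlinear function of
# the first; the third coordinate is independent noise with smaller variance.
rng = np.random.default_rng(0)
n = 5000
t = rng.standard_normal(n)
X = np.column_stack([
    2.0 * t,
    t ** 2 + 0.05 * rng.standard_normal(n),
    0.8 * rng.standard_normal(n),
])

Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / n
w1 = top_direction(C)                         # ordinary first PCA direction
pca_w2 = np.linalg.eigh(C)[1][:, -2]          # ordinary second PCA direction
Q = gram_schmidt_functions(Xc @ w1, degree=3)
w2 = top_direction(residual_covariance(X, Xc @ w1, degree=3))

print("first direction:          ", np.round(w1, 3))
print("plain PCA second:         ", np.round(pca_w2, 3))
print("dependency-removed second:", np.round(w2, 3))
```

On this toy data, plain PCA should pick the redundant quadratic coordinate as its second direction because of its large variance, whereas the direction computed from the residual covariance should instead align with the independent third coordinate. Both extracted directions remain linear maps of the data; only the covariance they are computed from changes.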

Committee Chair

Bruno Sinopoli

Committee Members

Joseph O'Sullivan; Netanel Raviv; Yang Li; Yiannis Kantaros

Degree

Doctor of Philosophy (PhD)

Author's Department

Electrical & Systems Engineering

Author's School

McKelvey School of Engineering

Document Type

Dissertation

Date of Award

4-15-2026

Language

English (en)

Available for download on Tuesday, June 15, 2027
