Date of Award
Spring 5-15-2023
Degree Name
Master of Science (MS)
Degree Type
Thesis
Abstract
Understanding which gene and pathway expression profiles are related to specific disease phenotypes is a critical and active research area in bioinformatics. Although graph neural networks (GNNs) have achieved impressive performance on various real-world graph-based applications such as recommendation systems and social network analysis, applying GNNs to gene-network-based bioinformatics tasks remains challenging because of limited effectiveness and a lack of interpretation methods. In this paper, we propose PathFormer, an interpretable graph Transformer (i.e., a GNN), to effectively analyze gene networks and discover meaningful biomarkers and pathways. PathFormer is composed of a stack of PathFormer encoder layers followed by two interpretation machines. The PathFormer encoder layer is built on the global attention mechanism: a novel positional encoding scheme is proposed to enhance model expressivity, and the pathway message is incorporated into the attention matrix computation. The interpretation machines, in turn, leverage topological information and the pathway message to identify core gene subnetworks of significant biomarkers and pathways through a top-K selection strategy. We apply PathFormer to the Alzheimer's disease (AD) classification task. Experiments are performed on two independent AD datasets, Mayo and Rosmap, and empirical results show that PathFormer significantly outperforms strong baselines, including state-of-the-art GNNs and graph Transformers. On average, PathFormer improves prediction accuracy by 33% and 55% over the best existing GNN and the best existing interpretable GNN, respectively. Furthermore, the interpretation machines in PathFormer provide both an instance-level explanation (i.e., a personalized explanation) and a group-level explanation (i.e., a population-based explanation), and experiments show that PathFormer can identify meaningful core gene subnetworks that consist of multiple reported AD-related genes and rational pathways.
Language
English (en)
Chair
Yixin Chen
Committee Members
Netanel Raviv, Cynthia Ma