Date of Award

Spring 5-15-2023

Author's School

McKelvey School of Engineering

Author's Department

Computer Science & Engineering

Degree Name

Master of Science (MS)

Degree Type

Thesis

Abstract

Understanding which gene and pathway expression profiles are related to specific disease phenotypes is a critical, active research area in bioinformatics. Although graph neural networks (GNNs) have achieved impressive performance on various real-world graph-based applications such as recommendation systems and social network analysis, applying GNNs to gene-network-based bioinformatics tasks remains challenging because of limited effectiveness and a lack of interpretation methods. In this paper, we propose PathFormer, an interpretable graph Transformer (i.e., a type of GNN), to effectively analyze gene networks and discover meaningful biomarkers and pathways. PathFormer consists of a stack of PathFormer encoder layers followed by two interpretation machines. The PathFormer encoder layer is built on the global attention mechanism; it uses a novel positional encoding scheme to enhance model expressivity and incorporates pathway information into the attention matrix computation. The interpretation machines, in turn, leverage topological and pathway information to identify core gene subnetworks of significant biomarkers and pathways through a top-K selection strategy. We apply PathFormer to the challenging task of Alzheimer's disease (AD) classification. Experiments on two independent AD datasets, Mayo and Rosmap, show that PathFormer significantly outperforms strong baselines, including state-of-the-art GNNs and graph Transformers. On average, PathFormer improves prediction accuracy by 33% and 55% over the best existing GNN and the best existing interpretable GNN, respectively. Furthermore, the interpretation machines in PathFormer provide both instance-level explanations (i.e., personalized explanations) and group-level explanations (i.e., population-based explanations), and experiments show that PathFormer can identify meaningful core gene subnetworks that contain multiple previously reported AD-related genes and plausible pathways.
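The abstract does not specify how gene importance scores are computed or how the top-K selection is implemented. The sketch below only illustrates a generic top-K selection, assuming each gene already has a scalar importance score, followed by keeping the edges among the selected genes as a candidate core subnetwork. The function names `top_k_genes` and `core_subnetwork` and the gene labels `G1` to `G5` are hypothetical and are not taken from PathFormer.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def top_k_genes(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k highest-scoring genes, highest first.

    Assumes scores are precomputed importance values (not PathFormer's own).
    Ties are broken alphabetically by gene name so the result is deterministic.
    """
    if k <= 0:
        return []
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


def core_subnetwork(edges: List[Edge], selected: List[str]) -> List[Edge]:
    """Hypothetical helper: keep only edges whose endpoints were both selected."""
    keep = set(selected)
    return [(u, v) for u, v in edges if u in keep and v in keep]


if __name__ == "__main__":
    # Toy, made-up scores and gene-gene edges for illustration only.
    scores = {"G1": 0.9, "G2": 0.2, "G3": 0.75, "G4": 0.75, "G5": 0.1}
    edges = [("G1", "G3"), ("G3", "G4"), ("G2", "G4"), ("G1", "G5")]

    top = top_k_genes(scores, 3)
    print(top)  # [('G1', 0.9), ('G3', 0.75), ('G4', 0.75)]
    print(core_subnetwork(edges, [g for g, _ in top]))  # [('G1', 'G3'), ('G3', 'G4')]
```

Inducing the subnetwork from the selected nodes keeps the explanation connected to the original gene network topology; how PathFormer actually combines topological and pathway information when scoring genes is described in the thesis itself, not here.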

Language

English (en)

Chair

Yixin Chen

Committee Members

Netanel Raviv, Cynthia Ma

Included in

Engineering Commons
