Date of Award
Doctor of Philosophy (PhD)
In the past decade, advances in genotyping technology, first microarrays and then “next-generation” sequencing, have enabled scientists to examine the susceptibility genes that contribute to the risk of complex disorders using a genome-wide, “hypothesis free” strategy. However, despite this “hypothesis free” label, these genome-wide approaches (including genome-wide association and whole genome sequencing studies) depend on two implicit assumptions. The first assumption is that the genetic risk of complex traits is contributed by independent genes or variants (assumption of independence). The second assumption is that different genes have equal potential to contribute to the genetic predisposition to complex traits (assumption of equality). Despite the considerable success of susceptibility gene association mapping in the last decade, a growing body of evidence indicates that these two underlying assumptions may not be sound. Rather than studying one locus at a time, alternative methods that carry out global analyses of biological molecules in populations have been developed to understand the influence of the whole biological system on complex traits. Network-based approaches, in particular, have proven informative.
This dissertation covers several important issues concerning sequencing-based study design and its applications in Chapters 2, 3, and 4. A human protein-protein interaction network is constructed, and several issues related to human gene networks are studied and discussed in Chapters 5 and 6. The abstracts for each chapter are summarized as follows.
Chapter 2: In this chapter, we proposed a two-stage, gene-based method for association mapping of rare variants by applying four different non-collapsing algorithms. Using the Genetic Analysis Workshop 18 whole genome sequencing dataset of simulated blood pressure phenotypes, we studied and contrasted the false positive rate of each algorithm using receiver operating characteristic curves. The statistical power of these methods was also evaluated and compared through the analysis of 200 simulated replicates in a smaller genotype data set. We showed that Fisher’s method was superior to the other three non-collapsing methods, but was no better than the standard method implemented in famSKAT.
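As a point of reference for Fisher’s method mentioned in Chapter 2, the following is a minimal sketch of how several p-values can be combined into a single p-value. It is not the dissertation’s implementation. The function name `fisher_combined_pvalue` and the example p-values are hypothetical, and the sketch assumes the input p-values are independent, which variant-level tests within a gene may not satisfy because of linkage disequilibrium.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    Returns (statistic, combined_p_value).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For even degrees of freedom (2k), the chi-square survival function
    # has a closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical variant-level p-values for one gene (illustration only).
example_p_values = [0.04, 0.20, 0.01]
stat, combined = fisher_combined_pvalue(example_p_values)
print(f"statistic={stat:.3f}, combined p={combined:.5f}")
```

Because the degrees of freedom are always even, the closed-form survival function keeps the sketch free of external dependencies; in practice a statistics library would normally be used instead.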
Chapter 3: In this chapter, we aimed to identify potential susceptibility variants for bipolar disorder by combining exome sequencing and linkage analysis in six related subjects from a four-generation family. Our study identified five potential candidate genes for bipolar disorder. Among these five genes, GRID1 (Glutamate Receptor Delta-1 Subunit), which was previously reported to be associated with several psychiatric disorders and brain-related traits, is of particular interest. Our findings suggest a potential role for these genes and the related rare variants in the onset and development of bipolar disorder in this one family.
Chapter 4: In this chapter, we investigated the potential of FMO genes to confer risk of nicotine dependence via deep targeted sequencing in 2,820 study subjects, comprising 1,583 nicotine-dependent cases and 1,237 controls of European American and African American ancestry. Specifically, we focused on the two genomic segments including FMO1, FMO3 and the pseudogene FMO6P, and tested for association between variants in these genes and nicotine dependence. We identified different clusters of significant common variants in European Americans (with the most significant SNP rs6674596, P=0.0004, OR=0.67, MAF_EA=0.14) and African Americans (with the most significant SNP rs6608453, P=0.001, OR=0.64, MAF_AA=0.1). Most of the significant variants identified were SNPs located within intronic regions or with unknown functional significance.
Chapter 5: In this chapter, we aimed to investigate the following three scientific questions: 1) Can centrality reflect the biological significance of genes in a general human gene network? 2) Among four commonly used centrality measures, does any one outperform the others? 3) Do they perform better when several centrality measures are combined using machine learning algorithms? To answer these questions, we constructed a comprehensive human gene-gene network using protein-protein interaction (PPI) data. Four essential gene sets were extracted from a variety of data sources to serve as reference standards in the evaluation and optimization process. Our results indicated that there is a connection between the essentiality and centrality of human genes. Strong correlations were identified among the four commonly used centrality measures in a general human PPI network, and the measures performed similarly to one another as predictors of gene essentiality. Combining several different centrality measures yielded only limited improvement in the prediction models.
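To make the notion of centrality in Chapter 5 concrete, the following is a minimal sketch that computes two centrality measures on a toy undirected network. The abstract does not name the four measures used in the dissertation, so degree and closeness centrality are chosen here purely as illustrative assumptions. The gene labels (`G1` to `G5`), the edge list, and the function names are all hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the sum of shortest-path lengths.

    Distances come from a breadth-first search; in a disconnected graph
    the score only reflects the node's own connected component.
    """
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical toy interaction network (not real PPI data).
TOY_EDGES = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5")]

adjacency = build_adjacency(TOY_EDGES)
degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)
for gene in sorted(adjacency):
    print(gene, round(degree[gene], 3), round(closeness[gene], 3))
```

In this toy network the hub `G1` ranks highest on both measures, which illustrates on a small scale why different centrality measures can be strongly correlated.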
Chapter 6: In this chapter, we aimed to investigate whether susceptibility genes for certain complex disorders show enrichment for high centrality in function-specific sub-networks. Gene expression data for human brain tissue recorded in the Human Protein Atlas were extracted and used to construct a series of brain function-specific sub-networks. Susceptibility genes from three categories of complex disorders (neurodegenerative disorders, psychiatric disorders, and non-brain-related disorders) were extracted from the GWAS Catalog. We identified significant enrichment of high centrality among susceptibility genes for neurodegenerative and psychiatric disorders in these sub-networks. Our findings indicate that susceptibility genes for complex disorders may have higher centrality in function-specific sub-networks.
Chair and Committee
John P. Rice, Christina A. Gurnett
Laura J. Bierut, Donald F. Conrad, Nan Lin, Nancy L. Saccone
Zhang, Tianxiao, "Gene Association Mapping in the Era of Next-Generation Sequencing and Systems Biology" (2016). Arts & Sciences Electronic Theses and Dissertations. 909.