Author's School

Graduate School of Arts & Sciences

Author's Department/Program

Biology and Biomedical Sciences: Human and Statistical Genetics


English (en)

Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Chair and Committee

John Rice


Since the discovery of Mendel's laws, one of the most challenging problems in genetic research has been to locate and characterize genetic variants that cause human disease. Although thousands of disease-associated genetic variants have been discovered, many remain unknown. New methods are needed to facilitate the discovery process. Here, we present new methodology to improve detection of these genetic variants in genotype imputation, copy number variation (CNV), and sequencing data. Currently, imputation is widely used to evaluate the evidence for association at genetic markers that are not directly genotyped. However, imputation can be problematic, especially when a genetic variant has a low minor allele frequency. We present a new statistic, the imputation quality score, developed to better differentiate well-imputed from poorly imputed SNPs. It is particularly useful for SNPs with low minor allele frequency and for datasets that are genotyped on different platforms. CNV calling, on the other hand, remains unreliable. We developed a statistical method for estimating sensitivity and positive predictive rate, and evaluated the relative performance of CNV calling on a genome-wide scale. We found that the positive predictive rate increases with the number of probes and the size of CNVs. We also found that CNVs reported by multiple programs have a higher reproducibility rate and positive predictive rate. This method was applied to the dataset from the Study of Addiction: Genetics and Environment. Our analysis revealed that CNVs in 6q14.1 (P = 1.04 × 10^-6) and 5q13.2 (P = 3.37 × 10^-4) are significantly associated with alcohol dependence after adjusting for multiple testing. Evidence also suggested that CNVs at 5q13.2 increase the risk for alcohol dependence by lowering conscientiousness, or more specifically, self-discipline. As genetic research moves toward sequencing data, improved methods are needed for analyzing rare variants. 
Using the simulated data from the Genetic Analysis Workshop, we integrated the collapsing method with the family data method in an attempt to increase power to detect rare variants. We concluded that this combined approach offers a substantial power boost for certain causal genes and is therefore worth further investigation. By focusing on improving the interpretation of data from imputation, CNV calling, and sequencing, our work parallels the development of genetic research over the past few years, provides a direction for ongoing methods development, and will be useful for future research endeavors.



Permanent URL: