Date of Award

Spring 5-15-2016

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Human & Statistical Genetics)

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Genotype imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. This body of work focuses on assessing imputation accuracy and uses imputed data to identify genetic contributors to mentholated cigarette preference.

Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen's kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation were conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low-frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. For rare and low-frequency variants, however, the other statistics tend to be more liberal than IQS in their assessment of accuracy. IQS is therefore important to consider when evaluating imputation accuracy, particularly for rare and low-frequency variants. This work directly affects the interpretation of association studies by improving our understanding of accuracy assessments of imputed variants.
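To illustrate why a chance-corrected statistic can differ sharply from concordance rate at a rare variant, the following is a minimal sketch, not the code used in this work. It assumes genotypes are coded 0, 1, or 2 (copies of the minor allele), that each subject has a triple of imputed genotype probabilities, and that IQS is computed as a Cohen's kappa on the 3 x 3 table of summed imputed probabilities against true genotypes. The function names `concordance_rate` and `iqs` and the toy data are hypothetical.

```python
def concordance_rate(true_genotypes, imputed_probs):
    """Fraction of subjects whose best-guess imputed genotype matches the true one."""
    matches = 0
    for g, probs in zip(true_genotypes, imputed_probs):
        best_guess = max(range(3), key=lambda k: probs[k])
        matches += int(best_guess == g)
    return matches / len(true_genotypes)


def iqs(true_genotypes, imputed_probs):
    """Kappa-style agreement between imputed probabilities and true genotypes.

    Assumed form: (P_obs - P_chance) / (1 - P_chance), computed from a 3 x 3
    table whose cell [i][j] sums the imputed probability of genotype i over
    subjects whose true genotype is j.
    """
    n = len(true_genotypes)
    table = [[0.0] * 3 for _ in range(3)]
    for g, probs in zip(true_genotypes, imputed_probs):
        for i in range(3):
            table[i][g] += probs[i]

    p_obs = sum(table[k][k] for k in range(3)) / n
    row_totals = [sum(table[i]) for i in range(3)]
    col_totals = [sum(table[i][j] for i in range(3)) for j in range(3)]
    p_chance = sum(row_totals[k] * col_totals[k] for k in range(3)) / n ** 2

    if p_chance == 1.0:
        return float("nan")  # undefined when chance agreement is already perfect
    return (p_obs - p_chance) / (1.0 - p_chance)


# Toy rare variant (hypothetical data): 2 heterozygotes among 100 subjects,
# and an imputation that confidently calls everyone homozygous reference.
true_g = [0] * 98 + [1] * 2
probs = [(1.0, 0.0, 0.0)] * 100

print(concordance_rate(true_g, probs))  # 0.98
print(iqs(true_g, probs))               # 0.0
```

In this toy case the imputation carries no information about who has the rare allele, yet concordance rate is 0.98 because nearly all subjects share the common genotype. The chance-corrected score is 0, which is the kind of inflation described above.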

Mentholated cigarettes are addictive, widely available, and commonly used, particularly by African American smokers. We aim to identify genetic variants that increase susceptibility to mentholated cigarette use, in hopes of gaining biological insights into risk that may ultimately improve cessation efforts. We begin with hypothesis-driven candidate genes and regions (TAS2R38, CHRNA5/A3/B4, CHRNB3/A6, and CYP2A6/A7) and then extend to a genome-wide approach. This study involves 3,571 nicotine-dependent current smokers (1,365 African Americans and 2,206 European Americans) from the Collaborative Genetic Study of Nicotine Dependence (COGEND) and the Transdisciplinary Tobacco Use Research Center (UW-TTURC). Analyses were conducted within each cohort, and meta-analysis was used to combine results across studies and across ancestral groups. We identified some suggestively associated variants, although none met genome-wide significance. This study addresses a new and important aspect of understanding menthol cigarette preference. Further work is necessary to better understand this smoking behavior and to improve cessation efforts.


Language

English (en)

Chair and Committee

Nancy Saccone

Committee Members

John Rice, Barak Cohen, Arpana Agarwal, Elisha Roberson


Permanent URL: