Date of Award
Doctor of Philosophy (PhD)
In this dissertation, we consider two types of complex dependent data: high-dimensional data and longitudinal data.
We study two topics in high-dimensional data. The first is a basic problem in modern high-dimensional data analysis: testing the equality of two mean vectors when the dimension p increases with the sample size n. We propose a robust two-sample test for high-dimensional data against sparse and strong alternatives, in which the mean vectors of the two populations differ in only a few dimensions but the differences are large in magnitude. The test is based on trimmed means and robust precision matrix estimators. We establish the asymptotic joint distribution of the trimmed means and show that the proposed test statistic has a Gumbel distribution in the limit. Simulation studies suggest that the numerical performance of the proposed test is comparable to that of non-robust tests for uncontaminated data, and that it outperforms non-robust tests for cell-wise contaminated data. We illustrate the test with biomarker identification in an Alzheimer’s disease dataset. The second topic is detecting differential DNA methylation from next-generation sequencing data. We propose a likelihood ratio test that uses both whole-genome bisulfite sequencing (WGBS) data and Tet-assisted bisulfite sequencing (TAB-seq) data to detect differential methylation at a CpG site. Applying the test to genome-wide differential methylation profiling leads to a high-dimensional multiple testing problem, for which we adopt the widely used q-value method to control the false discovery rate.
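As a rough illustration of the multiple-testing step, the following is a minimal sketch of a Storey-type q-value computation from a list of per-site p-values. It is not the dissertation's implementation: the function name `q_values`, the fixed tuning value `lam=0.5`, and the example p-values are assumptions made for illustration, and practical q-value software usually estimates the null proportion more carefully than a single fixed threshold does.

```python
def q_values(p_values, lam=0.5):
    """Sketch of Storey-type q-values with a fixed tuning value lam (assumed)."""
    m = len(p_values)
    if m == 0:
        return []
    # Estimate the proportion of true nulls from the p-values above lam,
    # capped at 1 so the estimate stays a valid proportion.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for five CpG sites.
example_q = q_values([0.001, 0.01, 0.02, 0.8, 0.9])
significant = [i for i, q in enumerate(example_q) if q <= 0.05]
```

Sites whose q-value falls at or below a chosen level (0.05 in this sketch) would be reported as differentially methylated, with the false discovery rate among them controlled approximately at that level.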
For longitudinal data, we consider method agreement studies in medical and clinical fields, which often compare a cheaper, faster, or less invasive measuring method with a widely used one to determine whether the two agree closely enough to be used interchangeably. We propose a model-based approach to assess the agreement of two measuring methods based on longitudinal data in the form of paired repeated binary measurements. Under a generalized linear mixed model (GLMM), the decision on the adequacy of interchangeable use is made by testing the equality of the fixed effects of the two methods. Approaches for assessing method agreement, such as the Bland-Altman diagram and Cohen’s kappa, are also developed for repeated binary measurements based on the latent variables in the models. We assess our novel model-based approach through simulation studies and a real clinical application in which patients are evaluated repeatedly for delirium after surgery with two validated screening methods.
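For reference, the classical Cohen's kappa for two binary methods can be sketched as below. This is the standard cross-sectional statistic, which treats each pair of measurements as independent; it is not the latent-variable extension for repeated measurements developed in the dissertation. The function name `cohens_kappa` and the example ratings are hypothetical.

```python
def cohens_kappa(x, y):
    """Classical Cohen's kappa for two paired sequences of 0/1 ratings."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    # Observed proportion of pairs on which the two methods agree.
    p_observed = sum(a == b for a, b in zip(x, y)) / n
    # Marginal proportion of positive results for each method.
    p_x = sum(x) / n
    p_y = sum(y) / n
    # Agreement expected by chance if the methods were independent.
    p_expected = p_x * p_y + (1 - p_x) * (1 - p_y)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical screening results from two methods on eight assessments.
method_a = [1, 1, 0, 0, 1, 0, 1, 0]
method_b = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(method_a, method_b)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Because repeated measurements on the same patient are correlated, applying this formula to pooled longitudinal data ignores within-patient dependence, which is the gap a model-based approach is meant to address.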
Chair and Committee
Likai Chen, Jimin Ding, Jose Figueroa-Lopez, Ting Wang
Wang, Wei, "Three Essays on Complex Dependent Data" (2019). Arts & Sciences Electronic Theses and Dissertations. 1865.
Available for download on Monday, May 15, 2119