"Statistical Aggregation: Theory and Applications" by Ruibin Xi

All Theses and Dissertations (ETDs)

Title

Statistical Aggregation: Theory and Applications

Author

Ruibin Xi, Washington University in St. Louis

Author's School

Graduate School of Arts & Sciences

Author's Department/Program

Mathematics

Language

English (en)

Date of Award

January 2009

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Chair and Committee

Nan Lin

Abstract

Due to their size and complexity, massive data sets pose many computational challenges for statistical analysis, such as overcoming memory limitations and improving the computational efficiency of traditional statistical methods. In this dissertation, I propose the statistical aggregation strategy to meet these challenges. Statistical aggregation partitions the entire data set into smaller subsets, compresses each subset into low-dimensional summary statistics, and aggregates the summary statistics to approximate the desired computation on the entire data set. Results from statistical aggregation are required to be asymptotically equivalent to those computed from the entire data set. Because statistical aggregation processes the data set part by part, it overcomes the memory limitation. Moreover, it can improve the computational efficiency of statistical algorithms whose computational complexity is of order O(N^m) (m > 1) or higher, where N is the size of the data. Statistical aggregation is particularly useful for online analytical processing (OLAP) in data cubes and stream data, where fast response to queries is the top priority. The "partition-compression-aggregation" strategy has been considered previously for OLAP computing in data cubes, but existing research in this area tends to overlook the statistical properties of the analysis and aims to obtain identical results from aggregation, which has limited the strategy to very simple analyses. Statistical aggregation can instead support OLAP in more sophisticated statistical analyses. In this dissertation, I apply statistical aggregation to two large families of statistical methods, estimating equation (EE) estimation and U-statistics, develop suitable compression-aggregation schemes, and show that statistical aggregation greatly reduces their computational burden while maintaining their efficiency. I further apply statistical aggregation to U-statistic-based estimating equations and propose new estimating equations that need much less computational time but give asymptotically equivalent estimators.
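
As an illustration only, the partition-compression-aggregation idea can be sketched for a simple estimating equation. The sketch below is not taken from the dissertation: the one-parameter logistic model, the simulated data, the block size, and the function names (`score_and_info`, `solve_ee`, `compress`, `aggregate`) are all assumptions made for this example. Each block is compressed into a local estimate and an information value, and the aggregated estimate is their information-weighted average, which should be close to, but not identical to, the full-data solution.

```python
import math
import random


def score_and_info(beta, xs, ys):
    """Estimating function and its negative derivative for the assumed
    model P(y = 1 | x) = 1 / (1 + exp(-beta * x))."""
    score = 0.0
    info = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        score += x * (y - p)
        info += x * x * p * (1.0 - p)
    return score, info


def solve_ee(xs, ys, iters=25):
    """Solve score(beta) = 0 with a fixed number of Newton steps."""
    beta = 0.0
    for _ in range(iters):
        score, info = score_and_info(beta, xs, ys)
        beta += score / info
    return beta


def compress(xs, ys):
    """Compress one block into (local estimate, information at that estimate)."""
    beta_k = solve_ee(xs, ys)
    _, info_k = score_and_info(beta_k, xs, ys)
    return beta_k, info_k


def aggregate(summaries):
    """Information-weighted average of the local estimates."""
    total_info = sum(info for _, info in summaries)
    return sum(info * beta for beta, info in summaries) / total_info


# Simulated data (hypothetical; chosen only to make the sketch runnable).
random.seed(0)
TRUE_BETA = 1.0
N, BLOCK = 20000, 1000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [1 if random.random() < 1.0 / (1.0 + math.exp(-TRUE_BETA * x)) else 0
      for x in xs]

# Partition into blocks, compress each block, then aggregate the summaries.
summaries = [compress(xs[i:i + BLOCK], ys[i:i + BLOCK])
             for i in range(0, N, BLOCK)]
beta_aggregated = aggregate(summaries)

# Full-data solution, computed here only for comparison.
beta_full = solve_ee(xs, ys)
print(beta_aggregated, beta_full)
```

Only two numbers per block need to be stored, so in this sketch a query over any union of blocks can be answered from the stored summaries without revisiting the raw data.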

Comments

Permanent URL: http://dx.doi.org/10.7936/K7CR5RF2

Recommended Citation

Xi, Ruibin, "Statistical Aggregation: Theory and Applications" (2009). All Theses and Dissertations (ETDs). 388.
https://openscholarship.wustl.edu/etd/388

DOI

https://doi.org/10.7936/K7CR5RF2
