Date of Award

Winter 12-15-2014

Author's Department

Computer Science & Engineering

Degree Name

Doctor of Philosophy (PhD)

Degree Type

Dissertation

Abstract

Large-scale machine learning requires tradeoffs. Commonly, these tradeoffs have led practitioners to choose simpler, less powerful models, e.g. linear models, in order to process more training examples in a limited time. In this work, we introduce parallelism to the training of non-linear models by leveraging a different tradeoff: approximation. We demonstrate various techniques by which non-linear models can be made amenable to larger data sets and significantly more training parallelism by strategically introducing approximation in certain optimization steps.

For gradient boosted regression tree ensembles, we replace precise selection of tree splits with a coarse-grained, approximate split selection, yielding both faster sequential training and a significant increase in parallelism, particularly in the distributed setting. For metric learning with nearest neighbor classification, rather than explicitly training a neighborhood structure, we leverage the implicit neighborhood structure induced by task-specific random forest classifiers, yielding a highly parallel method for metric learning. For support vector machines, we follow existing work to learn a reduced basis set with extremely high parallelism, particularly on GPUs, via existing linear algebra libraries.
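As an illustration of the first idea only, the sketch below shows one common way to make split selection coarse-grained: feature values are bucketed into a fixed number of equal-width bins, each data shard builds a small histogram of target sums and counts, and the merged histogram is scanned for the bin boundary with the largest squared-error reduction. This is a minimal sketch under stated assumptions, not the dissertation's algorithm. The function names (`build_histogram`, `merge_histograms`, `best_split`), the equal-width binning, and the single-feature setup are all hypothetical choices made for the example.

```python
from typing import List, Optional, Tuple

Histogram = Tuple[List[int], List[float]]  # (per-bin counts, per-bin target sums)


def build_histogram(xs: List[float], ys: List[float],
                    lo: float, hi: float, n_bins: int) -> Histogram:
    """Summarize one feature on one data shard (assumes lo <= x <= hi)."""
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    width = (hi - lo) / n_bins
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), n_bins - 1)  # clamp x == hi into last bin
        counts[b] += 1
        sums[b] += y
    return counts, sums


def merge_histograms(a: Histogram, b: Histogram) -> Histogram:
    """Histograms are additive, so shards can be combined bin by bin."""
    counts = [p + q for p, q in zip(a[0], b[0])]
    sums = [p + q for p, q in zip(a[1], b[1])]
    return counts, sums


def best_split(hist: Histogram, lo: float, hi: float) -> Tuple[Optional[float], float]:
    """Scan bin boundaries only; return (threshold, squared-error reduction)."""
    counts, sums = hist
    n_bins = len(counts)
    width = (hi - lo) / n_bins
    total_n, total_s = sum(counts), sum(sums)
    best_thr, best_gain = None, float("-inf")
    left_n, left_s = 0, 0.0
    for b in range(n_bins - 1):
        left_n += counts[b]
        left_s += sums[b]
        right_n = total_n - left_n
        if left_n == 0 or right_n == 0:
            continue  # a split must leave examples on both sides
        right_s = total_s - left_s
        gain = (left_s ** 2 / left_n + right_s ** 2 / right_n
                - total_s ** 2 / total_n)
        if gain > best_gain:
            best_thr, best_gain = lo + (b + 1) * width, gain
    return best_thr, best_gain


# Two shards, as if held by two workers; only the histograms are exchanged.
shard_a = build_histogram([0.1, 0.2, 0.3, 0.4], [1, 1, 1, 1], 0.0, 1.0, 4)
shard_b = build_histogram([0.6, 0.7, 0.8, 0.9], [5, 5, 5, 5], 0.0, 1.0, 4)
threshold, gain = best_split(merge_histograms(shard_a, shard_b), 0.0, 1.0)
print(threshold, gain)
```

In this toy run the chosen threshold is 0.5, the boundary that separates the low targets from the high ones. The point of the design is that each worker communicates only a fixed-size histogram rather than its raw examples, and the scan touches bin boundaries rather than every distinct feature value, which is where the extra parallelism in such schemes typically comes from.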

We believe these optimization tradeoffs are widely applicable wherever machine learning is put into practice in large-scale settings. By carefully introducing approximation, we also introduce significantly higher parallelism and consequently can process more training examples for more iterations than competing exact methods. While seemingly learning the model with less precision, this tradeoff often yields noticeably higher accuracy under a restricted training time budget.

Language

English (en)

Chair

Kunal Agrawal

Committee Members

Roger Chamberlain, Robert Pless

Comments

Permanent URL: https://doi.org/10.7936/K70863FB

Included in

Engineering Commons

DOI

https://doi.org/10.7936/K70863FB

McKelvey School of Engineering Theses & Dissertations

Approximation and Relaxation Approaches for Parallel and Distributed Machine Learning
