Abstract

The rapid growth of data processing required in various arenas of computation over the past decades necessitates extensive use of parallel computing engines. Among those, highly-threaded many-core machines, such as GPUs have become increasingly popular for accelerating a diverse range of data-intensive applications. They feature a large number of hardware threads with low-overhead context switches to hide the memory access latencies and therefore provide high computational throughput. However, understanding and harnessing such machines places great challenges on algorithm designers and performance tuners due to the complex interaction of threads and hierarchical memory subsystems of these machines. The achieved performance jointly depends on the parallelism exploited by the algorithm, the effectiveness of latency hiding, and the utilization of multiprocessors (occupancy). Contemporary work tries to model the performance of GPUs from various aspects with different emphasis and granularity. However, no model considers all of these factors together at the same time.

This dissertation presents an analytical framework that jointly addresses parallelism, latency-hiding, and occupancy for both theoretical and empirical performance analysis of algorithms on highly-threaded many-core machines so that it can guide both algorithm design and performance tuning. In particular, this framework not only helps to explore and reduce the runtime configuration space for tuning kernel execution on GPUs, but also reflects performance bottlenecks and predicts how the runtime will trend as the problem and other parameters scale. The framework consists of a pair of analytical models with one focusing on higher-level asymptotic algorithm performance on GPUs and the other one emphasizing lower-level details about scheduling and runtime configuration. Based on the two models, we have conducted extensive analysis of a large set of algorithms. Two analysis provides interesting results and explains previously unexplained data. In addition, the two models are further bridged and combined as a consistent framework. The framework is able to provide an end-to-end methodology for algorithm design, evaluation, comparison, implementation, and prediction of real runtime on GPUs fairly accurately.

To demonstrate the viability of our methods, the models are validated through data from implementations of a variety of classic algorithms, including hashing, Bloom filters, all-pairs shortest path, matrix multiplication, FFT, merge sort, list ranking, string matching via suffix tree/array, etc. We evaluate the models' performance across a wide spectrum of parameters, data values, and machines. The results indicate that the models can be effectively used for algorithm performance analysis and runtime prediction on highly-threaded many-core machines.

Committee Chair

Tao Ju

Committee Members

Kunal Agrawal, Roger Chamberlain, James Buckley, Jeremy Buhler

Comments

Permanent URL: https://doi.org/10.7936/K73X84SG

Degree

Doctor of Philosophy (PhD)

Author's Department

Computer Science & Engineering

Author's School

McKelvey School of Engineering

Document Type

Dissertation

Date of Award

Winter 12-15-2014

Language

English (en)

DOI

https://doi.org/10.7936/K73X84SG

Recommended Citation

Ma, Lin, "Modeling Algorithm Performance on Highly-threaded Many-core Architectures" (2014). McKelvey School of Engineering Theses & Dissertations. 63.

The definitive version is available at https://doi.org/10.7936/K73X84SG

Download

Included in

Engineering Commons

COinS

DOI

https://doi.org/10.7936/K73X84SG

McKelvey School of Engineering Theses & Dissertations

Modeling Algorithm Performance on Highly-threaded Many-core Architectures

Abstract

Committee Chair

Committee Members

Comments

Degree

Author's Department

Author's School

Document Type

Date of Award

Language

DOI

Recommended Citation

Included in

DOI

Search

Links

Browse

Author Corner

McKelvey School of Engineering Theses & Dissertations

Modeling Algorithm Performance on Highly-threaded Many-core Architectures

Author

Abstract

Committee Chair

Committee Members

Comments

Degree

Author's Department

Author's School

Document Type

Date of Award

Language

DOI

Recommended Citation

Included in

Share

DOI

Search

Links

Browse

Author Corner