All Computer Science and Engineering Research

Throughput-optimal systolic arrays from recurrence equations

Arpith C. Jacob, Washington University in St Louis
Jeremy D. Buhler, Washington University in St Louis
Roger D. Chamberlain, Washington University in St Louis

Document Type

Technical Report

Department

Computer Science and Engineering

Publication Date

2009

Filename

wucse-2009-39.pdf

Technical Report Number

wucse-2009-39

Abstract

Many compute-bound software kernels have seen order-of-magnitude speedups on special-purpose accelerators built on specialized architectures such as field-programmable gate arrays (FPGAs). These architectures are particularly good at implementing dynamic programming algorithms that can be expressed as systems of recurrence equations, which in turn can be realized as systolic array designs. To efficiently find good realizations of an algorithm for a given hardware platform, we pursue software tools that can search the space of possible parallel array designs to optimize various design criteria. Most existing design tools in this area produce a design that is latency-space optimal. However, we instead wish to target applications that operate on a large collection of small inputs, e.g. a database of biological sequences. For such applications, overall throughput rather than latency per input is the most important measure of performance. In this work, we introduce a new procedure to optimize throughput of a systolic array subject to resource constraints, in this case the area and bandwidth constraints of an FPGA device. We show that the throughput of an array is dependent on the maximum number of lattice points executed by any processor in the array, which to a close approximation is determined solely by the array’s projection vector. We describe a bounded search process to find throughput-optimal projection vectors and a tool to perform automated design space exploration, discovering a range of array designs that are optimal for inputs of different sizes. We apply our techniques to the Nussinov RNA folding algorithm to generate multiple mappings of this algorithm into systolic arrays. By combining our library of designs with run-time reconfiguration of an FPGA device to dynamically switch among them, we predict significant speedup over a single, latency-space optimal array.
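The abstract's central observation, that throughput depends on the maximum number of lattice points executed by any one processor and that this is essentially fixed by the projection vector, can be illustrated with a small brute-force sketch. This is not the authors' tool or search procedure. The triangular domain `1 <= i <= k < j <= n`, the processor budget `max_processors` (standing in for the area constraint), and all function names are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def domain(n):
    """Toy 3-D iteration domain (assumed): points (i, j, k) with 1 <= i <= k < j <= n."""
    return [(i, j, k)
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)
            for k in range(i, j)]


def line_key(p, u):
    """Cross product p x u, identical for all points on one line parallel to u."""
    return (p[1] * u[2] - p[2] * u[1],
            p[2] * u[0] - p[0] * u[2],
            p[0] * u[1] - p[1] * u[0])


def processor_loads(points, u):
    """Count the lattice points assigned to each processor (one line along u)."""
    return Counter(line_key(p, u) for p in points)


def search(n, bound, max_processors):
    """Return (max_load, processors, u) with the smallest max_load within the budget.

    Candidate vectors have integer entries in [-bound, bound]; returns None if
    no candidate fits the (hypothetical) processor budget.
    """
    points = domain(n)
    best = None
    for u in product(range(-bound, bound + 1), repeat=3):
        if not any(u):
            continue  # the zero vector is not a projection direction
        loads = processor_loads(points, u)
        candidate = (max(loads.values()), len(loads), u)
        if len(loads) <= max_processors and (best is None or candidate < best):
            best = candidate
    return best
```

Calling `search(4, bound=1, max_processors=6)` enumerates the 26 nonzero candidate vectors and keeps the one whose busiest processor handles the fewest points while using at most six processors. The sketch ignores scheduling, bandwidth, and interconnect constraints, all of which a real design-space exploration would have to account for.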

Comments

Permanent URL: http://dx.doi.org/10.7936/K7862DPG

Recommended Citation

Jacob, Arpith C.; Buhler, Jeremy D.; and Chamberlain, Roger D., "Throughput-optimal systolic arrays from recurrence equations" Report Number: wucse-2009-39 (2009). All Computer Science and Engineering Research.
https://openscholarship.wustl.edu/cse_research/17

Included in

Computer Engineering Commons, Computer Sciences Commons

DOI

https://doi.org/10.7936/K7862DPG
