Date of Award

Spring 5-15-2015

Author's Department

Computer Science & Engineering

Degree Name

Doctor of Philosophy (PhD)

Degree Type



The disparity in performance between processors and main memories has

led computer architects to incorporate large cache hierarchies in

modern computers. These cache hierarchies are designed to be

general-purpose in that they strive to provide the best possible

performance across a wide range of applications. However, such a memory

subsystem does not necessarily provide the best possible performance for

a particular application.

General-purpose memory subsystems are desirable when the

workload is unknown and the memory subsystem must remain fixed.

When these conditions do not hold, a custom memory subsystem may be beneficial.

For example, in an application-specific integrated circuit (ASIC) or

a field-programmable gate array (FPGA) designed to run a particular

application, a custom memory subsystem optimized for that application

would be desirable. In addition, when there are tunable

parameters in the memory subsystem, it may make sense to change these

parameters depending on the application being run. Such a situation

arises today with FPGAs and, to a lesser extent, GPUs, and it is

plausible that general-purpose computers will begin to support

greater flexibility in the memory subsystem in the future.

In this dissertation, we first show that it is possible to create

application-specific memory subsystems that provide much better

performance than a general-purpose memory subsystem. In addition,

we show a way to discover such memory subsystems automatically using

a superoptimization technique on memory address traces gathered

from applications. This allows one to generate a custom memory subsystem

with little effort.
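
As a rough illustration of trace-driven design-space search, the following minimal sketch scores candidate caches against a memory address trace and keeps the cheapest one. It is not the dissertation's superoptimizer: the direct-mapped cache model, the hit and miss costs, the byte budget, and the names `simulate` and `search` are all assumptions made for this example, and the search is exhaustive over a handful of line sizes rather than over a full space of memory subsystems.

```python
def simulate(trace, line_size, num_lines, hit_cost=1, miss_cost=100):
    """Total access cost of a direct-mapped cache on an address trace.

    hit_cost and miss_cost are hypothetical cycle counts, not measured values.
    """
    tags = [None] * num_lines          # block currently held by each cache line
    cost = 0
    for addr in trace:
        block = addr // line_size      # which memory block the address falls in
        index = block % num_lines      # which cache line that block maps to
        if tags[index] == block:
            cost += hit_cost
        else:
            tags[index] = block        # fill the line on a miss
            cost += miss_cost
    return cost


def search(trace, budget_bytes, line_sizes=(8, 16, 32, 64)):
    """Try every line size that fits the byte budget; return the cheapest.

    Returns a (cost, line_size, num_lines) tuple, or None if nothing fits.
    """
    best = None
    for line_size in line_sizes:
        num_lines = budget_bytes // line_size
        if num_lines == 0:
            continue
        cost = simulate(trace, line_size, num_lines)
        if best is None or cost < best[0]:
            best = (cost, line_size, num_lines)
    return best
```

For a purely sequential trace such as `list(range(0, 256, 4))` and a 256-byte budget, this sketch selects the largest line size, because longer lines turn more of the sequential accesses into hits. A trace with a different access pattern could favor a different configuration, which is the point of tailoring the memory subsystem to the application.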

We next show that our memory subsystem superoptimization technique can

be used to optimize for objectives other than performance. As an example,

we show that it is possible to reduce the number of writes to the main

memory, which can be useful for main memories with limited write

durability, such as flash or phase-change memory (PCM).
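
To make the alternative objective concrete, here is a small sketch that counts writes reaching main memory instead of access time. It assumes a direct-mapped write-back cache and a trace of `(address, is_write)` pairs; the model and the name `count_memory_writes` are hypothetical and are not taken from the dissertation.

```python
def count_memory_writes(trace, line_size, num_lines):
    """Count line write-backs to main memory for a direct-mapped write-back cache.

    trace is an iterable of (address, is_write) pairs.
    """
    tags = [None] * num_lines
    dirty = [False] * num_lines
    writes = 0
    for addr, is_write in trace:
        block = addr // line_size
        index = block % num_lines
        if tags[index] != block:
            if dirty[index]:
                writes += 1            # evicting a dirty line writes it back
            tags[index] = block
            dirty[index] = False
        if is_write:
            dirty[index] = True        # the write is absorbed by the cache
    writes += sum(dirty)               # flush remaining dirty lines at the end
    return writes
```

With a trace that alternates writes between two blocks that conflict in a one-line cache, every access evicts a dirty line, so the cache absorbs nothing. Giving the cache two lines removes the conflict and leaves only the final flush. A search like the one used for performance could minimize this count instead of access cost.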

Finally, we show how to superoptimize memory subsystems for streaming

applications, which are a class of parallel applications. In particular, we

show that, through the use of ScalaPipe, we can author and deploy streaming

applications targeting FPGAs with superoptimized memory subsystems.

ScalaPipe is a domain-specific language (DSL) embedded in the Scala

programming language for generating streaming applications that can be

implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we

demonstrate actual performance improvements from the

superoptimized memory subsystem in applications implemented in hardware.


Language

English (en)


Chair

Roger D Chamberlain

Committee Members

Kunal Agrawal, Ron K Cytron, Viktor Gruev, Krishna Kavi, Hiro Mukai


Permanent URL:

Included in

Engineering Commons