Abstract
Parallel and distributed systems, such as web services, clouds, and cyber-physical systems, are pervasive. Such systems are usually expected to deliver high throughput and small latency. However, because the system is distributed and its input arrives online, scheduling for high throughput while keeping latency small is often challenging. In this dissertation, we developed scheduling algorithms, policies, and mechanisms that approach high throughput with small latency in various parallel and distributed applications. First, we developed AMCilk, a runtime system for running multiprogrammed parallel jobs on many-processor machines. When parallel jobs run, the allocation of processors among them often changes over time. Because AMCilk reallocates processors between parallel jobs responsively, various applications achieve significant throughput gains and latency reductions compared to existing solutions. Second, we built static schedules for the real-time transmission of messages over shared communication media. Medium errors are unpredictable and can cause transmission failures. Because our schedules tolerate medium errors with low overhead, real-time applications can transmit messages at high throughput. Finally, we analyzed various data-placement strategies for distributed key-value stores, in which a large amount of data is stored across multiple servers. When a request to access data arrives, it is routed to the appropriate server, queued, and eventually processed. If the queue is full, the request may be rejected. We compared the strategies' ability to avoid rejections and to achieve high throughput with small queue sizes.
Committee Chair
Sanjoy Baruah
Committee Members
Kunal Agrawal, I-Ting Lee, Christopher Gill, Jeremy Fineman
Degree
Doctor of Philosophy (PhD)
Author's Department
Computer Science & Engineering
Document Type
Dissertation
Date of Award
Summer 8-15-2022
Language
English (en)
DOI
https://doi.org/10.7936/3k0h-pr74
Recommended Citation
Wang, Zhe, "Scheduling for High Throughput and Small Latency in Parallel and Distributed Systems" (2022). McKelvey School of Engineering Theses & Dissertations. 806.
The definitive version is available at https://doi.org/10.7936/3k0h-pr74