Technical Report Number
This paper presents the analysis of an improved distributed checkpointing algorithm. It shows that the message volume of Koo and Toueg's distributed checkpointing algorithm approaches 3fN for large checkpoint intervals, where N is the number of processes and each process sends messages to f other processes chosen at random. Thus, the average message volume is O(N^2). We show how Koo and Toueg's algorithm can be modified to avoid this O(N^2) overhead, and we derive an accurate estimate of the message volume. The modified algorithm uses dependency knowledge to substantially reduce the average message volume.
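As a rough illustration of the 3fN figure quoted above, the sketch below evaluates it for a few system sizes. It is not taken from the report: the function name `kt_message_volume` and the parameter names are hypothetical, and the quadratic reading assumes that f grows in proportion to N (for example, f = N - 1 when every process communicates with every other process).

```python
def kt_message_volume(n_processes: int, fanout: int) -> int:
    """Return 3 * f * N, the limiting message volume quoted in the abstract.

    n_processes is N and fanout is f. This is only the asymptotic figure for
    large checkpoint intervals, not the report's full analysis.
    """
    if not 0 <= fanout <= n_processes - 1:
        raise ValueError("fanout must lie between 0 and N - 1")
    return 3 * fanout * n_processes


if __name__ == "__main__":
    # Fixed fanout grows linearly in N; fanout = N - 1 grows quadratically.
    for n in (10, 100, 1000):
        print(n, kt_message_volume(n, 3), kt_message_volume(n, n - 1))
```

With a fixed fanout the figure grows linearly in N, while with f = N - 1 it becomes 3N(N - 1), which is the quadratic case the abstract refers to.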
Garg, Sachin and Wong, Kenneth F., "Analysis of an Improved Distributed Checkpointing Algorithm" Report Number: WUCS-93-37 (1993). All Computer Science and Engineering Research.
Permanent URL: http://dx.doi.org/10.7936/K7513WKN