Architecture-Directed Refinement

Authors: Gruia-Catalin Roman and C. Donald Wilcox

As critical computer systems continue to grow in complexity, the task of demonstrating that they are correct, that is, guaranteed to operate without failure, becomes more difficult. For this reason, research in software engineering has turned to formal methods, i.e. rigorous approaches to demonstrating the correctness of software systems. Unfortunately, the formal methods currently used for concurrent systems do no provide a mechanism for expressing and manipulating non-functional constraints formally. In this paper, we show that one class of non-functional constraints, the target architectures, can be expressed using formal notation (the UNITY proof logic). We then use a mixture of specifications and program refinements to derive a program which is demonstrably correct, both functionally and in its appropriateness for implementation on a specific machine.

... Read complete abstract on page 2.
Architecture-Directed Refinement

Complete Abstract:

As critical computer systems continue to grow in complexity, the task of demonstrating that they are correct, that is, guaranteed to operate without failure, becomes more difficult. For this reason, research in software engineering has turned to formal methods, i.e. rigorous approaches to demonstrating the correctness of software systems. Unfortunately, the formal methods currently used for concurrent systems do no provide a mechanism for expressing and manipulating non-functional constraints formally. In this paper, we show that one class of non-functional constraints, the target architectures, can be expressed using formal notation (the UNITY proof logic). We then use a mixture of specifications and program refinements to derive a program which is demonstrably correct, both functionally and in its appropriateness for implementation on a specific machine.

This technical report is available at Washington University Open Scholarship: https://openscholarship.wustl.edu/cse_research/597
Architecture-Directed Refinement

Gruia-Catalin Roman
C. Donald Wilcox

WUCS-92-34

October 1992

Abstract

As critical computer systems continue to grow in complexity, the task of demonstrating that they are correct, that is, guaranteed to operate without failure, becomes more difficult. For this reason, research in software engineering has turned to formal methods, i.e., rigorous approaches to demonstrating the correctness of software systems. Unfortunately, the formal methods currently used for concurrent systems do not provide a mechanism for expressing and manipulating non-functional constraints formally. In this paper, we show that one class of non-functional constraints, the target architecture, can be expressed using a formal notation (the UNITY proof logic). We then use a mixture of specification and program refinements to derive a program which is demonstrably correct, both functionally and in its appropriateness for implementation on a specific machine.
Correspondence: All communications regarding this paper should be addressed to

Dr. Gruia-Catalin Roman
Department of Computer Science
Washington University
Campus Box 1045
One Brookings Drive
Saint Louis, MO 63130-4899

office: (314) 935-6190
secretary: (314) 935-6160
fax: (314) 935-7302

roman@cs.wustl.edu
1. Introduction

Increasingly, computers are used to control and monitor critical systems where failures are unacceptable. In many such systems, it is necessary to provide strong guarantees that each software component will function correctly. Because these systems are large, complex, and involve multiple computers, mere testing is not sufficient for demonstrating that the software is free of errors. Consequently, research in software engineering has been forced to consider more rigorous approaches to the specification, analysis, and construction of programs. Program derivation is a promising formal approach to constructing correct programs in which a program is created by mathematically manipulating formal specifications of the problem to be solved. An initial, abstract specification is gradually refined until it becomes sufficiently concrete as to suggest a direct realization in terms of some available programming language. The correctness of the final program is guaranteed by its construction.

Initial research on program derivation dealt exclusively with sequential programs and relied upon the weakest-precondition calculus [10]. Two general program construction strategies emerged from this work: algorithm refinement [1, 9, 14, 19, 20, 26] which is concerned with procedural abstractions and data refinement [15, 18, 21] which is concerned with data representations. As interest shifted into the realm of concurrent systems, it became necessary to consider new program derivation strategies.

Although this field is relatively new, several promising approaches to deriving concurrent systems have surfaced in recent years. Back and Sere, in their work on action systems [2], advocate a program refinement approach in which a series of correctness-preserving transformations are applied for the purpose of changing an initially large-grained (possibly sequential) program into a fine-grained, highly-concurrent one. Similar program-transformation ideas have been explored by several other researchers [11, 12, 13, 17], mainly in the context of the CSP model. In contrast, Chandy and Misra, in their work on UNITY [6], build on the legacy of algorithm and data refinement, working largely within the realm of specifications, deferring the writing of a program until the very end of the refinement process.

The UNITY proof logic combines the expressive power of the linear-time logics with the conceptual simplicity of Hoare-style predicates over a program's state. An initial specification consisting of assertions about both
the safety and the progress properties of the desired program is gradually refined until the construction of a correct program becomes straightforward. Both the data and the algorithm are refined along the way. To date there has been a great deal of program derivation work within the UNITY framework [7, 16]; Staskauskas, 1988 #712.

Furthermore, the UNITY model has been extended by Roman and Cunningham [8, 23, 24] to accommodate increased levels of dynamism as well as additional programming paradigms, such as rule-based programming and dynamic synchrony. This extended model, Swarm, has been used to provide, for the first time, a formal framework for deriving parallel production systems [25]—it is worth noting that the resulting program derivation methodology combines elements of both specification and program refinement, an idea to which we will return later in this paper.

One significant shortcoming common to all of these approaches is that they exclude considerations regarding the target architecture from the formal framework while, at the same time, they make use of informal characterizations of the target architecture to justify individual refinements and to guide the derivation process toward an efficient solution. These architectural constraints have a significant impact on the direction taken by the refinement process; clearly, a proper solution to a given problem given a shared memory multiprocessor will be different than that for a message-passing network. However, because architectural constraints are stated informally, they cannot be factored (formally) into the derivation process and no one attempts to prove that the resulting program can actually be executed on the desired architecture.

While many designers may view the notion of formalizing architectural constraints with some skepticism—established formal derivation methods are yet to prove themselves in an industrial setting—the idea has important intellectual and practical implications. First, eventual tool support for design activities becomes possible only if they acquire a formal underpinning. Second, applications involving high degrees of reliability often require high performance levels achievable only through the utilization of specialized devices and novel architectures. The combination of reliability and performance is likely to render inadequate any ad-hoc reasoning about architectural constraints and to limit the applicability of current formal derivation methods.

The main contribution of this paper is to show that architectural constraints can be expressed formally using the same notational and logical system employed to specify behavior. Further, we show that the architectural constraints can be used to guide the process of constructing a program appropriate for the desired architecture. Our
approach starts with a formal specification of the program behavior, written using safety and liveness assertions over an abstract (global) state of the program. This specification is refined to produce an initial program which, while correct, does not necessarily map well onto the desired architecture. At this point formal specifications of the constraints imposed by the target architecture are added to the specification and used to guide a program refinement process leading to an architecture-specific program.

In the remainder of this paper, we elaborate on this methodology for deriving concurrent systems, with an example taken from distributed simulation. Section 2 outlines the design methodology in greater detail. Section 3 describes the Swarm notational system employed in this paper. In section 4, the example problem is described, and an initial, high-level specification is given. Section 5 traces the refinement of the specification to an initial abstract program. Section 6 illustrates the process of adding architectural constraints to the specification. For this purpose, two example architectures are selected: (1) a bus-based, message-passing architecture with specialized hardware for sharing information, and (2) a simple, unidirectional ring. Finally, sections 7 and 8 draw some conclusions.

2. Design Methodology

Our approach to the specification and design of concurrent systems is unique in its integration of architectural issues into the derivation process. The methodology is two-phased. Specification refinement is employed to construct a first program which is correct but architecture-independent. The specification method, refinement strategy, and notation are essentially those of UNITY. Program refinement is used to transform this program into another which satisfies the behavior specification as well as formally stated architectural constraints ignored during the specification refinement phase. The compatibility between the program and the target architecture is guaranteed by construction. Despite superficial similarities with techniques involving changes in the atomicity of program actions, the program refinement process presented here is new. The program transformations are not part of a standard, restricted repertoire. They are creative steps motivated by violations of the architectural constraints and may trigger limited re-verification of certain behavioral assertions. The amount of re-verification is kept small by carefully monitoring the relation between program actions and the assertions whose validity they may impact. A technique for formal specification and verification of architectural constraints is an integral part of this approach.
**Initial specification.** Like in UNITY, a program specification consists of safety and progress properties of the desired program. These safety assertions constrain the range of possible state transitions in which the program may engage; the progress assertions define state transitions that are required to take place. The specification is concerned only with the program behavior and makes no references to any non-functional constraints the program may have to meet (e.g., response time, reliability, cost, etc.). Furthermore, all assertions are stated in terms of a highly abstract state representation of the program. This is accomplished by substituting references to concrete data representations by predicates whose truth values are properly constrained and whose interpretation is given informally. For example, we can express the notion that some action \( \alpha \) is executed at time \( t \) at node \( P \) by the predicate

\[
\text{action}(P, t, \alpha).
\]

Safety properties are specified using the unless relation as in

\[
p \text{ unless } q
\]

which asserts that if the program reaches a state in which the predicate \( p \) holds, \( p \) will continue to hold at least as long as \( q \) does not, which may be forever. Given the unless relation, one can easily introduce the notion of stability

\[
\text{stable } p = (p \text{ unless } false)
\]

which states that once \( p \) holds, it will continue to hold forever; and the concept of invariant

\[
\text{inv. } p = (\text{INIT } \Rightarrow p \land \text{stable } p)
\]

which asserts that \( p \) holds initially and throughout the execution of the program. \( \text{INIT} \) characterizes the initial state of the program. Most often, the program initialization is not given explicitly but it is implied by the invariant properties.

Progress properties are specified using the ensures and leads-to (written "\( \rightarrow \)") relations. The assertion


\( p \text{ ensures } q \)

requires that if the program reaches a state in which \( p \) holds, there is one specific action of the program which will be executed and will establish \( q \) while all other program actions either preserve the validity of \( p \) or are free to establish \( q \). The assertion

\[ p \Rightarrow q \]

simply states that if the program reaches a state in which \( p \) holds, it will eventually reach a state in which \( q \) holds. Unlike in the \textit{ensures} relation, \( p \) is not required to hold until \( q \) is established, nor must there be one specific statement which establishes \( q \). The \textit{until} relation, defined as

\[ p \text{ until } q = (p \Rightarrow q) \land (p \text{ unless } q) \]

is used to describe progress when \( p \) is required to hold until \( q \) is established, but no one specific statement is guaranteed to establish \( q \) atomically.

Architecture-independent specification refinement. Typical specifications tend to include many safety properties and relatively few progress properties. The former place constraints on the solution space and, for reasons of completeness, are rather detailed and numerous. The latter require that particular goal states must be reached but leave the details of how this is to be accomplished undefined. The purpose of specification refinement is to add sufficient detail on how progress is to be accomplished so as to make the writing of an appropriate correct program a trivial exercise. This means that relatively broad progress properties must be replaced by increasingly more specific ones. Changes in state representation often accompany these refinements and lead to more detailed formulations of the safety properties. Coupling invariants serve to relate predicates given in terms of one state representation to those given in terms of another, more concrete, representation. When the state representation and the progress properties match directly to data structures and statements in the target language, respectively, the transition from a specification to a program takes place.

Each specification refinement is a creative step motivated by design insights and carried out in a highly disciplined fashion. Although the syntactic form of a particular assertion may suggest a certain type of refinement,
such heuristics—which could ultimately lead to some form of automation—play only a secondary role in the refinement process today. In principle, the specification refinement could be biased towards a very specific architecture at the expense of rendering all other architectures inappropriate. In our methodology, however, the emphasis is on a general, architecture-independent solution. While this approach could be taken even in the absence of formal architectural constraints, it is rather the case that methodologies which treat architectural constraints informally need the target architecture to become the principal motivating factor behind the specification refinement, making an architecture-independent solution infeasible. As a result, each new architecture one might consider requires a new specification refinement. By attempting to generate first an architecture-independent program, our methodology makes it possible to perform the specification refinement once and then to use the same initial program as the basis for deriving multiple architecture-specific programs.

**Architectural specification.** It is a generally accepted fact that knowledge about the underlying architecture is required in order to construct an efficient or reliable program. It is also true that for applications that involve specialized hardware or network topologies, understanding the architecture is needed simply to write a program that can be executed, regardless of its performance. We define an architectural constraint to be any property a program must satisfy in order to execute on a given architecture. The absence of global data is one such constraint characterizing a distributed network. Each constraint defines a class of acceptable programs. An architectural specification consists of a number of constraints which must be satisfied simultaneously.

Formal specification of architectural constraints—not to be confused with hardware specification languages—is made difficult by the fact that one must define an entire class of programs having unknown behaviors. The task is further complicated by our desire to specify architectural constraints in the same assertional style and notation being employed in dealing with behavior specifications—simply called program specifications. The use of a single common notation for both program specifications and architectural constraints is motivated by our ultimate goal of integrating architectural considerations in the program derivation process.

The specification technique we describe in this paper relies on the introduction of auxiliary variables whose purpose is to capture certain salient relationships between a program and the architecture on which it is executed.
Architectural constraints are specified as assertions involving the auxiliary variables. The absence of globally shared data could be stated as

\[ \text{inv. } \text{access}(i, k) \Rightarrow i = k \]

where the predicate \( \text{access}(i, k) \) is to be interpreted to mean a statement on processor \( i \) accessed data on processor \( k \). Such a fact may be stated without having the program at hand. Once the program is known, however, its statements may be augmented with auxiliary variables in accordance with a pre-defined set of rules and the invariant above can be in fact proven to hold for the particular program. The augmentation rules can be defined formally but those we use in this paper are simple enough to be described informally without undermining the rigorous nature of the presentation.

**Architecture-driven program refinement.** Because of its generality, the program generated at the conclusion of the specification refinement is unlikely to satisfy the architectural constraints that may be involved in the problem at hand. Detected violations of the architectural constraints are then used to guide the program refinement. In the simplest terms, our approach requires one to keep a checklist of constraints which are not satisfied by the program in its current form, to pick one from the list and fix it, updating the list accordingly. The process is complete when all the constraints are met. Because the entire process is performed within the formal framework, with proofs when necessary, the final program is known to be both correct and implementable on the target architecture.

Program refinements may involve changes in data representation, in the granularity of program statements, and in the allocation of statements and data to architectural components. Each refinement may trigger re-verification against both the program specification and the architectural constraints. In practice, modular design and careful staging of the program refinements can limit to a significant degree the extent of such re-verification.

3. Notation

In this paper, we will be using Swarm [23], an extension to UNITY to reason about and write programs. Swarm belongs to a class of languages and models that use tuple-based communication. Other languages and models in this class are Linda [4], Associons [22], and GAMMA [3]. Two features of Swarm that are of particular impor-
tance for this paper are (1) its UNITY-like proof logic, and (2) the ease with which it can accommodate a variety of programming paradigms (e.g., shared variables, message passing, rule-based) and architectures (e.g., synchronous, asynchronous, reconfigurable, etc.). This section briefly describes the Swarm notation and proof logic. For a more complete discussion, the reader is referred to [8, 23, 24].

3.1. The Swarm Programming Notation

The primary means for communication among the concurrent components of a Swarm program is a common, content-addressable data structure called a shared dataspace. Elements of the dataspace may be examined, inserted, or deleted by programs. The model partitions the dataspace into three subsets: TPS, the tuple space (a finite set of data tuples), TRS, the transaction space (a finite set of transactions), and SC, the synchrony relation (a symmetric relation on the set of all possible transactions). A Swarm transaction denotes an atomic transformation of the dataspace. Instances of transactions may be created dynamically by an executing program.

A Swarm program begins execution from a specified initial dataspace. On each execution step, a transaction is chosen nondeterministically from the transaction space and executed atomically. This selection is fair in the sense that every transaction in the transaction space will eventually be chosen. An executing transaction examines the dataspace and then, depending upon the results of the examination, can delete tuples (but not transactions) from the dataspace and insert new tuples and transactions into the dataspace. Unless a transaction explicitly reinserts itself into the dataspace, it is deleted as a by-product of its execution. Program execution continues until there are no transactions remaining in the dataspace.

The synchrony relation is a relation over the set of possible transaction instances. This relation may be examined and modified by programs in the same manner as the tuple and transaction spaces. The synchrony relation affects program execution as follows: whenever a transaction is chosen for execution, all transactions in the transaction space which are related to the chosen transaction by (the transitive closure of) the synchrony relation are also chosen; all of the transactions that make up this set, called a synchronic group, are executed as if they comprised a single transaction.
Next we illustrate the Swarm programming notation using a toy example, a program in which at most N timers are incremented in lockstep fashion. Each timer i is incremented modulo some overflow value ovr(i). A timer may be brought on line any time during the computation, but eventually all timers mark time in step with each other. To accomplish this, all timers are reset to zero whenever any one of them is reset to zero. As a result, all active timers count modulo \( m = \{ \min i : 1 \leq i \leq N \land \text{active}(i) :: \text{ovr}(i) \} \).

**Construct a timer.** We begin by first considering the case of a simple timer with identifier i. We can represent the current state of this timer as a tuple \( \text{time}(i, v) \), where \( v \) is the timer's current value. The transaction which increments and resets the timer is \( \text{Timer}(i, \text{ovr}(i)) \). A timer is activated by inserting in the dataspace (initially or during the computation) the corresponding data tuple and transaction.

**Define the timer’s behavior.** A transaction stored in the dataspace is simply a name for an atomic transformation of the dataspace. The transaction’s behavior is defined separately as the composition of one or more subtransactions. A subtransaction consists of a dataspace query, which binds some set of existentially quantified local variables whose scope extends over the entire subtransaction, followed by an action which modifies the contents of the dataspace by inserting or removing entries if the query succeeds. (Notationally, the query and action are separated by “\( \rightarrow \)”, we use the comma for logical and \( (\land) \), and sub-transactions are separated by “\( || \)”). By definition, deletions are performed before insertions. The query can be any arbitrary predicate over the dataspace, similar to a Prolog goal, and may check for the presence (or absence) of specific entities in the dataspace. The semantics of transaction execution are similar to those for a single subtransaction, except that the queries for all subtransactions are evaluated in parallel, followed by the deletions and then the insertions appearing in the actions of those sub-transactions whose queries succeeded.

In our example, we can specify the behavior of an individual timer by introducing the following transaction type definition (the reason for dividing the first two transactions will be made clear later):

\[
\text{Timer}(\text{id}, \text{MAX}) = \\
\quad t : \text{time}(\text{id}, t), t \geq \text{MAX} \rightarrow \text{skip} \\
\quad || t : \text{OR}, \text{time}(\text{id}, t) \rightarrow \text{time}(\text{id}, t'), \text{time}(\text{id}, 0) \\
\quad || t : \text{NOR}, \text{time}(\text{id}, t) \rightarrow \text{time}(\text{id}, t'), \text{time}(\text{id}, t+1) \\
\quad || \text{TRUE} \rightarrow \text{Timer}(\text{id}, \text{MAX})
\]
The first subtransaction consists of a regular query (a query which does not use any special predicates) which checks whether or not the timer needs to be reset and has a null action (skip). The variable \( t \), which is local to this subtransaction, is bound by finding in the dataspaces a tuple of type \( time \) whose first component contains the constant \( id \). The success of the first subtransaction is communicated to the second subtransaction via the special predicate OR, which succeeds whenever any regular query executed in parallel evaluates to true. As a result, the second subtransaction resets the timer by deleting the tuple \( time(id, t) \), independently found by its own query, and by inserting the tuple \( time(id, 0) \). Similarly, the third subtransaction uses NOR (i.e., not OR) to determine if the timer can be incremented by one unit. Finally, the fourth subtransaction recreates the timer (which otherwise would be implicitly deleted). The special predicate TRUE (which always succeeds) ensures that the query associated with this subtransaction becomes a special query and is therefore not considered when OR and NOR are evaluated.

**Establish lockstep execution.** The requirement for lockstep execution can be expressed in Swarm using the third type of dataspaces entity, the *synchrony relation*. Two timers \( i \) and \( j \) can be made part of the same synchronous group by inserting into the dataspaces the following synchrony relation entry:

\[
\text{Timer}(i,\text{ovr}(j)) \rightarrow \text{Timer}(j,\text{ovr}(j))
\]

Recall that a set of transactions present in the dataspaces and closed under the reflexive transitive closure of the synchrony relation is called a *synchronous group*, and that whenever a transaction is selected for execution, the entire synchronous group to which it belongs is executed, and all the subtransactions for all transactions in the group are executed together as if they were part of a single larger transaction. An interesting consequence of these semantics is that the special predicates are now evaluated with respect to the regular queries of the entire synchronous group—we had this in mind when we decided to use special queries in the definition of Timer above. Consequently, the special predicate OR evaluates to true whenever the query of the first subtransaction in either Timer\( (i,\text{ovr}(i)) \) or Timer\( (j,\text{ovr}(j)) \) succeeds, indicating that both timers must be reset.

### 3.2. The Swarm Proof Logic

Properties in the Swarm logic are expressed in terms of predicates over the global system state. Safety properties are given in terms of unless, and progress in terms of ensures (for atomic transitions), and \( \rightarrow \) (read...
leads-to) for transitions of a larger grain. Other concepts, such as invariance, can be expressed in terms of these basic properties. Figure 1 summarizes the properties which we will use in this paper. The main difference between the UNITY and Swarm logics is the addition of dynamic control constructs; whereas in UNITY the set of statements in a program is fixed, the transactions in a Swarm program are drawn from a (possibly) infinite set, and the collection of transactions currently instantiated can changes during program execution. Similarly, the synchrony relation provides for dynamically changing the size of the atomic actions within a program by changing the form of the relation. These differences show up only in the formal execution model and in the definitions of the basic relations (unless, ensures, and leads-to), the theorems proven within the UNITY model continue to hold for Swarm. It should be clear to the reader that the Swarm inference rules reduce to those of UNITY when the synchrony relation is empty and the set of instantiated transactions remains constant.
1. $\{ p \} s \{ q \}$
Given predicates $p$ and $q$ and a synchronic group $s$, this assertion (also called a “Hoare triple”) holds if in every state satisfying the precondition $p$, the execution of $s$ results in a state satisfying the postcondition $q$.

2. $p$ unless $q$
If $p$ is true at some point in the computation and $q$ is not, then executing any synchronic group either maintains $p$ or establishes $q$, i.e.,
$$\{ p \land \neg q \} s \{ p \lor q \}$$
unless constrains the valid set of state transitions within a program.

3. inv. $p = (INIT \Rightarrow p) \land (p$ unless false$)$
The property $p$ is true at all points in the computation, i.e., invariant.

4. $p$ ensures $q = p$ unless $q \land (\exists s: s \in SG :: \{ p \land \neg q \} s \{ p \lor q \})$
If $p \land \neg q$ is true, there exists a transaction $t$ such that every synchronic group $s$ containing $t$ will establish $q$ when executed. The fairness assumption guarantees that $s$ will eventually be selected.

5. $p \Rightarrow q$
This, read $p$ leads to $q$, means that once $p$ becomes true, $q$ will eventually become true, but $p$ is not guaranteed to remain true until $q$ becomes true. Note that $\Rightarrow$ is transitive, whereas ensures is not.

6. $p$ until $q = (p \Rightarrow q) \land (p$ unless $q$)
$p$ until $q$ is a special case of the leads-to relation, which requires that $p$ continue to hold as long as $q$ does not hold. Unlike ensures, there is no requirement that the transition from $p$ to $q$ take place in one atomic step.

7. $p$ detects $q = (p \Rightarrow q) \land (q \Rightarrow p)$
Property $p$ can be used to determine that some, typically more complex, property $q$ has been satisfied.

Figure 1: Notation used in the Swarm proof logic.

4. Initial Specification

To illustrate our methodology, we will be using an example that was inspired by previous work on distributed simulation [5]. Consider a network of sequential nodes which can exchange messages over communication links. Each node executes a sequence of actions. Each action consists of the retrieval of pending messages from other nodes, the updating of some local data and the sending of messages to other nodes over the links. We assume that the links are error-free. We wish to specify and design a distributed program which will simulate this system.

To avoid any confusion, we will limit the use of the terms “network” and “node” to refer to the simulated system, and reserve the term “program” for the simulator.
4.1. State Representation

For specification purposes, it is convenient to assume an absolute global time. The predicate \( gclock(T) \) is used to denote the fact that the current time is \( T \). The network's state can be characterized in terms of the actions which are to be executed, the messages which are to be delivered, and the local state of each node. The following predicates are used to represent this information:

\[
\begin{align*}
\text{state}(P, \sigma) & \quad \text{The current data state of node } P \text{ is } \sigma. \\
\text{action}(P, \tau, \alpha) & \quad \text{Action } \alpha \text{ will be executed at time } \tau \text{ at node } P. \text{ We define the special action } \bot \text{ to be the } \text{\textit{halt}} \text{ action, i.e., when } \text{\textit{action}}(P, \tau, \bot) \text{ is true, processor } P \text{ is terminated.} \\
\text{message}(P, Q, \tau, \mu) & \quad \text{A message from } P \text{ to } Q \text{ with content } \mu \text{ will be delivered at time } \tau.
\end{align*}
\]

The predicates introduced so far define an abstract state representation for the program. Since not all states are acceptable, several invariants are introduced which serve to constrain the state space in a reasonable way (all free variables are assumed to be universally quantified):

\[
\begin{align*}
\text{F1:} & \quad \text{There is exactly one gclock value} \\
& \quad \text{inv. } (\Sigma T : gclock(T) :: 1) = 1 \\
\text{F2:} & \quad \text{Each process has a unique state} \\
& \quad \text{inv. } (\Sigma \sigma : \text{state}(P, \sigma) :: 1) = 1 \\
\text{F3:} & \quad \text{Each node executes one action at a time} \\
& \quad \text{inv. } (\Sigma \tau, \alpha : \text{action}(P, \tau, \alpha) :: 1) = 1 \\
\text{F4:} & \quad \text{Actions are never "in the past"} \\
& \quad \text{inv. } \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \land gclock(T) \Rightarrow T \leq \tau \\
\text{F5:} & \quad \text{Message values are unique across the network} \\
& \quad \text{inv. } \text{message}(P, Q, \tau, \mu) \land \text{message}(P', Q', \tau', \mu') \Rightarrow ((P, Q, \tau) = (P', Q', \tau') \Rightarrow \mu = \mu') \\
\text{F6:} & \quad \text{Messages are never "in the past"} \\
& \quad \text{inv. } \text{message}(P, Q, \tau, \mu) \land gclock(T) \Rightarrow T \leq \tau
\end{align*}
\]

An alternate, but equivalent, reformulation of F4 and F6 is useful:

\[
\begin{align*}
\text{F4':} & \quad \text{inv. } \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \land gclock(T) \Rightarrow T \leq (\min P', \tau', \alpha' : \text{action}(P', \tau', \alpha') \land \alpha' \neq \bot :: \tau') \\
\text{F6':} & \quad \text{inv. } \text{message}(P, Q, \tau, \alpha) \land gclock(T) \Rightarrow T \leq (\min P', Q', \tau', \alpha' : \text{message}(P', Q', \tau', \alpha') :: \tau')
\end{align*}
\]
We say that an action is enabled when the system clock has reached the action activation time, and there are no messages to be delivered at the current time. A processor is halted if its action is \( \perp \). Formally,

\[
\begin{align*}
\text{enabled}(P, \tau, \alpha) &= \text{action}(P, \tau, \alpha) \land \text{gclock}(\tau) \land (\forall Q, \mu :: \neg \text{message}(Q, P, \tau, \mu)) \land \alpha \neq \perp \\
\text{halted}(P) &= (\exists \tau :: \text{action}(P, \tau, \perp))
\end{align*}
\]

Since many details of the network's computation are not relevant to the simulation program, we encapsulate them using several functions assumed to be available to the program:

\[
\begin{align*}
\text{u}(P, \sigma, \mu) &\quad \text{returns the state of node } P \text{ to when the absorption of a message } \mu \text{ is absorbed in state } \sigma. \\
\text{e}(P, \tau, \sigma, \alpha) &\quad \text{returns the time when the action immediately following } \alpha \text{ will be executed given that } \alpha \text{ is executed on node } P \text{ at time } \tau \text{ in state } \sigma. \text{ The function } e \text{ is strictly monotonic with respect to the argument } \tau. \\
\text{a}(P, \sigma, \alpha) &\quad \text{returns the name of the action which will be executed at node } P \text{ subsequent to the completion of the action } \alpha \text{ in state } \sigma. \\
\text{c}(P, Q, \sigma, \alpha) &\quad \text{is true if node } P \text{ sends a message to node } Q \text{ as a result of executing the action } \alpha \text{ in state } \sigma. \\
\text{l}(P, Q, \tau, \sigma, \alpha) &\quad \text{returns the delivery time for a message sent by node } P \text{ to node } Q \text{ as a result of executing action } \alpha \text{ in state } \sigma. \text{ Because a message is sent at the completion of the initiating action, } l(P, Q, \tau, \sigma, \alpha) \text{ must exceed } e(P, \tau, \sigma, \alpha). \\
\text{v}(P, Q, \sigma, \alpha) &\quad \text{returns the contents of the message sent to node } Q \text{ by node } P \text{ when executing the action } \alpha \text{ in state } \sigma. \\
\text{s}(P, \sigma, \alpha) &\quad \text{returns the new state of node } P \text{ resulting from executing the action } \alpha \text{ in state } \sigma.
\end{align*}
\]

We now turn our attention to the problem of describing the assumptions we make about the behavior exhibited by the network to be simulated. As a means of organizing ourselves, we discuss valid changes to the different components of the network state separately. Properties involving several state components are discussed when first encountered.
4.2. Time

In the real world, time advances one unit at a time. If we impose this constraint in our specification, we will make it impossible for the program to advance time in larger increments whenever there is no network activity at any node. To avoid this unnecessarily strong condition, we will simply restrict time from moving backwards, leaving the increment unspecified:

F7: \( \text{gclock}(\tau) \text{ unless } (\exists \tau': \tau' \geq \tau + 1 : \text{gclock}(\tau')) \)

This formulation also allows the movement of time to cease (because it is a safety property and not a liveness property) when the simulation is "finished."

4.3. Messages

Upon being created as a result of executing an action, a message travels through the network in some unspecified manner until its delivery time (F9). Messages are delivered in arbitrary order (we consider a message delivered when it is deleted by the receiving processor), and the delivery of a message results in an atomic update of the state of the recipient (F10). Messages can be created only by executing an action (F8, F11).

F8: \( \text{Messages are only created by executing actions} \)
\( \text{action}(P,\tau,\alpha) \land (\text{set } Q,\tau',\mu : \text{message}(P,Q,\tau',\mu) :: \mu) = M \)
\( \text{unless} \)
\( \neg \text{action}(P,\tau,\alpha) \lor (\text{set } Q,\tau',\mu : \text{message}(P,Q,\tau',\mu) :: \mu) \subset M \)

F9: \( \text{Messages exist until their delivery time} \)
\( \text{message}(P,Q,\tau,\mu) \text{ unless } \text{message}(P,Q,\tau,\mu) \land \text{gclock}(\tau) \)

F10: \( \text{Messages are absorbed at the time of their delivery} \)
\( \text{message}(P,Q,\tau,\mu) \land \text{gclock}(\tau) \text{ unless } \neg \text{message}(P,Q,\tau,\mu) \land \text{gclock}(\tau) \)

F11: \( \text{Messages are not created at the current time} \)
\( \neg \text{message}(P,Q,\tau,\mu) \land \text{gclock}(\tau) \text{ unless } \neg \text{gclock}(\tau) \)

F12: \( \text{Messages exist until incorporated into the state of the receiving node} \)
\( \text{state}(P,\sigma) \land \text{gclock}(\tau) \land (\text{set } Q,\mu : \text{message}(Q,P,\tau,\mu) :: \mu) = M \land M \neq {} \)
\( \text{unless} \)
\( (\exists \mu' : \text{message}(Q,P,\tau,\mu') :: \mu') = M - {\mu} :: \text{state}(P,M(P,\sigma)) \)

4.4. Actions

Although it is simpler to think of actions as being atomic and instantaneous, they in fact have a duration which should be modeled in the simulation. A simple way to capture this is to disallow the subsequent action from
executing until the first is completed. This allows us to view an action as waiting for its execution time (F13, F14). We require terminated nodes to remain so forever (F16) — and assume that messages whose delivery time coincides with the execution time of an action are absorbed before the action executes.

F13: An action continues to exist until it is time to execute it
action(P, τ, α) ∧ α ≠ ⊥ unless action(P, τ, α) ∧ α ≠ ⊥ ∧ gclock(τ)

F14: An action continues to exist until it is enabled
action(P, τ, α) ∧ α ≠ ⊥ ∧ gclock(τ)
   unless
   action(P, τ, α) ∧ α ≠ ⊥ ∧ gclock(τ) ∧ \( \forall Q, μ :: ¬message(Q, P, τ, μ) \)

F15: An enabled action continues to exist until executed atomically
state(P, σ) ∧ enabled(P, τ, α)
   unless
   action(P, c(P, σ, α), a(P, σ, α)) ∧
   state(P, s(P, σ, α)) ∧
   \( \forall Q : c(P, Q, σ, α) :: message(P, Q, I(P, Q, τ, σ, α), v(P, Q, σ, α)) \)

F16: A terminated node remains terminated
stable halt(P)

4.5. State

The only events which can change the state of a processor are the delivery of a message or the execution of an action. Note that these transitions have already been described (F12 and F15). All that remains is to forbid any other changes to the state.

F17: The state does not change in the absence of work to do
state(P, σ) unless state(P, σ) ∧ \( \exists Q, τ, α, μ : gclock(τ) :: enabled(P, τ, α) ∨ message(Q, P, τ, μ) \)

4.6. Progress

The final section of the initial specification contains the liveness properties which describe the state transitions which are required by the system. Only two progress properties are needed to describe the required state transitions, messages must be delivered, and actions must be executed:

F18: Messages must be delivered
message(P, Q, τ, μ) \( \rightarrow \) ¬message(P, Q, τ, μ)

F19: Actions must be executed
action(P, τ, α) ∧ α ≠ ⊥ \( \rightarrow \) ¬action(P, τ, α)
Other aspects of the network's behavior (such as moving time forward) are implied by combining these progress properties with the earlier stated safety properties. For example, if an action has an execution time in the future, then the clock must eventually move forward by properties F13 and F19.

This completes the specification of the network behavior. By carefully abstracting away irrelevant details, we are able to generate a specification which is both concise and clear.

5. Specification Refinement

In this section we refine the behavior specification up to the point where program generation becomes trivial. No architectural constraints are considered at this point. The refinement process is guided by the need to discover a series of simple state transitions that realize the progress conditions. More specifically, most of the refinements are involved with generating a precise specification of when and how the simulation clock should be incremented. The steps used here are not generic but specific to the problem. We view program derivation as a creative step, and not a mechanical substitute for design.

We use the notation

\[ p \vdash q \]

to mean "q can be proved from the specification which results from replace q with p."

5.1. Add specificity to the processing of messages and actions.

The first refinement is suggested by properties F9–F10 and F13–F15, which characterize the life-cycle of messages and actions into a series of state transitions. By combining this information with the progress properties F18 and F19, it is clear that (1) messages must be held until the time of delivery and no further; and (2) actions must be held until all messages having delivery time equal to the start of the action have been processed, and the effect of the action must be atomic.

Refinement. F18 is replaced by

F18.1: \[ \text{message}(P, Q, \tau, \mu) \rightarrow \text{message}(P, Q, \tau, \mu) \land \text{gclock}(\tau) \]

F18.2: \[ \text{message}(P, Q, \tau, \mu) \land \text{gclock}(\tau) \rightarrow \neg \text{message}(P, Q, \tau, \mu) \land \text{gclock}(\tau) \]
which are analogous to F9 and F10; F19 is replaced by

F19.1: \[ \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \iff \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \land \text{gclock}(\tau) \]

F19.2: \[ \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \land \text{gclock}(\tau) \]
\[ \iff \]
\[ \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \land \text{gclock}(\tau) \land \left( \forall Q, \mu : \neg \text{message}(Q, P, \tau, \mu) \right) \]

F19.3: \[ \text{enabled}(P, \tau, \alpha) \land \text{state}(P, \sigma) \]
\[ \iff \]
\[ \text{action}(P, e(P, \sigma, \alpha), a(P, \sigma, \alpha)) \land \]
\[ \text{state}(P, s(P, \sigma, \alpha)) \land \]
\[ \left( \forall Q : \text{c}(P, Q, \sigma, \alpha) : \text{message}(P, Q, I(P, Q, \tau, \sigma, \alpha), v(P, Q, \sigma, \alpha)) \right) \]

which are analogous to F13-F15.

**Proof Obligations.** We are required to show:

F18.1, F18.2 ├ F18

F19.1 \land F19.2 \land F19.3 ├ F19

The proof follows from the transitivity of the leads-to relation.

**Refined Specification.**

*Safety:* F1-F17


5.2. Decouple time movement from message and action processing.

The movement of time is required only by the presence of messages to be delivered and actions to be executed. Properties F18.1 and F19.1 indicate this, but they couple the movement of time with the continued presence of messages or actions. These two concerns may be decoupled very easily in light of the fact that F9 and F13 are still part of this specification.

**Refinement.** F18.1 and F19.1 are replaced by:

F18.1.1: \[ \text{message}(P, Q, \tau, \mu) \rightarrow \text{gclock}(\tau) \]

F19.1.1: \[ \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \rightarrow \text{gclock}(\tau) \]
**Proof Obligations.**

F18.1.1 ⊢ F18.1
F19.1.1 ⊢ F19.1

The proof is immediate from the application of the progress-safety-progress (PSP) rule [6].

**Refined Specification.**

*Safety:* F1-F17


5.3. Place lower bound on time increment interval

There are two constraints on the mechanism by which simulation time is moved forward, and neither is reflected in F18.1.1 or F19.1.1. First, time must move forward in steps of at least one unit (by F7), and it cannot move beyond the earliest time in which a message exists to be delivered (by F4), or an action to be executed (by F6). The refinement is in two steps: in the first, we told the lower bound into the progress properties. The second refinement, which adds the upper bound, is deferred until later, when the proof of its correctness is easier.

**Refinement.** Properties F18.1.1 and F19.1.1 are replaced by:

F18.1.1.1: message(P, Q, τ, μ) ∧ gclock(T) ∧ T < τ ⊢ ( ∃ T': T' ≥ T + 1 :: gclock(T') )

F19.1.1.1: action(P, τ, a) ∧ a ≠ ⊥ ∧ gclock(T) ∧ T < τ ⊢ ( ∃ T': T' ≥ T + 1 :: gclock(T') )

**Proof Obligations.**

F18.1.1.1 ⊢ F18.1.1

**Proof outline.** We can prove the following lemma using PSP with F9 and F18.1.1.1.

L1: message(P, Q, τ, μ) ∧ gclock(T) ∧ T < τ
   ⊢ message(P, Q, τ, μ) ∧ ( (∀ T': T' ≥ T + 1 :: gclock(T') ) ∨ gclock(τ) )

Using invariant F6 with L1, we can prove L2:
L2: \[\text{message}(P, Q, \tau, \mu) \land \text{gclock}(T) \land T < \tau \rightarrow \text{message}(P, Q, \tau, \mu) \land (\exists \ T' : \tau \geq T' \geq T+1 :: \text{gclock}(T')) \lor \text{gclock}(\tau)\]

This is precisely the left-hand side of the induction principle for leads-to [6], where the well-founded metric is the difference between the delivery time of a message and the current gclock. Thus, we have:

L3: \[\text{message}(P, Q, \tau, \mu) \land \text{gclock}(T) \land T < \tau \rightarrow \text{message}(P, Q, \tau, \mu) \land \text{gclock}(\tau)\]

and

\[\text{message}(P, Q, \tau, \mu) \land \text{gclock}(T) \land T = \tau \Rightarrow \text{gclock}(\tau)\]

from which we conclude F18.1.1. The proof for F19.1.1 is similar.

\[\square\]

**Refined Specification.**

**Safety:** F1-F17

**Progress:** F18.1.1.1, F18.2, F19.1.1.1, F19.2, F19.3

5.4. **Unify clock control**

The similarity in the forms of properties F18.1.1.1 and F19.1.1.1 (and F4 and F6), suggests that messages and actions play identical roles with respect to clock movement. This leads us to consider a refinement in which each pair of properties is replaced by a single property which replaces them. The first refinement rewrites F18.1.1.1 and F19.1.1.1, the next handles F4 and F6.

**Refinement 1.** F18.1.1.1 and F19.1.1.1 are replaced by

F20: \[\begin{align*}
\text{gclock}(T) \land (\text{message}(P, Q, \tau, \mu) \lor (\text{action}(P, \tau, \alpha) \land \alpha \neq \bot)) \land \tau > T \\
\rightarrow \langle \exists T' : T' \geq T+1 :: \text{gclock}(T') \rangle
\end{align*}\]

which is simply the disjunction of F18.1.1.1 and F19.1.1.1.

**Proof Obligations.**

F20 \[\vdash\] F18.1.1.1
F20 ⊢ F19.1.1

**Proof Outline.** To show that F20 implies F18.1.1.1, we must prove that the left-hand side of F18.1.1.1 implies the left-hand side of F20, and that the right-hand side of F20 implies the right-hand side of F18.1.1.1, both of which are obvious by inspection. The second proof is the same.

**Refinement 2.** F4 and F6 (or F4' and F6') are replaced by

\begin{align*}
F21: & \quad \text{inv. } ((\text{action}(P,\tau,\alpha) \land \alpha \neq \bot) \lor \text{message}(P,Q,\tau,\mu)) \land \text{gclock}(T) \Rightarrow T \leq \tau \\
F21': & \quad \text{inv. } ((\text{action}(P,\tau,\alpha) \land \alpha \neq \bot) \lor \text{message}(P,Q,\tau,\mu)) \land \text{gclock}(T) \\
 & \quad \Rightarrow T \leq \langle \min P,Q',\tau',\alpha',\mu' : (\text{action}(P,\tau,\alpha) \land \alpha \neq \bot) \lor \text{message}(P,Q,\tau,\mu) :: \tau \rangle
\end{align*}

**Proof Obligations.**

F21 ⊢ F4

F21 ⊢ F6

The proof for the refinement of F4 and F6 is immediate.

**Refined Specification.**

**Safety:** F1-F3, F5, F7-F17, F21

**Progress:** F18.2, F19.2, F19.3, F20

5.5. **Refine mechanism for message delivery**

Property F18.2 requires only that messages be delivered before time moves forward, but does not provide any hints into the delivery mechanism. Property F12 provides the necessary insight, in that it requires that messages be delivered and absorbed in a single atomic step. This suggests the obvious refinement of replacing F18.2 with a progress property having the same form as F12.

**Refinement.** F18.2 is replaced with

\begin{align*}
\text{F18.2.1: } & \quad \text{gclock}(\tau) \land \{ \text{set } Q,\mu' : \text{message}(Q,P,\tau,\mu') :: \mu' \} = M \land M \neq \{ \} \land \text{state}(P,\alpha) \\
 & \quad \Rightarrow \langle \exists \mu : \{ \text{set } Q,\mu' : \text{message}(Q,P,\tau,\mu') :: \mu' \} = M-[\{\mu\} :: \text{state}(P,\mu(P,\sigma,\mu))] \rangle
\end{align*}
Proof Obligations.

F18.2.1 ⊢ F18.2.

Proof Outline. The proof makes use of the induction principle for leads-to, where the well-founded set is the set of messages which remain to be delivered at the current time. The induction principle requires that we prove the following lemma, which implies F18.2:

L4: \[ \text{gclock}(\tau) \land \text{message}(P,Q,\tau,\mu) \land \langle \text{set } P',Q',\mu' : \text{message}(P',Q',\tau,\mu') \land \mu \neq \mu' :: \mu' \rangle = M \]
\[ \rightarrow \]
\[ (\text{gclock}(\tau) \land \text{message}(P,Q,\tau,\mu) \land \langle \text{set } P',Q',\mu' : \text{message}(P',Q',\tau,\mu') \land \mu \neq \mu' :: \mu' \rangle \subset M) \lor \]
\[ (\text{gclock}(\tau) \land \neg \text{message}(P,Q,\tau,\mu)) \]

By applying PSP to F18.2.1 and F10, we can conclude the following:

\[ \text{gclock}(\tau) \land \langle \text{set } Q',\mu' : \text{message}(Q',P,\tau,\mu') :: \mu' \rangle = M \land M \neq \{ \} \land \text{state}(P,\sigma) \land \text{message}(P,Q,\tau,\mu) \]
\[ \rightarrow \]
\[ (\exists m : \langle \text{set } Q',\mu' : \text{message}(Q',P,\tau,\mu') :: \mu' \rangle = M \setminus \{m\} :: \text{state}(P,\mu(P,\sigma,m)) \land \text{message}(P,Q,\tau,\mu) \land \text{gclock}(\tau) \lor \]
\[ \neg \text{message}(P,Q,\tau,\mu) \land \text{gclock}(\tau) \]

Reformulating the property to emphasize the form of the sets over their content, plus some math to remove redundant terms, we get the following:

\[ \text{gclock}(\tau) \land \text{message}(P,Q,\tau,\mu) \land \langle \text{set } Q',\mu' : \text{message}(Q',P,\tau,\mu') :: \mu' \rangle = M \land M \neq \{ \} \land \text{state}(P,\sigma) \]
\[ \rightarrow \]
\[ (\text{gclock}(\tau) \land \text{message}(P,Q,\tau,\mu) \land \langle \text{set } Q,\mu' : \text{message}(Q,P,\tau,\mu') :: \mu' \rangle \subset M) \lor \]
\[ (\neg \text{message}(P,Q,\tau,\mu) \land \text{gclock}(\tau)) \]

Since F2 requires that there is always exactly one \( \sigma \) for which \text{state}(P,\sigma) \) is true, and since \( \sigma \) does not appear anywhere else in the property, we can apply the general disjunction rule for leads-to to remove \text{state}(P,\sigma) \) from the left-hand-side, giving:

\[ \text{gclock}(\tau) \land \text{message}(P,Q,\tau,\mu) \land \langle \text{set } Q',\mu' : \text{message}(Q',P,\tau,\mu') :: \mu' \rangle = M \land M \neq \{ \} \]
\[ \rightarrow \]
\[ (\text{gclock}(\tau) \land \text{message}(P,Q,\tau,\mu) \land \langle \text{set } Q,\mu' : \text{message}(Q,P,\tau,\mu') :: \mu' \rangle \subset M) \lor \]
\[ (\neg \text{message}(P,Q,\tau,\mu) \land \text{gclock}(\tau)) \]

Finally, observe that

\[ \text{message}(P,Q,\tau,\mu) \Rightarrow \langle \text{set } Q',\mu' : \text{message}(Q',P,\tau,\mu') :: \mu' \rangle \neq \{ \} \]
which allows us to remove the \( M \neq \emptyset \) disjunct from the lhs, giving

\[
\begin{align*}
gcloc(\tau) \land \text{message}(P,Q,\tau,\mu) \land (\text{set } Q',\mu' : \text{message}(Q',P,\tau,\mu') :: \mu') = M \\
\implies (gcloc(\tau) \land \text{message}(P,Q,\tau,\mu) \land (\text{set } P',Q',\mu' : \text{message}(P',Q',\tau,\mu') :: \mu') \subset M) \lor \\
(gcloc(\tau) \land \neg \text{message}(P,Q,\tau,\mu))
\end{align*}
\]

Which implies \( L4 \). \( \square \)

**Note.** We can also use \( L4 \) to prove another result:

\[
L5: \quad gcloc(\tau) \iff (\forall P,Q,\mu :: \neg \text{message}(P,Q,\tau,\mu))
\]

\( L5 \) implies \( F19.2 \) (by application PSP over \( L5 \) and \( F14 \)), which can therefore be dropped from the specification.

The proof of \( L5 \) also uses the induction principle for leads-to, this time over the size of the set of messages, as opposed to its content. From \( L4 \), we can conclude the following:

\[
L6: \quad gcloc(\tau) \land (\Sigma P,Q,\mu : \text{message}(P,Q,\tau,\mu) :: 1) = M \land M > 0 \\
\implies gcloc(\tau) \land (\Sigma P,Q,\mu : \text{message}(P,Q,\tau,\mu) :: 1) < M
\]

using the general disjunction theorem for leads-to. To complete the proof that \( L4 \) implies \( L5 \), we must be able to show the following:

\[
L7: \quad gcloc(\tau) \land (\Sigma P,Q,\mu : \text{message}(P,Q,\tau,\mu) :: 1) = M \\
\implies (gcloc(\tau) \land (\Sigma P,Q,\mu : \text{message}(P,Q,\tau,\mu) :: 1) < M) \lor \\
(\forall P,Q,\mu :: \neg \text{message}(P,Q,\tau,\mu))
\]

The result follows immediately from \( L6 \) and

\[
gcloc(\tau) \land (\Sigma P,Q,\mu : \text{message}(P,Q,\tau,\mu) :: 1) = 0 \implies gcloc(\tau) \land (\forall P,Q,\mu :: \neg \text{message}(P,Q,\tau,\mu)) \quad \square
\]

**Refined Specification.**

**Safety:** \( F1-F3, F5, F7-F17, F21 \)

**Progress:** \( F18.2.1, F19.3, F20 \)
5.6. Add upper bound on time increment

We are now in a position to perform the second refinement suggested in section 4.3. While F20 provides a lower bound on the time increment (namely 1), it does not include the upper bound. Such an upper bound is provided by F21, since the time may never exceed the earliest message or action time. By folding F21 into F20, the progress property expresses all the constraints on the clock movement, which is convenient when the time comes to write the abstract program.

**Refinement.** F20 is replaced by F20.1, which requires that the clock move forward whenever the earliest work to be done is in the future.

F20.1 \[
\text{gclock}(T) \land \\
(\text{message}(P,Q,T',\mu) \lor (\text{action}(P,T',\alpha) \land \alpha \neq \bot)) \land \\
\tau = \langle \min P,Q,T',\mu,\alpha : \text{message}(P,Q,T',\mu) \lor (\text{action}(P,T',\alpha) \land \alpha \neq \bot) \rangle :: T' \rangle \land \\
\tau > T \\
\Rightarrow \\
\langle \exists T' : \tau \geq T' \geq T+1 :: \text{gclock}(T') \rangle
\]

**Proof Obligations.**

F20.1 \[\vdash\] F20.

**Proof Outline.** From L5 (and an analogous property over actions, the proof of which is omitted), we can see eventually the system reaches a state in which there are no messages or actions for the current time. That is, we have the property:

L8: \[\text{gclock}(T) \iff \langle \forall P,Q,\mu :: \neg \text{message}(P,Q,T,\mu) \rangle \land \langle \forall P,\alpha :: \neg \text{action}(P,T,\alpha) \lor \alpha = \bot \rangle\]

F21 thus implies that the minimum activity is in the future, and from F20.1, the clock must move forward. Since The left-hand side of F20 implies the left-hand side of L8, we have F20.

**Refined Specification.**

Safety: \[F1-F3, F5, F7-F17, F21\]

Progress: \[F18.2.1, F19.3, F20.1\]
5.7. The Final Specification.

At this point, the three progress properties describe transformations which can easily be considered atomic, which was our goal in the refinement process. The complete specification is reproduced below.

F1: \[ \text{inv. } \langle \sum T : \text{gclock}(T) :: 1 \rangle = 1 \]

F2: \[ \text{inv. } \langle \sum \sigma : \text{state}(P, \sigma) :: 1 \rangle = 1 \]

F3: \[ \text{inv. } \langle \sum \tau, \alpha : \text{action}(P, \tau, \alpha) :: 1 \rangle = 1 \]

F5: \[ \text{inv. } \text{message}(P, Q, \tau, \mu) \land \text{message}(P', Q', \tau', \mu') \land (P, Q, \tau) \neq (P', Q', \tau') \Rightarrow \mu \neq \mu' \]

F7: \[ \text{gclock}(\tau) \land (\exists \tau' : \tau' \geq \tau + 1 : \text{gclock}(\tau')) \]

F8: \[ \text{action}(P, \tau, \alpha) \land (\langle \text{set } Q, \tau, \mu : \text{message}(P, Q, \tau, \mu) :: \mu \rangle = M) \]
\[ \text{unless } \neg \text{action}(P, \tau, \alpha) \lor (\langle \text{set } Q, \tau, \mu : \text{message}(P, Q, \tau, \mu) :: \mu \rangle \subset M) \]

F9: \[ \text{message}(P, Q, \tau, \mu) \land \text{gclock}(\tau) \]

F10: \[ \text{message}(P, Q, \tau, \mu) \land \text{gclock}(\tau) \land \neg \text{message}(P, Q, \tau, \mu) \land \text{gclock}(\tau) \]

F11: \[ \neg \text{message}(P, Q, \tau, \mu) \land \text{gclock}(\tau) \land \neg \text{gclock}(\tau) \]

F12: \[ \text{state}(P, \sigma) \land \text{gclock}(\tau) \land (\langle \text{set } Q, \mu : \text{message}(Q, P, \tau, \mu) :: \mu \rangle = M \land M \neq \{ \}) \]
\[ \text{unless } (\exists \mu : (\langle \text{set } Q, \mu' : \text{message}(Q, P, \tau, \mu') :: \mu' \rangle = M \setminus \{\mu\} : \text{state}(P, u(P, \sigma, \mu))) \]

F13: \[ \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \land \neg \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \land \text{gclock}(\tau) \]

F14: \[ \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \land \text{gclock}(\tau) \]
\[ \text{unless } \neg \text{action}(P, \tau, \alpha) \land \alpha \neq \bot \land \text{gclock}(\tau) \land (\forall Q, \mu : \neg \text{message}(Q, P, \tau, \mu)) \]

F15: \[ \text{state}(P, \sigma) \land \text{enabled}(P, \tau, \alpha) \]
\[ \text{unless } \text{action}(P, \sigma, \alpha, a(P, \sigma, \alpha)) \land \text{state}(P, s(P, \sigma, \alpha)) \land (\forall Q : c(P, Q, \sigma, \alpha) :: \text{message}(P, Q, l(P, Q, \tau, \sigma, \alpha) \lor (P, Q, \sigma, \alpha))) \]

F16: \[ \text{stable } \text{halt}(P) \]

F17: \[ \text{state}(P, \sigma) \land \text{gclock}(\tau) \land \sigma \land (\exists Q, \tau', \alpha, \mu : \text{gclock}(\tau') :: \text{enabled}(P, \tau', \alpha) \lor \text{message}(Q, P, \tau', \mu)) \]

F18.2.1: \[ \text{gclock}(\tau) \land (\langle \text{set } Q, \mu : \text{message}(Q, P, \tau, \mu) :: \mu \rangle = M \land M \neq \{ \}) \land \text{state}(P, \sigma) \]
\[ \Rightarrow (\exists \mu : (\langle \text{set } Q, \mu' : \text{message}(Q, P, \tau, \mu') :: \mu' \rangle = M \setminus \{\mu\} : \text{state}(P, u(P, \sigma, \mu))) \]

25
5.8. Abstract Program

The three progress properties (F18.2.1, F19.3, and F20.1) suggest an abstract program having three transaction types: one to execute actions, one to move the clock, and one to deliver messages. The remainder of this section gives these transactions. Informally, each transaction consists of two subtransactions, one to establish the required progress condition, and one to continue the computation. The transactions are given without proof.

Accompanying each transaction is a list of the properties which directly constrain its form, and which are therefore likely to be affected directly by any refinement of the transaction. We make use of these characteristic properties in the program refinement stages as a heuristic for reducing the amount of formal proof required. Note that this does not imply that it is not necessary to prove that the transaction satisfies the property, but rather that the proof can be deferred until the program refinement process is completed, since the refinements are unlikely to invalidate the property.

Property F18.2.1 requires that a message with a delivery time equal to the current simulation time be incorporated into the state of the destination processor. We implement this as a global transaction of type gdeliver, which does this work for all messages at all processors.

\[
gdeliver = \begin{array}{l}
\forall P,Q,\tau,\mu,\sigma : \\
gclock(\tau), \text{state}(P,\sigma), \text{message}(Q,P,\tau,\mu) \rightarrow \text{message}(Q,P,\tau,\mu), \text{state}(P,\sigma), \text{state}(P,u(P,\sigma,\mu)) \\
\text{TRUE} \rightarrow \text{gdeliver}
\end{array}
\]

The characteristic properties for this transaction are F2, F9, F10, and F12.
Property F19.3 requires that the execution of an enabled action result in the atomic update of the processor's local state, the creation of the next action, and (possibly) the sending of messages to other processors. To accomplish this, we introduce a transaction type `task`, having the same form as `action`, with the obvious relationship:

\[ \text{inv. } \exists \text{ task}(P, \tau, \alpha) \Rightarrow \text{action}(P, \tau, \alpha) \]

Because this transaction affects the entire abstract state, it is necessary to verify that it violates none of the safety properties of the specification. A brief perusal of the specification identifies F2, F3, F13, F14, F15, and F16 as the characteristic set for this transaction.

\[
\text{task}(P, \tau, \alpha) = \\
\sigma : \\
\alpha \neq \bot, \ \text{gclock}(\tau), \ \{ \forall \ Q, \mu :: -\text{message}(Q, P, \tau, \mu) \}, \ \text{state}(P, \sigma) \\
\rightarrow \\
\text{task}(P, e(P, \sigma, \alpha), a(P, \sigma, \alpha)), \\
\text{state}(P, \sigma \uparrow), \ \text{state}(P, s(P, \sigma, \alpha)), \\
\{ \ Q : c(P, Q, \sigma, \alpha) :: \text{message}(P, Q, \tau, \sigma, \alpha), v(P, Q, \sigma, \alpha) \} \}
\]

Finally, property F20.1 requires that time move forward when all the work to do is in the future. This is accomplished by introducing a transaction of type `gtick` which examines the global system state and moves the clock forward when there are no `tasks` or `messages` to be processed at the current time.

\[
\text{gtick} = \\
T, T' : \\
\text{gclock}(T), \\
\{ \forall P, Q, \tau, \alpha, \mu : \text{message}(Q, P, \tau, \mu) \vee (\text{task}(P, \tau, \alpha) \wedge \alpha \neq \bot) :: \tau > T \}, \\
T + 1 \leq T' \leq (\min P, Q, \tau, \alpha, \mu : \text{message}(Q, P, \tau, \mu) \vee (\text{task}(P, \tau, \alpha) \wedge \alpha \neq \bot) :: \tau) \\
\rightarrow \\
\text{gclock}(T) \uparrow, \ \text{gclock}(T) \\
\] TRUE \rightarrow \text{gtick}

As with the other transactions, there is no need to prove that this transaction satisfies most of the safety properties.

In particular, we need only provide proofs for properties F1, F7, F10, F11, F14, F15, and F21.

6. Architecture-Directed Program Refinement

The abstract program definitely solves the simulation problem. However, the widespread use of global data items makes it unsuitable for implementation on any truly distributed architecture. Our refinement methodology now turns to address this problem. In the next two sections, we give formal descriptions of the constraints imposed
on programs by two example architectures, and then use these constraints to derive a solution specifically tailored to each architecture.

6.1. Consensus Bus

Our first example architecture is a classic distributed memory message-passing architecture with a special-purpose device which makes it particularly well-suited for use in distributed simulation. The target machine consists of a number of identical components called sites. Each site $h$ contains a controller $k$, a processor $e$, and two memory units, $r$ (registers) and $m$ (main memory). We assume that all entities in the physical system are assigned distinguishable identifiers which we can use to refer to each component, and let $H$ denote the set of site identifiers. The sites are connected by a data bus $D$ and a consensus bus $C$. Both the controller and the processor are sequential machines capable of executing one operation at a time. At each site, the two processing units run asynchronously; that is, they do not share a common clock. The registers are used to allow the controller and processor at a single site to share a limited amount of data. Reads and writes to the registers are atomic. Figure 2 shows a stylized representation of a single site.

The data bus is the primary mechanism for sharing information between sites. This bus connects the processors and main memories from all sites, creating a limited form of distributed shared memory. Using the data bus, processors can write to, but not read from, memories at other sites. In a sense, the data bus implements a message-passing channel which allows processors to pass data amongst themselves by leaving messages in mailboxes. The local memories can be read only by the processor at the same site, no other memory accesses are permitted. In particular, the controllers cannot access the main memories, and the register units can only be accessed by the processing units at the same site. All accesses to the main memories (both reads and writes) are atomic.

The consensus bus is a specialized hardware device which allows all the controller elements to share a limited amount of information in the following manner. Logically, during the execution of a single step, the controllers at each site provide to the bus a data value. The consensus bus computes a hardwired function of these values, the result of which is then provided to the controllers and can be used in the remainder of the step. The result is a synchronous, lock-step execution by the controllers.
Allocation Constraints. Now, we can define formally a site $h$ as a five-tuple consisting of the site identifier $h.id$ and the identifiers for the individual hardware components associated with the site: a controller $h.k$, a bank of registers $h.r$, a processor $h.e$, and a memory $h.m$. $H$ is the set consisting of all site identifiers. We also introduce the set $K$ of controller identifiers, the set $R$ of register identifiers, the set $E$ of processor identifiers, and the set $M$ of memory identifiers. Since each hardware component belongs to one and only one site we find it convenient to introduce a function:

$$site : K \cup R \cup E \cup M \rightarrow H$$

which given the identifier of a component, returns the identifier for the site to which the component belongs.

We now consider formally the constraints on allocation of programs and data to sites. For this purpose, we can introduce a predicate $locus(i,j)$, where $i$ is from the set of transactions or the set of tuples, and $j$ is from the set of identifiers for hardware components. The predicate $locus(i,j)$ is true if the dataspaces element $i$ is allocated to component $j$, and is initially restricted so that transactions are mapped to processing units, and data tuples to memory units:

$$t \in TRS \land locus(t.i) \Rightarrow i \in E \cup K$$
\( \forall v \in \text{TPS} \land \text{locus}(v, i) \Rightarrow i \in R \cup M \)

This very general definition of the locus predicate can be further constrained to reflect the specifics of the simulator hardware. In the present case, the sequential nature of the processing units constrains the allocation of software to hardware, allowing at most one transaction to be present at any time on each of the processing units:

\[ \text{C}_C1: \quad \text{inv.} \ [t_1] \land [t_2] \land t_1 \neq t_2 \land \text{locus}(t_1, i_1) \land \text{locus}(t_2, i_2) \Rightarrow i_1 \neq i_2 \]

(The notation \([t]\) means "the transaction instance \(t\) exists in the dataspace.") Also, because the processors run asynchronously with respect to the rest of the system, transactions which are placed on them cannot be part of any synchronous group containing transactions allocated to other processing units:

\[ \text{C}_C2: \quad \text{inv.} \ [t_1] \land [t_2] \land t_1 \neq t_2 \land \text{locus}(t_1, i) \land i \in E \Rightarrow \neg (t_1 = t_2) \]

Finally, since controllers execute synchronously, transactions which are placed on them must all be part of the same synchronous group.

\[ \text{C}_C3: \quad \text{inv.} \ [t_1] \land [t_2] \land \text{locus}(t_1, i_1) \land i_1 \in K \land \text{locus}(t_2, i_2) \land i_2 \in K \Rightarrow t_1 = t_2 \]

Access Restrictions. We turn now to the constraints on memory access by transactions. In particular, we wish to constrain the locations of reads and writes made by transactions. Our approach is to introduce auxiliary tuples which record the reading or writing of a variable by a program running on one of the processing units. We can describe the read/write constraints in terms of invariants over two auxiliary tuples, \(raccess(i, j)\) and \(waccess(i, j)\), where \(i \in K \cup E\), and \(j \in K \cup R \cup E \cup M\) (we include the possibility that a transaction is read or written).

The presence of one of these tuples in the dataspace indicates that a transaction with locus \(i\) has read (or written) an entity with locus \(j\). To prove that a transaction satisfies one of these constraints, all subtransactions are augmented to insert an \(raccess\) tuple whenever a tuple appears in the query of the subtransaction, and to insert a \(waccess\) tuple whenever a tuple appears in the action. The access constraints can now be expressed in terms of three invariants.

\[ \text{C}_C4: \quad \text{Transactions on the processors can only read from memories located at the same site} \]
\[ \text{inv.} \ i \in E \land raccess(i, j) \Rightarrow j \in R \cup M \land \text{site}(i) = \text{site}(j) \]

\[ \text{C}_C5: \quad \text{Transactions on processors can write to any memory, to registers at the same site, and to the processor itself (to change execution state)} \]
\[ \text{inv.} \ i \in E \land waccess(i, j) \Rightarrow j \in M \lor (j \in R \cup E \land \text{site}(i) = \text{site}(j)) \]
CC6: *Transactions on controllers can read and write only registers at the same site or the controller itself*

\[\text{inv. } i \in K \land (\text{raccess}(i,j) \lor \text{waccess}(i,j)) \Rightarrow j \in K \cup R \land \text{site}(i) = \text{site}(j)\]

**Consensus bus.** The consensus bus is actually a specialized device which synchronizes the execution of all controllers. Additionally, at each step the bus accepts a boolean value from each controller and returns to all controllers the result of applying the logical *and* across all the boolean values received. There are two reasons for introducing the consensus bus in our illustration. First, from a practical viewpoint, such a bus is easy to construct and matches the needs of the simulator. Second, from a pedagogical perspective, the bus allows us to illustrate the formalization of a highly specialized device and the expressive power of the synchronic group construct in Swarm.

Returning to the formalization of the constraints imposed by the consensus bus, we take advantage of the built-in consensus feature associated with synchronic groups. It allows us to reduce the effect of the consensus bus to restricting transactions allocated to the controllers from using any of the special queries except for **AND** and **NAND** (and of course, **TRUE**). To accomplish this, we augment each subtransaction on controller \( k \) that uses **OR** (or **NOR**), e.g.,

\[
\| : \text{OR}, \text{query} \rightarrow \text{action}
\]

so as to insert an auxiliary tuple that records the improper use of the consensus bus, i.e.,

\[
\| : \text{OR}, \text{query} \rightarrow \text{action}, \text{invalid\_consensus()}
\]

and add the following proof obligation:

CC7: *inv. \rightarrow \text{invalid\_consensus()}*

We rely here upon CC3, which requires all transactions allocated to controllers to execute synchronously, allowing us to make use of the Swarm consensus mechanism. This formulation is in fact stronger than a syntactic restriction; for example, if the query associated with the **OR** always evaluates to **false**, we can prove that no improper use of the consensus bus takes place in spite of the fact that the syntax alone suggests otherwise.

The formalization of architectural constraints clearly depends upon the computational model being used. The fact that Swarm already provides a form of built-in consensus makes the formalization trivial. This does not mean that our assertional method would be inappropriate otherwise, but it does mean that the formalization, by ne-
cessity, would be more complex. In the absence of the built-in consensus, auxiliary tuples would be needed to keep track of the booleans supplied by each controller and the values returned by the bus. The relation between these values would, of course, be constrained by an appropriate invariant.

Initial Mapping. We now return to the task of mapping the abstract program onto this architecture. Obviously, not all of the abstract program's state can be allocated without violating the architectural constraints. This fact will drive the derivation process as we move to resolve those allocation decisions which cannot be made at this point. Nevertheless, certain allocations are required by the nature of the architecture; for example, the task transaction must be allocated to the processors as the tasks at each processor must run asynchronously, the processor's state should be placed into the processor's local memory, and messages which must be sent from one processor to another, should also be allocated to memories.

\[
\begin{align*}
\text{A}_{C1} & : \text{inv. locus(task}(P,\tau,\alpha),i) \Rightarrow i \in E \\
\text{A}_{C2} & : \text{inv. locus(state}(P,\sigma),i) \Rightarrow i \in M \\
\text{A}_{C3} & : \text{inv. locus(message}(P,Q,\tau,\mu),i) \Rightarrow i \in M \\
\text{A}_{C4} & : \text{inv. locus(state}(P,\sigma),i_1) \land \text{locus(task}(P,\tau,\alpha),i_2) \Rightarrow \text{site}(i_1) = \text{site}(i_2) \\
\text{A}_{C5} & : \text{inv. locus(message}(Q,P,\tau,\mu),i_1) \land \text{locus(task}(P,\tau,\alpha),i_2) \Rightarrow \text{site}(i_1) = \text{site}(i_2)
\end{align*}
\]

Note that this allocation satisfies \(C_{C1}-C_{C2}\) for the entire abstract state, and that \(C_{C3}, C_{C6}\) and \(C_{C7}\) are satisfied vacuously (since no transaction is allocated to the controllers). However, there is no allocation of either \textit{gdeliver} or \textit{gllock} that satisfies the requirements. Obviously, neither can be placed on the processors without violating \(C_{C1}\), and allocating them to the controllers would violate \(C_{C6}\), since each needs access to information located in the site's main memory. Further, the \textit{gllock} tuple cannot be allocated to any memory or register bank, as it must be read by every processor, violating either \(C_{C4}\) or \(C_{C6}\). The remainder of the refinement process is guided by these failures.

Allocate \textit{gdeliver}. The problem of resolving the allocation of \textit{gdeliver} is most easily solved, so we address it first. Since the work done by the transaction is clearly local, the obvious solution is to distribute the processing to each site, introducing a local \textit{deliver} transaction, with one copy of the transaction for each node.

Refinement. The \textit{gdeliver} transaction type is replaced by a \textit{deliver} transaction type, formed by parameterizing \textit{gdeliver} with the node id, as follows:
\begin{align*}
\text{deliver}(P) = \\
Q, \tau, \mu, \sigma : \\
gclock(\tau), \text{state}(P, \sigma), \text{message}(Q, P, \tau, \mu) \rightarrow \text{message}(Q, P, \tau, \mu)^\dagger, \text{state}(P, \sigma)^\dagger, \text{state}(P, u(P, \tau, \mu)) \\
\text{true} \rightarrow \text{deliver}(P)
\end{align*}

We require that there be a \textit{deliver} transaction for each processor, i.e.,

\[ C_5: \quad \text{inv. } (\exists Q, \tau, \alpha, \mu : (\text{action}(Q, \tau, \alpha) \land \alpha \neq \bot) \lor \text{message}(Q, P, \tau, \mu) ) \Rightarrow \text{deliver}(P) \]

This formulation requires that a \textit{deliver} transaction exists if there is the possibility of any new messages appearing in the future, while allowing the program to terminate when the simulation is complete. Additionally, we introduce further refinements on \textit{locus} to guarantee that the transaction satisfies the access restrictions (\(C_4\) and \(C_5\)):

\[ A_6: \quad \text{inv. } \text{locus} (\text{deliver}(P), i) \Rightarrow i \in E \\
A_7: \quad \text{inv. } \text{locus} (\text{deliver}(P), i_1) \land \text{locus} (\text{task}(P, \tau, \alpha), i_2) \Rightarrow \text{site}(i_1) = \text{site}(i_2) \\
\]

\textbf{Re-Verification Obligations.} All the safety properties associated with \textit{gdeliver} (\(F_2, F_9, F_{10}, F_{12}\), or \(F_{18.2.1}\)) continue to be satisfied, since no new state transitions are introduced by this refinement. The progress property (\(F_{18.2.1}\)) also continues to hold. This can be seen by observing that any transition which would have been made by the global transaction will be performed by one of the local transactions (specifically the one assigned to the destination site); and since the local transactions are always re-created, the statement needed to make the transition always exists.

\textbf{Outstanding Violations.} This allocation violates \(C_1\) as \textit{task} is already allocated to the processor, but we will deal with this violation later. Additionally, neither \textit{gclock} nor \textit{gluck} can yet be allocated.

\textbf{Allocate gclock(T).} We now turn to the problem of allocating the clock. Since \textit{task} reads \textit{gclock}, \(C_4\) applies, requiring that \textit{gclock} be located on every processor which has a \textit{task}, and that the tuple be located in either registers or memory. This suggests that a local clock should be maintained at each site, with all clocks running in lockstep.

\textbf{Refinement.} We introduce a tuple of type \textit{clock}, with one tuple for each node. The tuple replaces the \textit{gclock} tuple type. All three transactions must be re-written to make use of the new data representation.
task(P, \tau, \alpha) =
\sigma:
\alpha \neq 1, \text{clock}(P, \tau), (\forall Q, \mu : \neg \text{message}(Q, P, \tau, \mu)), \text{state}(P, \sigma)
\rightarrow
\text{task}(P, \text{c}(P, \sigma, \alpha), \text{a}(P, \sigma, \alpha)),
\text{state}(P, \sigma) \uparrow, \text{state}(P, \text{d}(P, \sigma, \alpha)),
\langle Q : \text{c}(P, Q, \sigma, \alpha) \rightarrow \text{message}(Q, P, \tau, \sigma, \alpha), \nu(P, Q, \sigma, \alpha) \rangle
\parallel \text{NOR} \rightarrow \text{task}(P, \tau, \alpha)

gtick =
P, T, T' :
clock(P, T),
(\forall P, Q, \tau, \alpha, \mu : \text{message}(Q, P, \tau, \mu) \lor (\text{task}(P, \tau, \alpha) \land \alpha \neq 1) : \tau > T),
T + 1 \leq T' \leq (\min P, Q, \tau, \alpha, \mu : \text{message}(Q, P, \tau, \mu) \lor (\text{task}(P, \tau, \alpha) \land \alpha \neq 1) : \tau)
\rightarrow
\langle P : P \in \text{nodes} \rightarrow \text{clock}(P, T) \uparrow, \text{clock}(P, T') \rangle
\parallel \text{TRUE} \rightarrow \text{gtick}

deliver(P) =
Q, \tau, \mu, \sigma :
clock(P, \tau), \text{state}(P, \sigma), \text{message}(Q, P, \tau, \mu) \rightarrow \text{message}(Q, P, \tau, \mu) \uparrow, \text{state}(P, \sigma) \uparrow, \text{state}(P, \text{c}(P, \sigma, \mu))
\parallel \text{TRUE} \rightarrow \text{deliver}(P)

We introduce the obvious requirements that there be exactly one clock for each processor, and that all clocks carry the same time.

CC9: \text{inv.} (\Sigma T : \text{clock}(P, T) : 1) = 1
CC10: \text{inv.} gclock(T) = (\forall P : \text{clock}(P, T))

Additionally, to resolve the violation of CC4 by task and deliver, we must allocate the clock tuple to either registers or memories at the same site as the two transactions. That is, we introduce the following additional restrictions on locus:

AC8: \text{inv.} \text{locus}((\text{clock}(P, T), i) \Rightarrow i \in M \cup R
AC9: \text{inv.} \text{locus}((\text{clock}(P, T), i_1) \land \text{locus}((\text{task}(P, \tau, \alpha), i_2) \Rightarrow \text{site}(i_1) = \text{site}(i_2)

Re-Verification Obligations. The transactions task and deliver are actually unaffected by this transformation since gclock(T) = clock(P, T). As far as gtick is concerned, it clearly satisfies CC10, and consequently, all other behavioral obligations. Additionally, we can now show that the task and deliver transactions access only local data.

Outstanding Violations. Both task and deliver now satisfy all architectural constraints except for CC1. gtick cannot be allocated without violating one of CC4-CC6.
Allocate gtick. As we turn to address the problem of allocating gtick, note that the "∀" in the query suggests using the consensus bus, so an allocation to the controllers is proposed. With this in mind, we introduce a distributed processing scheme, with one tick transaction at each site.

Refinement. The refinement is presented in two steps. First, we present a formulation with a separate subtransaction for each processor P. Each subtransaction checks locally for work to be performed. The global check is then done using an AND special predicate query, which succeeds only when each of the local subtransactions succeed (that is, all the work at each site is in the future). Additionally, we must select a time increment that does not violate F21. Since we cannot pass any additional information between controllers, and since all sites must have the same time value, 1 is the only reasonable choice.

\[
\begin{align*}
\text{gtick} & = \\
& \begin{cases} \\
T : \text{clock}(P,T), (\forall Q,\tau,\alpha,\mu : \text{message}(Q,P,\tau,\mu) \lor (\text{task}(P,\tau,\alpha) \land \alpha \neq \bot) :: \tau > T ) \rightarrow \text{skip} \\
T : \text{AND}, \text{clock}(P,T) \rightarrow \text{clock}(P,T)^\uparrow, \text{clock}(P,T+1) \\
\text{TRUE} \rightarrow \text{gtick} \\
\end{cases}
\end{align*}
\]

Now we can separate the gtick transaction into a collection of local, synchronous tick transactions, one for each processor, with the AND special query computed by the consensus bus. The parameter to the local transaction is the P from the subtransaction generator. This transformation retains the semantics of the previous transaction, since the individual tick transactions are in the same synchronous group. This gives us the following definition for the new transaction:

\[
\begin{align*}
tick(P) & = \\
& T : \text{clock}(P,T), (\forall Q,\tau,\alpha,\mu : \text{message}(Q,P,\tau,\mu) \lor (\text{task}(P,\tau,\alpha) \land \alpha \neq \bot) :: \tau > T ) \rightarrow \text{skip} \\
& T : \text{AND}, \text{clock}(P,T) \rightarrow \text{clock}(P,T)^\uparrow, \text{clock}(P,T+1) \\
& \text{TRUE} \rightarrow \text{tick}(P)
\end{align*}
\]

To maintain the invariant that all clocks have the same value, we must require that, if any tick transaction remains in the dataspace, then all transactions remain, i.e.,

\[
\begin{align*}
\text{C}_{11} : \text{inv.} \text{tick}(P) \Leftrightarrow \text{tick}(Q)
\end{align*}
\]

Since we make use of the consensus bus, we will need to allocate the transactions to the controllers:

\[
\begin{align*}
\text{A}_{10} : \text{inv.} \text{locus}(\text{tick}(P),i) \Rightarrow i \in K \\
\text{A}_{11} : \text{inv.} \text{locus}(\text{tick}(P),i_1) \land \text{locus}(\text{task}(P,\tau,\alpha),i_2) \Rightarrow \text{site}(i_1) = \text{site}(i_2) \\
\text{C}_{12} : \text{inv.} \text{tick}(P) \Rightarrow \text{tick}(Q)
\end{align*}
\]

35
Finally, since the \texttt{tick} transaction reads and writes the \texttt{clock} tuple, that tuple must be allocated to the registers, allowing us to refine the \textit{locus} of \texttt{clock} further:

\begin{align*}
A_{C12}: \quad \text{inv. locus(clock}(P,T),i) \Rightarrow i \in R
\end{align*}

\textbf{Re-Verification Obligations.} Because the combination of \texttt{tick} transactions is functionally equivalent to the original \texttt{gtick}, no re-verification is required.

\textbf{Outstanding Violations.} We are left with a violation of \texttt{C6} by \texttt{tick}, since it must determine the existence or non-existence of messages and tasks, neither of which are allocated to the registers. The violations of \texttt{C1} by \texttt{task} and \texttt{deliver} are also outstanding.

\textbf{Detecting absence of work.} To remove the violation of \texttt{C6} by \texttt{tick}, we introduce two new tuple types, allocated to the registers, which contain sufficient information to allow \texttt{tick} to detect that there is no remaining work to be done at a site at the current time.

\textbf{Refinement.} We introduce the tuple types \texttt{event}(P,\tau) and \texttt{no_msg}(P), having the meanings "the next \texttt{task} to execute for node \textit{P} is at time \textit{\tau}," and "there are no remaining unabsorbed messages for node \textit{P} at the present time," respectively.

\begin{align*}
A_{C13}: \quad & \text{inv. locus(event}(P,\tau),i) \Rightarrow i \in R \\
A_{C14}: \quad & \text{inv. locus(event}(P,\tau),i_1) \land \text{locus(task}(P,\tau,\sigma),i_2) \Rightarrow \text{site}(i_1) = \text{site}(i_2) \\
A_{C15}: \quad & \text{inv. locus(no_msg}(P),i) \Rightarrow i \in R \\
A_{C16}: \quad & \text{inv. locus(no_msg}(P),i_1) \land \text{locus(task}(P),i_2) \Rightarrow \text{site}(i_1) = \text{site}(i_2)
\end{align*}

The \texttt{task} transaction maintains the \texttt{event} tuple, while \texttt{deliver} and \texttt{tick} cooperate to update \texttt{no_msg}.

\begin{align*}
\text{task}(P,\tau,\alpha) = \\
\sigma: \\
\quad & \alpha \neq \bot, \text{clock}(P,\tau), \text{no_msg}(P), \text{state}(P,\sigma) \\
\quad \Rightarrow \\
\quad & \text{event}(P,\tau), \\
\quad & \text{task}(P,e(P,\tau,\alpha),a(P,\sigma,\alpha)), \{ : a(P,\sigma,\alpha) \neq \bot :: \text{event}(P,e(P,\sigma,\alpha)) \}, \\
\quad & \text{state}(P,\sigma), \text{state}(P,s(P,\sigma,\alpha)), \\
\quad & \{ Q : c(P,Q,\sigma,\alpha) :: \text{message}(P,Q,\text{l}(P,Q,\tau,\sigma,\alpha),v(P,Q,\sigma,\alpha)) \} \\
\quad \ll \text{NOR} \rightarrow \text{task}(P,\tau,\alpha)
\end{align*}

\begin{align*}
\text{tick}(P) = \\
\quad & T : \text{clock}(P,T), \text{no_msg}(P), \neg\text{event}(P,T) \rightarrow \text{skip} \\
\quad \ll \quad T : \text{AND}, \text{clock}(P,T) \rightarrow \text{clock}(P,T), \text{clock}(P,T+1), \text{no_msg}(P) \\
\quad \ll \quad \text{TRUE} \rightarrow \text{tick}(P)
\end{align*}
\[
\text{deliver}(P) \equiv \\
Q, \tau, \mu, \sigma : \\
\text{clock}(P, \tau), \text{state}(P, \sigma), \text{message}(Q, P, \tau, \mu) \rightarrow \text{message}(Q, P, \tau, \mu) \uparrow, \text{state}(P, \sigma) \uparrow, \text{state}(P, \mu(P, \sigma, \mu)) \\
\| \tau : \text{clock}(P, \tau), (\forall Q, \mu :: \neg \text{message}(Q, P, \tau, \mu)) \rightarrow \text{no\_msg}(P) \\
\| \text{TRUE} \rightarrow \text{deliver}(P)
\]

**Re-Verification Obligations.** We must show that the new forms of each transaction satisfy the corresponding characteristic properties. We can also show that there are now no access violations by \textit{tick}.

**Proof Outline.** To prove the functional correctness of this refinement, we must show that the new representation carries the same semantics as the original. In particular, we must show that \textit{tick} cannot incorrectly detect the absence of work. This can be accomplished if the refined transactions maintain the following invariants, which equate the two data representations:

\begin{align*}
\text{C}13: & \quad \text{inv. event}(P, \tau) \equiv (\exists \alpha :: \text{task}(P, \tau, \alpha) :: \alpha \neq \bot) \\
\text{C}14: & \quad \text{inv. no\_msg}(P) \wedge \text{clock}(P, \tau) \Rightarrow (\forall Q, \mu :: \neg \text{message}(Q, P, \tau, \mu))
\end{align*}

The truth of these invariants is clear from the text of the transactions. Since the \textit{no\_msg} tuple is removed by \textit{tick} when the clock is advanced (to maintain \text{C}14), we require a progress property that guarantees that the removal of the last message for a node results in the re-insertion of \textit{no\_msg}. That is, we require:

\begin{align*}
\text{C}15: & \quad \text{no\_msg}(P) \text{ detects } \text{clock}(P, \tau) \wedge (\forall Q, \mu :: \neg \text{message}(Q, P, \tau, \mu))
\end{align*}

This property follows immediately from the invariance of \text{C}14 and the text of \text{deliver}.

**Outstanding Violations.** Only the violation of \text{C}1 by \textit{task} and \text{deliver} is outstanding.

**Satisfy uniprogramming requirements for processors.** Since the \text{deliver} and \textit{task} transactions require access to both the registers and memory at a site, both must be allocated to the processors. However, at most one transaction can be present on any processor. To satisfy \text{C}1, we must combine these 2 transactions in some way. This can be done either by combining the two transactions into a single transaction that does both, or by alternating them. We opt for the latter, since this more closely reflects the approach that might be used in a traditional programming language.
Refinement. We modify task and deliver as follows:

\[
\text{task}(P, \tau, \alpha) \equiv
\begin{align*}
\sigma : & \\
\alpha \neq \bot, \text{clock}(P, \tau), \text{no\_msg}(P), \text{state}(P, \sigma) \\
\rightarrow & \text{event}(P, \tau) \uparrow, \text{deliver}(P, \alpha(P, \sigma, \alpha)), \langle : a(P, \sigma, \alpha) \neq \bot :: \text{event}(P, \sigma(P, \sigma, \alpha)) \rangle, \\
& \text{state}(P, \sigma) \uparrow, \text{state}(P, s(P, \sigma, \alpha)), \\
& \langle Q : c(P, Q, \sigma, \alpha) :: \text{message}(P, Q, l(P, Q, \tau, \sigma, \alpha), v(P, Q, \sigma, \alpha)) \rangle \\
\end{align*}
\text{NOR} \rightarrow \text{task}(P, \tau, \alpha)
\]

\[
\text{deliver}(P, \alpha) \equiv
\begin{align*}
Q, \tau, \mu, \sigma : & \\
\text{clock}(P, \tau), \text{state}(P, \sigma), \text{message}(Q, P, \tau, \mu) \\
\rightarrow & \text{message}(Q, P, \tau, \mu) \uparrow, \text{state}(P, \sigma) \uparrow, \text{state}(P, u(P, \sigma, \mu)), \text{deliver}(P, \alpha) \\
\text{NOR} \rightarrow \text{no\_msg}(P) \\
\tau : \text{NOR}, \text{clock}(P, \tau), \text{event}(P, \tau) \rightarrow \text{task}(P, \tau, \alpha) \\
T, \tau : \text{NOR}, \text{clock}(P, \tau), ((\text{event}(P, T) \wedge T \neq \tau) \vee \alpha = \bot) \rightarrow \text{deliver}(P, \alpha)
\end{align*}
\]

with the additional requirement that only one of the two transactions is present at any given time:

\text{C}\text{C}16: \text{inv. } \neg(\text{task}(P, \tau, \alpha) \wedge \text{deliver}(P, \alpha'))

We have added a parameter to deliver to keep track of the action which should be performed when all messages have been delivered. There is no need to include the time of the next action, since that is already available from the event tuple.

Proof Obligations. This transformation does not affect any of the safety properties to be satisfied by task and deliver. We must, however, show that the transactions satisfy F18.2.1, the progress property which was previously satisfied by deliver, and F19.3, the progress property for task. Finally, it is obvious that the new transactions satisfy C\text{C}1 and, therefore, all architectural constraints are met at this point.

Proof Outline. To prove that messages are eventually delivered (F18.2.1), we must show that if there is a message to be delivered at the current time, then there is a deliver transaction to process it, that is

\text{inv. } \text{clock}(P, \tau) \wedge \text{message}(Q, P, \tau, \mu) \Rightarrow (\exists \alpha :: \text{deliver}(P, \alpha))

This invariant can be verified by examining the program text: deliver does not move the time nor create messages; tick does not create messages, and only advances the time when there are no messages and no task; and once a task is
executed, it creates the next deliver transaction, with all messages being in the future. Since the new form of deliver clearly performs the necessary state transitions, then we have F18.2.1.

Additionally, we must show that if there is an event scheduled at any given time, there will eventually be a task transaction to perform it, e.g.,

\[ \text{event}(P;\tau) \text{ until } \text{task}(P;\tau,\alpha) \]

The unless is clearly maintained by the entire program, and the leads-to is established by the third subtransaction of deliver.

This completes the derivation of the simulation program for execution on the consensus bus architecture.

To simplify the presentation, we did not consider the problem of termination, although it should be clear that the issue could have been addressed at the expense of slightly more complex formulations of some properties. We now turn our consideration to solving the problem on a very different architecture, to show the flexibility of this approach.

6.2. Ring

Our second example architecture is a traditional ring. The nodes in the ring are multiprogrammable, general purpose processors containing a local memory. Each processor is assumed to have a unique identifier from the set \[ \{ 0..N-1 \} \], where \( N \) is the size of the ring. Identifiers are assigned sequentially around the ring. The nodes on the ring are connected by one-way, asynchronous communication channels (communication is clockwise around the ring). This is a true distributed memory architecture, so a process can only directly read from its local memory. We model message passing as writes by one processor to another processor's memory; because communication proceeds clockwise around the ring, a processor can write both to its local memory, and to the memory of the process to its immediate right in the ring. We assume that all memory accesses are atomic. To simplify discussions about the ring, we introduce two definitions.

\[ \text{right}(P) = (P+1) \mod N \]

\[ R \text{ twixt } (P,Q) = (P > Q \land (P < R < N \lor 0 \leq R \leq Q)) \lor (P \leq Q \land P \leq R \leq Q) \]
**Allocation Constraints.** As with the consensus bus, we will express the mapping of program elements to hardware using the predicate \( locus(i, j) \), having the meaning that the transaction or tuple \( i \) is to be allocated to the processor with identifier \( j \).

**Access Restrictions.** We introduce two auxiliary tuple types \( raccess(i, j) \) and \( waccess(i, j) \) to record reads and writes. Both \( i \) and \( j \) are integers in the range \( 0..N-1 \), and have the meaning that a transaction resident on node \( i \) of the ring has read (or written) a tuple or transaction on node \( j \). The informal access constraints can be formally expressed using two predicates over these auxiliary tuples.

\[
\begin{align*}
CR1: \quad \text{inv. } \ raccess(i, j) \Rightarrow i = j \\
CR2: \quad \text{inv. } \ waccess(i, j) \Rightarrow i = j \lor j = \text{right}(i)
\end{align*}
\]

**Additional Restrictions.** The only additional constraint introduced by the ring architecture is the requirement that the nodes execute asynchronously. This can be succinctly stated within the Swarm notation using the \( = \) notation, as:

\[
\begin{align*}
CR3: \quad \text{inv. } [t_1] \land [t_2] \land locus(t_1, i_1) \land locus(t_2, i_2) \land i_1 \neq i_2 \Rightarrow \neg(t_1 = t_2)
\end{align*}
\]

**Initial Mapping.** Very little of the abstract program's initial state can be allocated without further refinement. While we can be certain that we want each processor's local \( state \) to be allocated to the same location as the \( task \) transaction \( (AR1) \), no other decisions are possible.

\[
\begin{align*}
AR1: \quad \text{inv. } \ locus(\text{task}(P, \tau, \alpha), i_1) \land locus(\text{state}(P, \sigma), i_2) \Rightarrow i_1 = i_2
\end{align*}
\]

Several problems are immediately obvious. First, \( gclock \) cannot be allocated to any node in the ring, since its value is read by every \( task \). Further, neither of the global transactions (\( gdeliver \) and \( glock \)) can be placed until \( gclock \) is dealt with. Finally, the restriction on writes (\( CR2 \)) makes it impossible for a transaction to send a message to any transaction allocated to a processor other than the one to its right. Until the allocation decisions are made, it is not possible to prove that any of the transactions satisfy the access constraints (\( CR1 \) and \( CR2 \)), since each transaction reads or writes the clock.

**Allocate gdeliver.** The problem of allocating \( gdeliver \) is the most easily solved. By \( CR2 \), any transaction which updates a \( state \) tuple must be located on the same node as the \( state \) tuple. As in the consensus bus architecture, this implies a distributed version of \( gdeliver \), with one transaction for each \( P \). The refinement is identical to that in the previous example, so we give it without further comment.
deliver(\(P\)) =
\[ Q, \tau, \mu, \sigma : \]
\[ \text{gclock}(\tau), \text{state}(P, \sigma), \text{message}(Q, P, \tau, \mu) \rightarrow \text{message}(Q, P, \tau, \mu) \vdash, \text{state}(P, \sigma) \vdash, \text{state}(P, \mu(P, \sigma, \mu)) \]
\[ \text{TRUE} \rightarrow \text{deliver}(P) \]

with
\[ \text{Cr4:} \quad \text{inv. } (\exists Q, \tau, \alpha, \mu : (\text{action}(Q, \tau, \alpha) \land \alpha \neq \bot) \lor \text{message}(Q, P, \tau, \mu)) \Rightarrow \text{deliver}(P) \]
\[ \text{Ar2:} \quad \text{inv. locus(deliver}(P), i_1) \land \text{locus(task}(P, \tau, \alpha), i_2) \Rightarrow i_1 = i_2 \]

Outstanding Violations. We still cannot allocate the gclock tuple, nor can we allocate the gtick transaction. The violation of the write restrictions (Cr2) with regards to messages also remains, and will be considered next.

Add a “current location” field to messages. Obviously, messages must be sent around the ring, since we cannot send messages directly between non-adjacent processors. As a first step, we will modify the form of messages to accommodate the routing process.

Refinement. We begin with a simple data refinement, refining the structure of messages by adding an additional processor identifier which gives the current location of the message, e.g.

\[ \text{msg}(P, Q, R, \tau, \mu) \]

with the meaning, “\(Q\) has sent a message with content \(\mu\) to \(R\); it is currently at node \(P\) and must be delivered at time \(\tau\).” Our coupling invariant (Cr5) states that a message exists if there is an analogous msg located somewhere between the source and destination processors.

\[ \text{Cr5:} \quad \text{inv. message}(Q, R, \tau, \mu) = (\exists P : \text{twixt}(Q, R) : \text{msg}(P, Q, R, \tau, \mu)) \]

Additionally, we require that there be at most one msg tuple for any message (Cr6), and that it be allocated to the processor simulating node \(P\) (Ar3):

\[ \text{Cr6:} \quad \text{inv. } (\Sigma P : \text{msg}(P, Q, R, \tau, \mu) : 1) \leq 1 \]
\[ \text{Ar3:} \quad \text{inv. locus(deliver}(P), i_1) \land \text{locus(msg}(P, Q, R, \tau, \mu), i_2) \Rightarrow i_1 = i_2 \]

We update the program to use the new representation. Note that the sending of a msg by task is performed by writing the tuple directly to the destination processor.
task(P,τ,α) ⇒
σ :
α ≠ ⊥, gclock(τ), ( ∀ Q,R,μ : ¬msg(Q,R,P,τ,μ), state(P,σ) )
→ task(P,e(P,σ,α),a(P,σ,α)), state(P,σ)↑, state(P,σ↑), ( Q : c(P,Q,σ,α) : msg(Q,P,Q)(P,Q,σ,α), ν(P,Q,σ,α) )
∥ NOR → task(P,τ,α)

deliver(P) ⇒
Q,τ,μ,σ :
gclock(τ), state(P,σ), msg(P,Q,P,τ,μ) → msg(P,Q,P,τ,μ)↑, state(P,σ)↑, state(P,u(P,σ,μ))
∥ TRUE → deliver(P)

gtick =
T,T' :
gclock(T),
( ∀ P,Q,R,τ,α,μ : msg(R,Q,P,τ,μ) ∨ (task(P,τ,α) ∧ α ≠ ⊥) :: τ > T ),
T+1 ≤ T' ≤ ( ⊥ min P,Q,R,τ,α,μ : msg(R,Q,P,τ,μ) ∨ (task(P,τ,α) ∧ α ≠ ⊥) :: τ )
→ gclock(T)↑, gclock(T)
∥ TRUE → gtick

Re-Verification Obligations. Obviously, the transactions satisfy the new architectural constraints (C₉5, C₉6). Since this is simply a data refinement, no new transitions are added to the program, so the program continues to satisfy the safety properties of the original specification. It is necessary to verify that the progress properties of the original program remain satisfied. In particular, we need to show that deliver continues to establish F18.2.1.

Proof Outline. The proof that deliver still satisfies F18.2.1 follows directly from the text of the program, by noting that the program maintains the following invariant, which says that all messages are located at the message’s destination processor:

L1: inv. msg(P,Q,R,τ,μ) ⇒ P = R

which guarantees that messages sent arrive at their destination. Further, deliver guarantees

gclock(τ) ∧ ( set Q,μ : msg(P,Q,P,τ,μ) :: μ ) = M ∧ M ≠ {} ∧ state(P,σ)
⇒ ( ∃ μ : ( set Q,μ' : msg(P,Q,P,τ,μ') :: μ') = M ∨ { μ } :: state(P,u(P,σ,μ)) )

By the definition of msg and L1, we have F18.2.1.

Outstanding Violations. Neither gtick nor gclock have been allocated. In addition, task creates the new messages at the target processor, which is a violation of C₉2.
Route messages around the ring. The write access violation by task can be resolved by adding a transaction type which routes messages around the ring, and then modifying task to create new messages locally. We use a distributed solution, allocating one routing transaction to each processor.

**Refinement.** We add a new transaction type router(P) which simply forwards messages which are not yet at their destination. Its behavior is formally described as:

\[
C_R7: \quad \text{Messages eventually reach their destination} \\
\quad \text{msg}(P,P,Q,P,\tau,\mu) \rightarrow \text{msg}(Q,P,Q,\tau,\mu)
\]

\[
C_R8: \quad \text{Messages remain at their destination until they are absorbed} \\
\quad \text{msg}(P,Q,P,P,\tau,\mu) \text{ unless } (\forall R :: \neg \text{msg}(R,Q,P,\tau,\mu))
\]

The router transaction actually implements these new properties incrementally, by moving the messages one processor at a time, clockwise around the ring.

\[
\text{router}(P) = \\
Q,R,\tau,\mu : \\
\quad \text{msg}(P,Q,R,\tau,\mu), P \neq R \rightarrow \text{msg}(P,Q,R,\tau,\mu) \uparrow, \text{msg}(\text{right}(P),Q,R,\tau,\mu) \\
\quad \text{TRUE} \rightarrow \text{router}(P)
\]

This transaction should be allocated to the same location as the analogous deliver transaction, and we must guarantee that the transaction will continue to exist whenever there is the possibility that any messages may arrive needing to be routed, e.g.,

\[
\text{AR4:} \quad \text{inv. locus(deliver(P),i) } \land \text{locus(router(P),i') } \Rightarrow i = i'
\]

\[
\text{CR9:} \quad \text{inv. } (\exists Q,\tau,\alpha,\mu :: (\text{action}(Q,\tau,\alpha) \land \alpha \neq \bot) \lor \text{message}(Q,P,\tau,\mu)) \Rightarrow \text{router}(P)
\]

**Re-Verification Obligations.** We must show that this new transaction does not violate any of the safety properties from the original specification. In particular, we can identify the properties affected by the transaction as F5, F9, F10, and F11. The proofs are straightforward. Additionally, we must show that router establishes C_R7, and that this in turn allows us to conclude that F18.2.1 is satisfied by the new program.

**Proof Outline.** To prove that router establishes C_R7, we can show that the sum of the distances between the current locations of all messages in the system and their destinations never increases, and in fact will decrease.

Informally, this means that eventually, all messages will arrive at their destinations. Formally, router establishes the following progress property:
L2: \[ \text{gclock}(\tau) \land (\Sigma P,Q,R,\tau,\mu : \text{msg}(P,Q,R,\tau,\mu) :: \text{dist}(P,R) ) = M \land M > 0 \]
\[ \implies \text{gclock}(\tau) \land (\Sigma P,Q,R,\tau,\mu : \text{msg}(P,Q,R,\tau,\mu) :: \text{dist}(P,R) ) < M \]

where \( \text{dist}(P,R) \) is the number of processors between \( P \) and \( R \) moving clockwise around the ring. Since by F11 messages are not created with a time stamp equal to the current time, the metric \( M \) cannot increase. Using L2, we can prove by induction the following progress property:

L3: \[ \text{gclock}(\tau) \implies (\Sigma P,Q,R,\tau,\mu : \text{msg}(P,Q,R,\tau,\mu) :: \text{dist}(P,R) ) = 0 \]

which states that eventually there are no messages with a current time stamp that are not at their destinations. Since by F20.1, the clock must eventually assume a value equal to the time stamp on any message, L3 gives us CR7.

Additionally, it is clear that \( \text{deliver} \) establishes the following variant of F18.2.1, which states that messages which have arrived at their destination will eventually be absorbed, while allowing for the arrival of new messages (via router):

L4: \[ \text{gclock}(\tau) \land (\text{set } Q,\mu : \text{msg}(P,Q,P,\tau,\mu) :: \mu ) = M \land M \neq \{ \} \land \text{state}(P,\sigma) \]
\[ \implies \text{gclock}(\tau) \land \]
\[ ((\exists \mu : (\text{set } Q,\mu' : \text{msg}(P,Q,P,\tau,\mu' ) :: \mu' ) = M \setminus \{ \mu \} :: \text{state}(P,\mu(P,\sigma,\mu)) ) \lor \]
\[ \text{state}(P,\sigma) \land M \subset (\text{set } Q,\mu' : \text{msg}(P,Q,P,\tau,\mu') :: \mu' )) \]

Since the number of messages is bounded above (by F11), eventually the set of undelivered messages with current time stamps will reach a maximum, from which time it can only decrease. CR7 guarantees that all messages eventually reach their destinations, and L4 requires that they eventually be delivered, which gives us F18.2.1.

**Outstanding Violations.** The allocation of \( \text{gStick} \) and \( \text{gclock} \) has not yet been resolved, and the violation of CR1 by the query for outstanding messages in task remains.

**Refine time increment into search and update phases.** As in the consensus bus example, it is clear that \( \text{gStick} \) and \( \text{gclock} \) must be distributed. However, the absence of either a global control mechanism or global memory requires that the approach be truly distributed. As a first refinement, we can break the work of \( \text{gStick} \) into two phases, a search phase which finds the earliest work, and an update phase which changes the current time. We find the following definitions useful at this point:

\[ \text{no_msg}(P,\tau) = (\forall P,Q,\mu : \neg \text{msg}(P,Q,R,\tau,\mu) ) \]
work(P,τ) = ( ∃ α,Q,R,μ :: action(P,τ,α) ∨ msg(P,Q,R,τ,μ) )

**Refinement.** The initial program refinement is simply to split glck into two global transactions which match the two phases, satisfying the following requirements:

CR10: **The search and update phases are mutually exclusive:**

    inv. ¬(gsearch ∧ ( ∃ T :: guupdate(T) ))

CR11: **The search and update phases continue at least until all work is completed**

    inv. work(P,T) ⇒ (gsearch ∨ ( ∃ T' : T ≤ T : guupdate(T') ))

CR12: **There is at most one time value being propagated**

    inv. ( Σ T : guupdate(T) :: 1 ) ≤ 1

CR13: **The search phase triggers an update phase identifying the time of the next work**

    gsearch ⇒ guupdate(( min P,T : work(P,T) :: T ))

CR14: **The update phase triggers a search phase**

    guupdate(T) until gsearch

CR15: **The update phase changes the simulation time**

    guupdate(T) until gclock(T)

The gsearch transaction is taken from property CR13, and guupdate from CR14 and CR15.

\[
gsearch = \\
τ : τ = ( min P,T : work(P,T) :: T ) → guupdate(τ)
\]

\[
gupdate(τ) = \\
T : gclock(T) → gclock(T)↑, gclock(τ), gsearch
\]

**Re-Verification Obligations.** We will need to prove that this pair of transactions satisfies the safety properties from the original specification. This is trivial for gsearch since it introduces no transitions within the original state (as a result, its characteristic property set is empty). Since guupdate performs the time change, it must be shown to satisfy the entire characteristic set for glck, namely, F1, F7, F10, F11, F14, F15, and F21. Additionally, we must prove that the new transactions satisfy F20.1, the progress property which motivated glck.

**Proof Outline.** The proof that the refinement continues to satisfy F20.1 involves simply proving that CR10-CR15 constitute a refinement of F20.1. As a review, F20.1 states:

\[
F20.1 \quad gclock(τ) ∧ T = ( min P,T' : work(P,T') :: T' ) ∧ T > τ\\n\]

\[
\rightarrow ( ∃ T' : τ+1 ≤ T' ≤ T : gclock(T') )
\]

45
Informally, \( C_{R11} \) requires that if there is work to be done, then the system is in a state which, by \( C_{R13} \) and \( C_{R14} \), will eventually cause the clock to be moved (\( C_{R15} \)). More formally, \( C_{R11} \) implies the following lemma, which describes the legal states of this sub-system at any given time:

\[
L5: \quad \text{inv. } T = \langle \min P, T' : \text{work}(P, T') :: T' \rangle \Rightarrow (gsearch \lor (gupdate(T') \land T' < T) \lor gupdate(T))
\]

The proof follows from the general disjunction rule for leads-to, as follows. \( L5 \) can be used to prove \( L6 \), which says that if \( T \) is the next time at which there is work, then the system will eventually set the clock to time \( T \), i.e.,

\[
L6: \quad T = \langle \min P, T : \text{work}(P, T) :: T \rangle \rightarrow \text{gclock}(T)
\]

\( L6 \) follows from \( L5, C_{R13}, C_{R14}, C_{R15} \), and the transitivity of leads-to. Since \( L6 \) implies \( F_{20.1} \), the refinement is proven.

**Outstanding Violations.** We have not yet allocated either of the new transactions \( \text{glock} \). The read violation by \( \text{task} \) remains.

**Distribute clock.** As a first step towards allocating \( gsearch, gupdate, \) and \( \text{glock} \), we distribute the clock, giving each processor a local copy which will be updated during the \( \text{update} \) phase. This refinement will allow us to prove that several of the transactions satisfy the access constraints, which has not been possible until now.

**Refinement.** In the consensus bus example, we were able to maintain an invariant which basically required that all local clocks have the same value. Since there is no global control mechanism available in this architecture, we elect to require a slightly weaker coupling invariant. Naturally, if a solution were to present itself which enabled us to maintain all the clocks in synchrony, it would satisfy the coupling invariant. Specifically, we will define the global time to be the minimum local time value, with the added restriction that there can only be at most 2 different local time values.

\[
C_{R16}: \quad \text{inv. } \text{gclock}(T) = T = \langle \min P, T' : \text{clock}(P, T') :: T' \rangle
\]

\[
C_{R17}: \quad \text{inv. } 1 \leq \langle \sum T : (\exists P :: \text{clock}(P, T)) :: 1 \rangle \leq 2
\]

The clock tuples are allocated to the same processor as the task transaction for the same node:

\[
A_{R5}: \quad \text{inv. locus} (\text{clock}(P, T), i_1) \land \text{locus} (\text{task}(P, \tau, \alpha), i_2) \Rightarrow i_1 = i_2
\]

46
We re-write the transactions to use the new representation (router is not affected):

\[ \text{task}(P, \tau, \alpha) \equiv \]
\[ \sigma : \]
\[ \alpha \neq \bot, \text{clock}(P, \tau), (\forall Q, R, \mu :: \neg \text{msg}(R, Q, P, \tau, \mu)) , \text{state}(P, \sigma) \]
\[ \rightarrow \]
\[ \text{task}(P, a(P, \sigma, \alpha), a(P, \sigma, \alpha)), \]
\[ \text{state}(P, \sigma) \uparrow, \text{state}(P, s(P, \sigma, \alpha)), \]
\[ (Q : c(P, Q, \sigma, \alpha) :: \text{msg}(P, P, Q, l(P, Q, \tau, \sigma, \alpha), v(P, Q, \sigma, \alpha)) ) \]
\[ \| \text{NOR} \rightarrow \text{task}(P, \tau, \alpha) \]

\[ \text{deliver}(P) \equiv \]
\[ Q, \tau, \mu, \sigma : \]
\[ \text{clock}(P, \tau), \text{state}(P, \sigma), \text{msg}(P, Q, P, \tau, \mu) \rightarrow \text{msg}(P, Q, P, \tau, \mu) \uparrow, \text{state}(P, \sigma) \uparrow, \text{state}(P, u(P, \sigma, \mu)) \]
\[ \| \text{TRUE} \rightarrow \text{deliver}(P) \]

\[ \text{gsearch} \equiv \]
\[ \tau : \tau = (\min P, T : \text{work}(P, T) :: T) \rightarrow \text{gupdate}(\tau) \]

\[ \text{gupdate}(\tau) \equiv \]
\[ \text{clock}(P, T) \rightarrow (P :: \text{clock}(P, T) \uparrow), (P :: \text{clock}(P, \tau) \uparrow), \text{gsearch} \]

**Proof Obligations.** Since the refinement doesn't change the semantics of any of the transactions, it is not necessary to re-verify adherence to the original specification. This refinement allows us to prove that `deliver` only accesses local data (CR1-CR2), and so it satisfies the entire architectural specification.

**Outstanding Violations.** The allocation of `gsearch` and `gupdate` remains to be decided, and the read violation by `task` is still around.

**Distribute `gupdate`.** We now wish to distribute the update process to eliminate its violations of the specification. The obvious solution is to distribute the transaction, and pass control around the ring, allowing each processor to update its clock locally.

**Refinement.** We replace the `gupdate` transaction type with `update`. This transaction moves around the ring, updating the clock at each processor. We consider that the system is in the update phase wherever there is an `update` transaction at any site:

\[ \text{CR 18: } \text{gupdate}(\tau) = (\exists P :: \text{update}(P, \tau) \} \]

Additional requirements describe the behavior of the new transaction:
There is at most one update transaction at a time
\( \sum P, T : \text{update}(P, T) :: 1 \) \leq 1

The transition from the search phase to the update phase takes place at processor 0
\( gsearch \) unless \( \exists T :: \text{update}(0, T) \)

The update transaction moves clockwise around the ring
\( \text{update}(P, T) \land P < N-1 \) until \( \text{update}(\text{right}(P), T) \)

When the update transaction reaches processor N-1, the search phase begins
\( \text{update}(N-1, T) \) unless \( gsearch \)

Additionally, we allocate the \textit{update} transaction to the same processor as the local clock:
\( \text{inv. locus(\text{update}(P, T), i_1) \land locus(\text{clock}(P, T), i_2) \Rightarrow i_1 = i_2} \)

We can now write the \textit{update} transaction definition. The first subtransaction is simply a distributed version of the subtransaction from \textit{gupdate}; the second subtransaction is from \textit{Cr21}; and the third is from \textit{Cr22} and \textit{Cr14}. This transaction now satisfies the entire specification.

\[
\text{update}(P, \tau) = \\
\text{clock}(P, T) \rightarrow \text{clock}(P, T) \downarrow, \text{clock}(P, \tau) \\
\quad \text{II } P < N-1 \rightarrow \text{update}(\text{right}(P), \tau) \\
\quad \text{II } P = N-1 \rightarrow \text{gsearch}
\]

We also re-write the \textit{gsearch} transaction to satisfy \textit{Cr20} and \textit{Cr13}:

\[
gsearch = \\
\tau : \tau = (\min P, T : \text{work}(P, T) :: T) \rightarrow \text{update}(0, \tau)
\]

**Proof Obligations.** We must show that the collection of \textit{update} transactions satisfies the progress properties (\textit{Cr14} and \textit{Cr15}) that were originally satisfied by \textit{gupdate}, and that there is no violation of the characteristic safety properties for \textit{gupdate}. The former follows directly from the transitivity of leads-to, and the latter is clear from examining the text of the transaction. Additionally, we can show that \textit{update} performs only local reads and right writes, satisfying \textit{Cr1} and \textit{Cr2}.

**Outstanding Violations.** \textit{gsearch} has not yet been allocated. The only access violations left to resolve are those of \textit{Cr1} by \textit{gsearch} and \textit{task}.

**Distribute search.** Our solution is to distribute the search process in such a way as to guarantee that messages are delivered to their destinations before the local clock is updated. This will allow us to replace the global
message query in task with a local query, eliminating the access violation. We propose the creation of a search transaction which is passed from processor to processor. When received by a processor, the time stamp carried on the transaction should be compared against the earliest work which the processor knows about, and if the processor knows of earlier work to be done, it changes the time stamp before passing the transaction on. If the search process makes 2 passes around the ring, then we can guarantee that the delivery times for all messages are considered.

**Refinement.** We propose the following form for search:

\[
\text{search}(P,n,T)
\]

having the meaning: the search process is at processor \( P \), and at this point, the earliest work found is at time \( T \). The transaction makes 2 passes around the ring; \( n \) contains the pass number. As with the refinement to gupdate, we allow at most one search transaction to exist at any given time:

\[
\text{Cr23: } \sum P,n,T : \text{search}(P,n,T) :: 1 \leq 1
\]

We introduce the obvious coupling invariant to map gsearch to search:

\[
\text{Cr24: } \text{inv. gsearch } \equiv \exists P,n,T : \text{search}(P,n,T)
\]

and we do not allow the time stamp on the search transaction to increase:

\[
\text{Cr25: } \text{search}(P,n,T) \text{ unless } \exists P',n',T' : \text{search}(P',n',T') :: T' \leq T
\]

The allocation is the same one used for all entities in this process:

\[
\text{Ar7: } \text{locus(clock}(P,t)_{i_1}) \land \text{locus(search}(P,n,T)_{i_2}) \Rightarrow i_1 = i_2
\]

To guarantee that messages are delivered before the local clock is updated, we will want the search transaction to "push" messages around the ring. That is, messages should be forwarded before the transaction advances. This idea is captured formally in the following invariant, which states that on the second pass, all unreceived messages located at nodes "ahead" of the search transaction are in fact destined for processors "ahead" of the transaction:

\[
\text{Cr26: } \text{inv. search}(P,2,t) \Rightarrow (\text{msg}(Q,R,S,t,\mu) \land Q \text{twixt}(P,N-1) \Rightarrow S \text{twixt}(P,N-1))
\]

Since this invariant must hold as soon as the second pass begins, it in fact constrains the first pass, requiring that it be performed in a manner guaranteed to establish \( \text{Cr26} \). The main significance of \( \text{Cr26} \) is that it guarantees that the
second pass will encounter every message. We introduce the function \( \text{min} \_ \text{work}(P) \), which simply computes the minimum time value for which work is known at processor \( P \), i.e.,

\[
\text{min} \_ \text{work}(P) \equiv ( \min T : \text{work}(P,T) :: T )
\]

Additionally, we require the transaction to move clockwise around the ring, updating its time stamp to reflect the minimum time for which work has been encountered. The actual properties differ slightly depending on where the transaction currently resides:

\[\text{C}_R27: \ \text{During either pass, if the transaction is not at the end of the ring, it moves clockwise, changing the time if appropriate}
\]
\[
\text{search}(P,n,T) \land P \neq N - 1 \ \text{until search(right}(P),n,\text{min}(T,\text{min} \_ \text{work}(P)))
\]

\[\text{C}_R28: \ \text{At the beginning of the ring, pass 1 becomes pass 2}
\]
\[
\text{search}(N - 1,1,T) \ \text{until search}(0,2,\text{min}(T,\text{min} \_ \text{work}(N - 1)))
\]

\[\text{C}_R29: \ \text{At the beginning of the ring, pass 2 becomes the update phase}
\]
\[
\text{search}(N - 1,2,T) \ \text{until update}(0,\text{min}(T,\text{min} \_ \text{work}(N - 1)))
\]

Finally, we refine \( \text{C}_R22 \) to reflect the new notation:

\[\text{C}_R22.1 \ \text{update}(N - 1,T) \ \text{unless search}(0,1,\infty)\]

This gives us the following form for the \text{search} transaction:

\[
\text{search}(P,n,T) =
\]
\[
Q,R,\tau,\mu:
\]
\[
\text{msg}(P,Q,R,\tau,\mu), R \neq P, \tau \leq T \rightarrow \text{search}(P,n,T)
\]
\[
\text{NOR}, P \neq N - 1 \rightarrow \text{search(right}(P),n,\text{min}(T,\text{min} \_ \text{work}(P)))
\]
\[
\text{NOR}, P = N - 1, n = 1 \rightarrow \text{search}(0,2,\text{min}(T,\text{min} \_ \text{work}(P)))
\]
\[
\text{NOR}, P = N - 1, n = 2 \rightarrow \text{update}(0,\text{min}(T,\text{min} \_ \text{work}(P)))
\]

The first subtransaction serves to maintain \( \text{C}_R26 \) by making the \text{NOR} of the other three subtransactions \text{false} as long as there are still messages to be forwarded (by \text{router}); the second, third, and fourth subtransactions are from \( \text{C}_R27, \text{C}_R28, \) and \( \text{C}_R29 \), respectively. We must also revise \text{update} to reflect \( \text{C}_R22.1 \).

\[
\text{update}(P,\tau) =
\]
\[
\text{clock}(P,T) \rightarrow \text{clock}(P,T)\uparrow, \text{clock}(P,\tau)
\]
\[
\text{P < N - 1} \rightarrow \text{update(right}(P),\tau)
\]
\[
\text{P = N - 1} \rightarrow \text{search}(0,1,\infty)
\]

Finally, \( \text{C}_R26 \) implies that when the \text{update} transaction arrives at a node there are no messages for the transaction that are not already at the processor, that is, the following invariant is maintained by the program:
\[
\text{Cr30: } \text{inv. clock}(P, \tau) \land \text{message}(Q,R,P,\tau,\mu) \Rightarrow P = Q
\]

This allows us to modify \textit{task} to check for messages locally, eliminating its read violation as well:

\[
\begin{align*}
\text{task}(P, \tau, \alpha) &= \\
\sigma : &\alpha \neq \bot, \text{clock}(P, \tau), \langle \forall Q, \mu : \neg \text{msg}(P, Q, P, \tau, \mu) \rangle, \text{state}(P, \sigma) \\
\rightarrow &\text{task}(P, e(P, \sigma, \alpha), a(P, \sigma, \alpha)), \\
&\text{state}(P, \sigma) \uparrow, \text{state}(P, s(P, \sigma, \alpha)), \\
&\langle Q : c(P, Q, \sigma, \alpha) : \text{msg}(P, Q, P, Q, \tau, \sigma, \alpha), v(P, Q, \sigma, \alpha) \rangle \\
\| &\text{NOR} \rightarrow \text{task}(P, \tau, \alpha)
\end{align*}
\]

\textbf{Proof Obligations.} We will need to show that the \textit{search} and \textit{update} transactions, as modified, continue to satisfy the constraints which initially motivated their form, e.g., \textit{Cr10-Cr15}, and \textit{Cr19-Cr22}. Additionally, we can now show that \textit{search} does not violate the read access restrictions (\textit{Cr1}), so the program now satisfies the entire architectural specification.

\textbf{Proof Outline.} The proof that the \textit{update} transaction is correct is straightforward. Similarly, the invariance of \textit{Cr30} is obvious from the program text. More interesting is the proof that \textit{search} satisfies the previous specification, in particular the progress properties \textit{Cr13-Cr15}. As with the refinement of \textit{gupdate} in the previous section, the approach is to show that \textit{Cr25-Cr29} amount to a refinement of \textit{Cr13}, and that \textit{search} satisfies the refined properties.

The proof amounts to showing that \textit{Cr26} is adequate to guarantee that the search process encounters every message before the end of the second pass. That is, if we can prove the following invariant, then the refinement is correct:

\[
\text{LS: } \text{inv. search}(P, 2, \tau) \Rightarrow (\forall Q : 0 \leq Q < P : \tau \leq \text{min_work}(Q))
\]

Taking \textit{Cr26} with \textit{P = S} (that is, the search process has reached the destination of some message), allows us to infer \textit{Q = S}, that is, the message has reached its destination, since we have \textit{Q twixt}(P, N-1) and \textit{Q twixt}(R, P). From this, we can conclude (from the definition of \textit{min_work})

\[
\text{search}(P, 2, \tau) \Rightarrow \text{min_work}(P) = (\text{min } Q, R, \tau', \alpha, \mu : \text{action}(P, \tau', \alpha) \land \alpha \neq \bot \lor \text{message}(R, Q, P, \tau', \mu) : \tau')
\]
That is, when the search transaction arrives at node $P$, then the computation of \textit{min\_work} will include all messages destined for that node. Thus, each of the \textit{min} computations in $C_{R27}$-$C_{R29}$ is guaranteed to compute the minimum for all nodes up to $P$, which gives us L5. $C_{R13}$ follows immediately from L5 and the transitivity of leads-to.

Now that all of the architectural constraints have been satisfied for all elements of the program, the process is complete. It should be noted that there is likely a one-pass algorithm for \textit{search}, but the architectural constraints do not force us to discover it, and thus it is outside the scope of this paper.

7. Discussion

In this paper we have shown that architectural constraints can be formalized as assertions over programs and that violations of such assertions by an otherwise correct program can guide a program refinement process leading to a final solution which satisfies both the functional specifications and the architectural constraints. As in the case of specification refinement, program refinement is a creative process not likely to succumb easily to attempts at mechanization. Nevertheless, our experience demonstrates the existence of several broad classes of program refinements which, far from being mechanical in nature, entail only minimal re-verification of the program. This is a significant result since the re-verification of each new program could, in general, be a very time-consuming activity.

The first class of refinements could be termed \textit{straight data refinement}. One example is provided by the distribution of the \textit{geclock} tuple in the consensus bus architecture. This refinement has two characteristics which allow us to forgo most re-verification in the resulting program. First, the coupling invariant that ties the old and new data representations defines an equivalence relation and, thus, eliminates any need to re-write the specification to use the new representation. Second, the state transitions taking place in the old and new programs are also in a one-to-one relation. The resulting program thus satisfies the same safety and progress properties as the original one.

The second class of refinements are the \textit{synchronous process distribution} refinements such as the distribution of \textit{gtick} in the consensus bus example. No re-verification was required for this refinement because the semantics of the resulting synchronous group are identical to that of the original transaction. This refinement allowed us to convert one global transaction into a set of transactions which perform local actions but are synchronized so as to accomplish the same total effect. The resulting structure has great potential for parallel execution in a distributed ar-
architecture which provides the needed synchronization mechanisms. The refinement as performed in the example was aided by the fact that both the original query and actions were easily partitioned among the available processors.

A third class of refinements is the *asynchronous process distribution*. The distribution of the *gdeliver* transaction (in both examples) is illustrative of this class. One global transaction was replaced by a number of asynchronous local transactions. This was possible because the processing performed by the original transaction was itself "asynchronous" in the sense that each time the original was executed, the changes it made to the global system state were in fact local and non-interfering.

The fourth class of refinements is the *serialization*, as illustrated by the final refinement in the consensus bus architecture which combines two non-interfering transactions (*task* and *deliver*) into a single transaction. This refinement is interesting because it reflects a transformation which is likely to be useful in the real world, where multiple activities may need to be grouped, either for architectural reasons, as in the example, or for other practical reasons, such as required access to shared data. The transformation described above takes advantage of the fact that the two transactions perform actions which cannot interfere with one another—*task* can only execute when there are no messages, and *deliver* has nothing to do when all messages have been delivered. Had the two transactions interfered in some way, it would have been necessary to refine one to eliminate the interference before they could be combined.

Although we made no attempt to catalog the full range of program refinements we have encountered so far in our work, it seems reasonable that a broad repertoire of useful transformations could be developed and characterized formally, along the lines of the work in the Action Systems [2] model. In this paper an attempt is made to classify program transformations by some characteristics manifest by the program to be refined. In general, their approach is to manipulate a program to reduce the interference of its statements, and then to use a "stock" refinement to distribute the computation. In effect, they move the creative part of the refinement process into the problem of removing the interference between statements; this is very similar to our approach in this paper.

Two refinements from the ring example are interesting, not so much because they can be neatly classified, but rather because they look more like specification refinements than program refinements. The refinement of *gtick*
into two phases, and the resulting refinements of the gsearch and gupdate transactions, while relatively complex, were easy to prove precisely because we were able to prove that the properties we introduced to describe the behavior of the new transactions were in fact a refinement of the original specification. This leads us to wonder if it is necessary to do program refinement at all. That is, is it possible to reason about the influence of the architectural constraints on a program without actually writing a program? This idea is attractive to us because the techniques for manipulating specifications are much better understood than those for manipulating programs, and we would like to stay within the specification realm for as long as possible. One possible approach might be to write programs simply to detect violations of the architectural constraints, but then to use the insight gained from the program to motivate the next refinement of the specification—this may turn out to be uneconomical and also to lead us back to the ad-hoc factoring of architectural constraints which characterizes specification refinements today. A more attractive alternative is to take advantage of the ability present in the Swarm logic to reason about both data (data tuples) and actions (transactions)—the specifications shown in this paper do not really draw any distinction between deriving a UNITY or a Swarm program—and to augment the specifications with assertions regarding data and action allocation to processors.

8. Conclusions

In this paper, we have shown that architectural constraints can be expressed using assertions about programs in the style of UNITY and Swarm logics—the same notation we use to write formal behavior specifications. By unifying the notations, we are able to directly factor the architectural constraints into the program derivation process. Our current approach requires a program be generated through specification refinement before the architectural constraints can be considered—the architectural constraints involve assertions over auxiliary tuples (variables) whose introduction requires the presence of a program. To the best of our knowledge, this is the first time that architectural constraints have been specified formally and compliance of the program to the architecture for which it is intended has been verified formally.

References


55


---

1 This is an example of a constructor, a syntactic element which occurs frequently in our notation. The general form of the constructor is:

\[
\langle \text{op} \; \text{dummy variables} : \text{range constraint} :: \text{expression} \rangle
\]

where \( \text{op} \) is typically a binary, associative, and commutative operator (such as \(+, \times, \wedge, \vee\), written \(\Sigma, \Pi, \forall, \exists\), respectively). Logically, the constructor creates a multi-set of values \(\{v_1, v_2, \ldots, v_n\}\) by evaluating the expression for every possible instantiation of the dummy variables satisfying the range constraint. The final value of the constructor is obtained by evaluating the expression \(v_1 \text{ op } v_2 \text{ op } \ldots \text{ op } v_n\). If the range is empty the zero-element for the operator is returned. Other frequently used operators are \(\min, \max, \text{ and } \set\), having the obvious interpretations. We occasionally omit the range of the constructor (and the first colon) when the range is obvious from the context.

2 The special predicates are \textbf{AND, NAND, OR, NOR, and TRUE}, meaning "all," "not-all," "some," "none," and "no-matter-how-many" of the regular queries succeed.

3 The same expression, appearing in the query of a subtransaction, allows the program to query for the existence of a synchrony relation entry. Synchrony relation entries can also be deleted using the \texttt{rigger}. Furthermore, it is possible to query (but not modify) the transitive closure of the synchrony relation using "\(\equiv\)" in place of "\(-\)."

4 Throughout the architectural refinements, we will use transaction names as predicates. Informally, if \(T\) is a transaction type name, then the predicate \(T\) means that there is a transaction (or collection of transactions) which satisfy the progress properties satisfied by the transaction \(T\). This notation allows us to treat transaction refinement as a form of data refinement, a subject to which we will return in the discussion.
Index terms: stepwise refinement, formal methods, concurrency, specification, distributed simulation, architecture
Figure 1: Notation used in the Swarm proof logic.
Figure 2: One site of the postulated architecture.