commit 8dec4d579319f5417a2f7ced25711cf3eaf52959
parent 69f268ec1a4811475321127b5fa045bc0fd88ec3
Author: Alex Balgavy <alex@balgavy.eu>
Date:   Wed, 21 Apr 2021 12:52:30 +0200
Distributed algs notes
Diffstat:
2 files changed, 101 insertions(+), 0 deletions(-)
diff --git a/content/distributed-algorithms-notes/_index.md b/content/distributed-algorithms-notes/_index.md
@@ -9,3 +9,4 @@ title = 'Distributed Algorithms'
 4. [Routing](routing)
 5. [Election algorithms](election-algorithms)
 6. [Anonymous networks](anonymous-networks)
+7. [Fault tolerance](fault-tolerance)
diff --git a/content/distributed-algorithms-notes/fault-tolerance.md b/content/distributed-algorithms-notes/fault-tolerance.md
@@ -0,0 +1,100 @@
++++
+title = 'Fault tolerance'
++++
+# Fault tolerance
+A process may crash -- unexpectedly stop executing events.
+
+Assume network is complete -- there's a bidirectional channel between any two processes.
+So process crashes never make remaining network disconnected.
+Assume crashing of processes can't be observed.
+
+## Consensus
+Binary consensus
+- initially all processes randomly select 0 or 1
+- eventually all alive processes uniformly decide 0 or 1
+
+Assumptions:
+- validity: if all processes randomly select same initial value b, then all alive processes decide b
+- k-crash consensus: at most k processes may crash
+
+No algorithm for 1-crash consensus always terminates.
+
+b-potent set S of processes: if by only executing events at processes in S, some process in S can decide b
+
+No Las Vegas algorithm for k ≥ N/2 consensus
+
+### Bracha-Toueg crash consensus algorithm
+Let k < N/2.
+- initially each alive process randomly selects 0 or 1, with weight 1
+- in round n, at each alive undecided process p:
+    - p sends [n, value(p), weight(p)] to all processes including itself
+    - p waits until N-k messages [n, b, w] have arrived (dismisses/stores messages from earlier/future rounds)
+        - if w > N/2, then value(p) = b
+        - else, value(p) = 0 if most messages voted 0, value(p) = 1 otherwise
+        - weight(p) = number of incoming votes for value(p) in round n
+    - if w > N/2 for more than k incoming messages, then p decides b
+- if p decides b, it broadcasts [n+1, b, N-k] and [n+2, b, n-k] and terminates
+
+Let k < N/2.
+This is a Las Vegas algorithm that terminates with probability 1.
+
+## Failure detection
+Failure detector at process tracks which process have (or may have) crashed.
+Given an upper bound on network latency and heartbeat messages, one can implement a failure detector.
+For this setting, terminating crash consensus algorithms exist.
+
+Assume time domain with total order.
+- F(t) is set of crashed processes at time t ("failure pattern")
+- t1 < t2 ⇒ F(t1) ⊆ F(t2)
+- Assume processes can't observe F(t).
+
+H(p, t) is set of processes that p suspects to be crashed at time t ("failure detector history")
+
+Require that failure detectors are complete: from some time onward, each crashed process is suspected by each alive process.
+
+"strongly accurate" failure detector: if only crashed processes are ever suspected
+- Assumptions:
+    - each alive process broadcasts `alive` every v time units
+    - dmax is a known upper bound on network latency
+
+- Process from which no message is received for v + dmax time units has crashed
+
+"weakly accurate": if some process is never suspected by any process
+- Processes numbered p0..p(N-1)
+- Initially, each process randomly selects 0 or 1.
+- In round n:
+    - pn (if alive) broadcasts its value
+    - each process waits
+        - for incoming message from pn, in which case it adopts value of pn
+        - or until it suspects that pn has crashed
+- After round N-1, each alive process decides for its value.
+
+"eventually strongly accurate": if from some time onward, only crashed processes are suspected
+- assumptions:
+    - each alive process broadcasts `alive` every v time units
+    - there is unknown upper bound on network latency
+- each process q initially guesses as network latency dq = 1
+- if q receives no message from p for v+dq time units, then q suspects that p has crashed
+- if q receives a message from a suspected process p, then p is no longer suspected, and dq = dq+1
+- let k ≥ N/2, there is no Las Vegas algorithm for k-crash consensus based on eventually strongly accurate failure detector
+
+"eventually weakly accurate": if from some time onward, some process is never suspected
+- let k < N/2, eventually weakly accurate failure detector used for k-crash consensus
+- Chandra-Toueg k-crash consensus algorithm
+    - each process q records last round lu(q) in which it updated value(q)
+    - initially value(q) ∈ {0,1} and lu(q) = -1
+    - processes numbered p0..p(N-1)
+    - round n coordinated by pc with c = n mod N
+    - in round n, each alive process q sends [vote, n, value(q), lu(q)] to pc
+    - pc (if alive) waits until N-k such messages arrived, and selects one [vote, n, b, l] with l as large as possible
+        - value(pc) = b, lu(pc) = n
+        - pc broadcasts [value, n, b]
+    - each alive process q waits
+        - either until [value, n, b] arrives
+            - then value(q) = b, lu(q) = n
+            - and q sends [ack, n] to pc
+        - or until it suspects pc crashed
+            - then q sends [nack, n] to pc
+    - if pc receives more than k messages [ack, n], then pc decides b and broadcasts [decide, b]
+    - an undecided process that receives [decide, b] decides b
+- let k < N/2, the Chandra-Toueg algorithm is an always terminating k-crash consensus algorithm