lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git
Log | Files | Refs | Submodules

commit 8dec4d579319f5417a2f7ced25711cf3eaf52959
parent 69f268ec1a4811475321127b5fa045bc0fd88ec3
Author: Alex Balgavy <alex@balgavy.eu>
Date:   Wed, 21 Apr 2021 12:52:30 +0200

Distributed algs notes

Diffstat:
Mcontent/distributed-algorithms-notes/_index.md | 1+
Acontent/distributed-algorithms-notes/fault-tolerance.md | 100+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+), 0 deletions(-)

diff --git a/content/distributed-algorithms-notes/_index.md b/content/distributed-algorithms-notes/_index.md @@ -9,3 +9,4 @@ title = 'Distributed Algorithms' 4. [Routing](routing) 5. [Election algorithms](election-algorithms) 6. [Anonymous networks](anonymous-networks) +7. [Fault tolerance](fault-tolerance) diff --git a/content/distributed-algorithms-notes/fault-tolerance.md b/content/distributed-algorithms-notes/fault-tolerance.md @@ -0,0 +1,100 @@ ++++ +title = 'Fault tolerance' ++++ +# Fault tolerance +A process may crash -- unexpectedly stop executing events. + +Assume network is complete -- there's a bidirectional channel between any two processes. +So process crashes never make remaining network disconnected. +Assume crashing of processes can't be observed. + +## Consensus +Binary consensus +- initially all processes randomly select 0 or 1 +- eventually all alive processes uniformly decide 0 or 1 + +Assumptions: +- validity: if all processes randomly select same initial value b, then all alive processes decide b +- k-crash consensus: at most k processes may crash + +No algorithm for 1-crash consensus always terminates. + +b-potent set S of processes: if by only executing events at processes in S, some process in S can decide b + +No Las Vegas algorithm for k ≥ N/2 consensus + +### Bracha-Toueg crash consensus algorithm +Let k < N/2. +- initially each alive process randomly selects 0 or 1, with weight 1 +- in round n, at each alive undecided process p: + - p sends [n, value(p), weight(p)] to all processes including itself + - p waits until N-k messages [n, b, w] have arrived (dismisses/stores messages from earlier/future rounds) + - if w > N/2, then value(p) = b + - else, value(p) = 0 if most messages voted 0, value(p) = 1 otherwise + - weight(p) = number of incoming votes for value(p) in round n + - if w > N/2 for more than k incoming messages, then p decides b +- if p decides b, it broadcasts [n+1, b, N-k] and [n+2, b, n-k] and terminates + +Let k < N/2. +This is a Las Vegas algorithm that terminates with probability 1. + +## Failure detection +Failure detector at process tracks which process have (or may have) crashed. +Given an upper bound on network latency and heartbeat messages, one can implement a failure detector. +For this setting, terminating crash consensus algorithms exist. + +Assume time domain with total order. +- F(t) is set of crashed processes at time t ("failure pattern") +- t1 < t2 ⇒ F(t1) ⊆ F(t2) +- Assume processes can't observe F(t). + +H(p, t) is set of processes that p suspects to be crashed at time t ("failure detector history") + +Require that failure detectors are complete: from some time onward, each crashed process is suspected by each alive process. + +"strongly accurate" failure detector: if only crashed processes are ever suspected +- Assumptions: + - each alive process broadcasts `alive` every v time units + - dmax is a known upper bound on network latency + +- Process from which no message is received for v + dmax time units has crashed + +"weakly accurate": if some process is never suspected by any process +- Processes numbered p0..p(N-1) +- Initially, each process randomly selects 0 or 1. +- In round n: + - pn (if alive) broadcasts its value + - each process waits + - for incoming message from pn, in which case it adopts value of pn + - or until it suspects that pn has crashed +- After round N-1, each alive process decides for its value. + +"eventually strongly accurate": if from some time onward, only crashed processes are suspected +- assumptions: + - each alive process broadcasts `alive` every v time units + - there is unknown upper bound on network latency +- each process q initially guesses as network latency dq = 1 +- if q receives no message from p for v+dq time units, then q suspects that p has crashed +- if q receives a message from a suspected process p, then p is no longer suspected, and dq = dq+1 +- let k ≥ N/2, there is no Las Vegas algorithm for k-crash consensus based on eventually strongly accurate failure detector + +"eventually weakly accurate": if from some time onward, some process is never suspected +- let k < N/2, eventually weakly accurate failure detector used for k-crash consensus +- Chandra-Toueg k-crash consensus algorithm + - each process q records last round lu(q) in which it updated value(q) + - initially value(q) ∈ {0,1} and lu(q) = -1 + - processes numbered p0..p(N-1) + - round n coordinated by pc with c = n mod N + - in round n, each alive process q sends [vote, n, value(q), lu(q)] to pc + - pc (if alive) waits until N-k such messages arrived, and selects one [vote, n, b, l] with l as large as possible + - value(pc) = b, lu(pc) = n + - pc broadcasts [value, n, b] + - each alive process q waits + - either until [value, n, b] arrives + - then value(q) = b, lu(q) = n + - and q sends [ack, n] to pc + - or until it suspects pc crashed + - then q sends [nack, n] to pc + - if pc receives more than k messages [ack, n], then pc decides b and broadcasts [decide, b] + - an undecided process that receives [decide, b] decides b +- let k < N/2, the Chandra-Toueg algorithm is an always terminating k-crash consensus algorithm