lecture-1.md (2388B)
1 +++ 2 title = 'Lecture 1' 3 template = 'page-math.html' 4 +++ 5 6 Programming shared address space concurrent systems: 7 - App-facing high-level approach: almost ordinary sequential program, add compiler directives to parallelise 8 - structured parallelism, limited choice 9 - System-facing: explicitly multithreaded 10 - SIMD parallelism and vectorisation: single instruction, multiple data 11 - extra long (vector) registers 12 - store multiple values at once, apply same instruction to all values in vector 13 - can be automatically vectorised by compiler, or use compiler intrinsics manually 14 15 Concurrency: tasks A and B are concurrent iff they can be performed independently of each other, yielding identical results. 16 17 Parallelisation patterns: 18 - decomposition: 19 1. partition problem into concurrent subproblems 20 2. Solve each sub-problem independently 21 3. Combine partial solutions into solution of initial problem 22 - functional decomposition: 23 1. Partition problem into mostly independent functional units 24 2. Each functional unit has own implementation 25 3. functional units communicate with each other 26 - pipelining 27 - sequence of independent tasks 28 - task receives input data and produces output (fifo buffers) 29 - tasks are connected via streams (fifo buffers) 30 - domain (data structure) decomposition 31 - problem dominated by manipulation of large data structure 32 - decompose the data structure 33 - let each process manipulate its own part of data 34 - recombine data structure 35 - divide and conquer 36 - task solved sequentially or recursively split into subtasks 37 - fan-out defined by algorithm 38 - subtasks computed independently 39 - task waits for subtasks to finish 40 - task combines partial result 41 - only parent-child communication 42 43 Organisation principle: parallel supersteps 44 - spatial split into number of subtasks 45 - fan-out is runtime constant 46 - temporal split into supersteps 47 - subtasks independent within each step 48 - barrier sync between steps 49 - communication only between steps 50 51 How measure parallel performance? 52 - speedup: $S_{n} = \frac{T_{1}}{T_{n}}$ 53 - T₁: exec time using 1 core 54 - $T_n$: exec time using n cores 55 56 Parallel program dev process: 57 1. formulate problem 58 2. write symbolic math specification 59 3. write sequential code 60 4. parallel code 61 62 Parallelisation does not affect asymptotic complexity, only adds a constant factor