lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

lecture-1.md (2388B)


+++
title = 'Lecture 1'
template = 'page-math.html'
+++

Programming shared address space concurrent systems:
- App-facing, high-level approach: write an almost ordinary sequential program and add compiler directives to parallelise it
    - structured parallelism, limited choice
- System-facing: explicitly multithreaded
- SIMD parallelism and vectorisation: single instruction, multiple data
    - extra-long (vector) registers
    - store multiple values at once, apply the same instruction to all values in the vector
    - can be vectorised automatically by the compiler, or manually via compiler intrinsics

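The system-facing, explicitly multithreaded style can be sketched in Go (names like `parallelSum` are illustrative, not from the lecture): each worker thread sums its own chunk of the data into a private slot, so no locking is needed.

```go
// Hypothetical sketch: explicit multithreading, summing a slice in
// parallel chunks. Each worker writes only to its own partial[w] slot.
package main

import (
	"fmt"
	"sync"
)

func parallelSum(data []int, nWorkers int) int {
	partial := make([]int, nWorkers) // one slot per worker, no sharing
	var wg sync.WaitGroup
	chunk := (len(data) + nWorkers - 1) / nWorkers
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			lo, hi := w*chunk, (w+1)*chunk
			if hi > len(data) {
				hi = len(data)
			}
			if lo > len(data) {
				lo = len(data)
			}
			for _, v := range data[lo:hi] {
				partial[w] += v
			}
		}(w)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	data := make([]int, 100)
	for i := range data {
		data[i] = i + 1
	}
	fmt.Println(parallelSum(data, 4)) // sum of 1..100 = 5050
}
```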
Concurrency: tasks A and B are concurrent iff they can be performed independently of each other, yielding identical results.

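A minimal illustration of this definition (task bodies are made up for the example): two tasks that write disjoint parts of the state produce the same result in either execution order, so they are concurrent.

```go
// Sketch: taskA and taskB touch disjoint elements, so they are
// independent and any execution order yields identical results.
package main

import "fmt"

func taskA(s []int) { s[0] = 10 } // writes only s[0]
func taskB(s []int) { s[1] = 20 } // writes only s[1]

// runOrder executes the two tasks in the given order on fresh state.
func runOrder(first, second func([]int)) [2]int {
	s := make([]int, 2)
	first(s)
	second(s)
	return [2]int{s[0], s[1]}
}

func main() {
	fmt.Println(runOrder(taskA, taskB) == runOrder(taskB, taskA)) // true
}
```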
Parallelisation patterns:
- decomposition:
    1. partition the problem into concurrent subproblems
    2. solve each subproblem independently
    3. combine the partial solutions into a solution of the initial problem
- functional decomposition:
    1. partition the problem into mostly independent functional units
    2. each functional unit has its own implementation
    3. functional units communicate with each other
- pipelining:
    - sequence of independent tasks
    - each task receives input data and produces output
    - tasks are connected via streams (FIFO buffers)
- domain (data structure) decomposition:
    - problem dominated by manipulation of a large data structure
    - decompose the data structure
    - let each process manipulate its own part of the data
    - recombine the data structure
- divide and conquer:
    - a task is either solved sequentially or recursively split into subtasks
    - fan-out is defined by the algorithm
    - subtasks are computed independently
    - the task waits for its subtasks to finish
    - the task combines the partial results
    - communication only between parent and child

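The pipelining pattern above maps naturally onto Go, where buffered channels play the role of the FIFO buffers between stages (the stage names `produce` and `square` are illustrative only):

```go
// Sketch of pipelining: two stages connected by streams, with
// buffered channels acting as the FIFO buffers between tasks.
package main

import "fmt"

// produce emits the numbers 1..n into its output stream.
func produce(n int) <-chan int {
	out := make(chan int, 4) // buffered channel = FIFO buffer
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- i
		}
	}()
	return out
}

// square reads from one stream and writes squared values to the next.
func square(in <-chan int) <-chan int {
	out := make(chan int, 4)
	go func() {
		defer close(out)
		for v := range in {
			out <- v * v
		}
	}()
	return out
}

func main() {
	sum := 0
	for v := range square(produce(4)) {
		sum += v
	}
	fmt.Println(sum) // 1 + 4 + 9 + 16 = 30
}
```

Both stages run concurrently: `square` starts consuming as soon as `produce` has emitted its first value.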
Organisation principle: parallel supersteps
- spatial split into a number of subtasks
- fan-out is a runtime constant
- temporal split into supersteps
- subtasks are independent within each step
- barrier synchronisation between steps
- communication only between steps

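The superstep principle can be sketched as follows (a hedged illustration; `superstep` and the two example steps are invented names): within a step, workers update disjoint parts of the data independently, and a barrier separates consecutive steps.

```go
// Sketch of parallel supersteps: fixed fan-out of workers per step,
// with wg.Wait() acting as the barrier between steps.
package main

import (
	"fmt"
	"sync"
)

func superstep(data []int, nWorkers int, f func(int) int) {
	var wg sync.WaitGroup
	chunk := (len(data) + nWorkers - 1) / nWorkers
	for w := 0; w < nWorkers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if hi > len(data) {
			hi = len(data)
		}
		if lo >= hi {
			continue
		}
		wg.Add(1)
		go func(part []int) { // each worker owns a disjoint slice
			defer wg.Done()
			for i := range part {
				part[i] = f(part[i])
			}
		}(data[lo:hi])
	}
	wg.Wait() // barrier: no step-2 work starts before step 1 finishes
}

func main() {
	data := []int{1, 2, 3, 4}
	superstep(data, 2, func(x int) int { return x * 2 }) // step 1
	superstep(data, 2, func(x int) int { return x + 1 }) // step 2
	fmt.Println(data) // [3 5 7 9]
}
```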
How to measure parallel performance?
- speedup: $S_{n} = \frac{T_{1}}{T_{n}}$
- $T_1$: execution time using 1 core
- $T_n$: execution time using n cores

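As a quick worked instance of the formula (the timings 8.0 s and 2.5 s are made-up example values, not from the lecture):

```go
// Speedup S_n = T_1 / T_n for example timings.
package main

import "fmt"

func speedup(t1, tn float64) float64 { return t1 / tn }

func main() {
	// e.g. 8.0 s on 1 core, 2.5 s on 4 cores
	fmt.Println(speedup(8.0, 2.5)) // 3.2: sublinear (< 4) speedup
}
```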
Parallel program development process:
1. formulate the problem
2. write a symbolic mathematical specification
3. write sequential code
4. write parallel code

Parallelisation does not affect asymptotic complexity; it only changes the constant factor (speedup is bounded by the number of cores).