commit e7aae3e65086a9736ffba47516ebd3d4dcbb2db2
parent 93d1647226c45729fb98ea81ffbb1f1b5d7be614
Author: Alex Balgavy <alex@balgavy.eu>
Date: Mon, 1 Feb 2021 16:53:03 +0100
Lectures notes for multicore programming
Diffstat:
3 files changed, 68 insertions(+), 0 deletions(-)
diff --git a/content/_index.md b/content/_index.md
@@ -9,6 +9,7 @@ title = "Alex's university course notes"
* [Computer & Network Security](computer-network-security/)
* [Logical Verification](logical-verification/)
* [Software Architecture](software-architecture/)
+* [Programming Multi-Core and Many-Core Systems](programming-multi-core-and-many-core-systems)
# BSc Computer Science (VU Amsterdam)
---
diff --git a/content/programming-multi-core-and-many-core-systems/_index.md b/content/programming-multi-core-and-many-core-systems/_index.md
@@ -0,0 +1,5 @@
++++
+title = "Programming Multi-Core and Many-Core Systems"
++++
+# Table of contents
+1. [Lecture 1](lecture-1)
diff --git a/content/programming-multi-core-and-many-core-systems/lecture-1.md b/content/programming-multi-core-and-many-core-systems/lecture-1.md
@@ -0,0 +1,62 @@
++++
+title = 'Lecture 1'
+template = 'page-math.html'
++++
+
+Programming shared address space concurrent systems:
+- App-facing high-level approach: almost ordinary sequential program, add compiler directives to parallelise
+ - structured parallelism, limited choice
+- System-facing: explicitly multithreaded
+- SIMD parallelism and vectorisation: single instruction, multiple data
+ - extra long (vector) registers
+ - store multiple values at once, apply same instruction to all values in vector
+ - can be automatically vectorised by compiler, or use compiler intrinsics manually
+
+Concurrency: tasks A and B are concurrent iff they can be performed independently of each other, yielding identical results.
+
+Parallelisation patterns:
+- decomposition:
+ 1. partition problem into concurrent subproblems
+ 2. Solve each sub-problem independently
+ 3. Combine partial solutions into solution of initial problem
+- functional decomposition:
+ 1. Partition problem into mostly independent functional units
+ 2. Each functional unit has own implementation
+ 3. functional units communicate with each other
+- pipelining
+ - sequence of independent tasks
+ - task receives input data and produces output (fifo buffers)
+ - tasks are connected via streams (fifo buffers)
+- domain (data structure) decomposition
+ - problem dominated by manipulation of large data structure
+ - decompose the data structure
+ - let each process manipulate its own part of data
+ - recombine data structure
+- divide and conquer
+ - task solved sequentially or recursively split into subtasks
+ - fan-out defined by algorithm
+ - subtasks computed independently
+ - task waits for subtasks to finish
+ - task combines partial result
+ - only parent-child communication
+
+Organisation principle: parallel supersteps
+- spatial split into number of subtasks
+- fan-out is runtime constant
+- temporal split into supersteps
+- subtasks independent within each step
+- barrier sync between steps
+- communication only between steps
+
+How measure parallel performance?
+- speedup: $S_{n} = \frac{T_{1}}{T_{n}}$
+- T₁: exec time using 1 core
+- $𝑇_{𝑛}$: exec time using n cores
+
+Parallel program dev process:
+1. formulate problem
+2. write symbolic math specification
+3. write sequential code
+4. parallel code
+
+Parallelisation does not affect asymptotic complexity, only adds a constant factor