commit 8221db5b13d2970fab26db27724b9f6920a56f55
parent e3aa4edcb29b950d9f1e44bbdd090b792fac4bcf
Author: Alex Balgavy <alex@balgavy.eu>
Date: Sun, 14 Feb 2021 18:46:55 +0100
Lecture 4
Diffstat:
2 files changed, 41 insertions(+), 0 deletions(-)
diff --git a/content/programming-multi-core-and-many-core-systems/_index.md b/content/programming-multi-core-and-many-core-systems/_index.md
@@ -5,3 +5,4 @@ title = "Programming Multi-Core and Many-Core Systems"
1. [Lecture 1](lecture-1)
2. [Lecture 2](lecture-2)
3. [Lecture 3](lecture-3)
+4. [Lecture 4](lecture-4)
diff --git a/content/programming-multi-core-and-many-core-systems/lecture-4.md b/content/programming-multi-core-and-many-core-systems/lecture-4.md
@@ -0,0 +1,40 @@
++++
+title = 'Lecture 4'
++++
+
+<!-- TODO: finish -->
+problem: parallel execution incurs overhead (creation of worker threads, scheduling, waiting at the synchronisation barrier). this overhead must be outweighed by a sufficient workload, i.e. the loop body must do enough work per iteration and the trip count must be high enough.
+
+conditional parallelisation uses the if clause: `#pragma omp parallel for if (len >= 1000)`. the loop is parallelised only when the condition holds, i.e. above some workload threshold; otherwise it runs sequentially.
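+
+a minimal sketch of the if clause (the function `scale` and the threshold of 1000 are illustrative; in practice you find the threshold by measuring):
+
+```c
+#include <omp.h>
+
+/* scale a vector in place; worker threads are only used when the
+   trip count is large enough to outweigh the parallel overhead */
+void scale(double *a, int len, double factor) {
+    #pragma omp parallel for if (len >= 1000)
+    for (int i = 0; i < len; i++) {
+        a[i] *= factor;
+    }
+}
+```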
+
+loop scheduling determines which iterations execute on which thread; the aim is to distribute the workload equally across threads (a sketch follows the list below)
+- use `#pragma omp parallel for schedule(<type> [, <chunk>])`, which selects from a set of scheduling techniques
+- static/block scheduling: the loop is subdivided into as many contiguous chunks as there are threads, with `#pragma omp parallel for schedule(static)`
+- static scheduling with chunk size 1 (cyclic): iterations assigned to threads in round-robin fashion with `#pragma omp parallel for schedule(static, 1)`
+- static scheduling with chunk size n (block-cyclic): iterations grouped into chunks of n, assigned to threads in round-robin fashion with `#pragma omp parallel for schedule(static, n)`
+- dynamic scheduling: loop divided into chunks of n iterations, chunks dynamically assigned to threads on demand with `#pragma omp parallel for schedule(dynamic, n)`
+ - requires additional synchronisation, more overhead
+ - allows for dynamic load distribution
+
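+a sketch of how the schedule clause changes which thread runs which iteration (the chunk size 2 is arbitrary, and the output order is nondeterministic):
+
+```c
+#include <omp.h>
+#include <stdio.h>
+
+int main(void) {
+    /* print which thread runs each iteration; swap the schedule
+       clause (static / static,1 / dynamic,4 / ...) to see how the
+       iteration-to-thread mapping changes */
+    #pragma omp parallel for schedule(static, 2)
+    for (int i = 0; i < 16; i++) {
+        printf("iteration %2d -> thread %d\n", i, omp_get_thread_num());
+    }
+    return 0;
+}
+```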
+
+chunk size selection:
+- small chunks mean good load balancing but high synchronisation overhead
+- large chunks reduce the overhead, but give poor load balancing
+- e.g. with 10000 iterations, chunk size 1 means 10000 chunk assignments, while chunk size 1000 means only 10 assignments but allows an imbalance of up to a whole chunk's worth of work at the end
+
+guided scheduling:
+- at the start, large chunks so that scheduling overhead is small (the initial chunk size is implementation-dependent)
+- when approaching the final barrier, small chunks to balance the workload (the chunk size decreases exponentially with every assignment)
+- chunks dynamically assigned to threads on demand
+- `#pragma omp parallel for schedule(guided, <n>)`
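+
+a sketch with an irregular workload (the `collatz_steps` function is an illustrative stand-in for any loop whose per-iteration cost is unpredictable, which is where demand-driven scheduling pays off):
+
+```c
+#include <omp.h>
+
+/* irregular workload: the number of Collatz steps per value is hard
+   to predict; guided hands out large chunks first (low overhead) and
+   exponentially smaller ones towards the final barrier */
+void collatz_steps(int *steps, int n) {
+    #pragma omp parallel for schedule(guided)
+    for (int i = 0; i < n; i++) {
+        long x = i + 1;
+        int s = 0;
+        while (x != 1) {
+            x = (x % 2 == 0) ? x / 2 : 3 * x + 1;
+            s++;
+        }
+        steps[i] = s;
+    }
+}
+```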
+
+runtime scheduling:
+- choose scheduling at runtime
+- `#pragma omp parallel for schedule(runtime)`
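+
+a sketch; with schedule(runtime) the schedule is read from the OMP_SCHEDULE environment variable (it can also be set with omp_set_schedule()), so you can experiment without recompiling:
+
+```c
+#include <omp.h>
+#include <stdio.h>
+
+int main(void) {
+    /* the schedule is chosen at run time, e.g.:
+       OMP_SCHEDULE="dynamic,8" ./a.out */
+    #pragma omp parallel for schedule(runtime)
+    for (int i = 0; i < 16; i++) {
+        printf("iteration %2d -> thread %d\n", i, omp_get_thread_num());
+    }
+    return 0;
+}
+```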
+
+
+how do you choose a scheduling technique?
+- depends on code
+- is the amount of computational work per iteration roughly the same for each iteration?
+ - if yes, static scheduling is preferable (lowest overhead)
+ - block-cyclic scheduling may be useful for regular but uneven workload distributions (see the sketch below)
+ - if not, dynamic scheduling is preferable for irregular workloads, and guided is usually superior to plain dynamic
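+
+a sketch of a regular but uneven workload where block-cyclic scheduling helps (the `triangular` function is illustrative): the cost of iteration i grows with i, so a plain block split overloads the thread holding the last block, while small round-robin chunks give every thread a similar mix of cheap and expensive iterations:
+
+```c
+#include <omp.h>
+
+/* triangular workload: iteration i does i units of independent work;
+   round-robin chunks of 4 spread cheap and expensive iterations
+   evenly over the threads, without dynamic scheduling overhead */
+void triangular(double *a, int n) {
+    #pragma omp parallel for schedule(static, 4)
+    for (int i = 0; i < n; i++) {
+        double s = 0.0;
+        for (int j = 0; j < i; j++) {
+            s += (double)j * 0.5;  /* stand-in for real work */
+        }
+        a[i] = s;
+    }
+}
+```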