lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

commit 8221db5b13d2970fab26db27724b9f6920a56f55
parent e3aa4edcb29b950d9f1e44bbdd090b792fac4bcf
Author: Alex Balgavy <alex@balgavy.eu>
Date:   Sun, 14 Feb 2021 18:46:55 +0100

Lecture 4

Diffstat:
M content/programming-multi-core-and-many-core-systems/_index.md   |  1 +
A content/programming-multi-core-and-many-core-systems/lecture-4.md | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/content/programming-multi-core-and-many-core-systems/_index.md b/content/programming-multi-core-and-many-core-systems/_index.md
@@ -5,3 +5,4 @@ title = "Programming Multi-Core and Many-Core Systems"
 1. [Lecture 1](lecture-1)
 2. [Lecture 2](lecture-2)
 3. [Lecture 3](lecture-3)
+4. [Lecture 4](lecture-4)
diff --git a/content/programming-multi-core-and-many-core-systems/lecture-4.md b/content/programming-multi-core-and-many-core-systems/lecture-4.md
@@ -0,0 +1,40 @@
++++
+title = 'Lecture 4'
++++
+
+<!-- TODO: finish -->
+problem: parallel execution incurs overhead (creation of worker threads, scheduling, waiting at the sync barrier), so the overhead must be outweighed by sufficient workload, i.e. a heavy enough loop body and a large enough trip count.
+
+conditional parallelisation uses the if clause: `#pragma omp parallel for if (len >= 1000)`. so the loop is only parallelised above some threshold.
+
+loop scheduling determines which iterations execute on which thread; the aim is to distribute the workload equally
+- can use `#pragma omp parallel for schedule(<type> [, <chunk>])`, which selects from a set of scheduling techniques
+- static/block scheduling: loop subdivided into as many chunks as there are threads with `#pragma omp parallel for schedule(static)`
+- static scheduling with chunk size 1 (cyclic): iterations assigned to threads in round-robin fashion with `#pragma omp parallel for schedule(static, 1)`
+- static scheduling with chunk size n (block-cyclic): chunks of n iterations assigned round-robin with `#pragma omp parallel for schedule(static, n)`
+- dynamic scheduling: loop divided into chunks of n iterations, chunks dynamically assigned to threads on demand with `#pragma omp parallel for schedule(dynamic, n)`
+  - requires additional synchronisation, so more overhead
+  - allows for dynamic load distribution
+
+
+chunk size selection:
+- small chunks mean good load balancing but high sync overhead
+- large chunks reduce overhead but give poor load balancing
+
+guided scheduling:
+- at the start, large chunks so overhead is small (initial chunk size is implementation dependent)
+- when approaching the final barrier, small chunks to balance the workload (chunk size decreases exponentially with every assignment)
+- chunks dynamically assigned to threads on demand
+- `#pragma omp parallel for schedule(guided, <n>)`
+
+runtime scheduling:
+- choose the scheduling at runtime
+- `#pragma omp parallel for schedule(runtime)`
+
+
+how do you choose a scheduling technique?
+- depends on the code
+- is the amount of computational work per iteration roughly the same for each iteration?
+  - if yes, static is preferable
+  - block-cyclic scheduling may be useful for regular but uneven workload distributions
+  - for irregular workloads dynamic is preferable, and guided is usually superior
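
A minimal sketch of the clauses covered in the notes above, not part of the commit itself: the `if (len >= 1000)` threshold is taken from the notes, while the `work()` function, the array length, and the chunk size 4 are made-up placeholders for illustration. Compile with `gcc -fopenmp -lm`.

```c
#include <math.h>
#include <stdio.h>

/* Placeholder per-iteration workload; the cost grows with i, so the
 * workload is uneven and the scheduling choice actually matters. */
static double work(int i) {
    double s = 0.0;
    for (int k = 0; k < i; k++)
        s += sin((double)k);
    return s;
}

int main(void) {
    enum { LEN = 10000 };
    static double out[LEN];
    int len = LEN;

    /* Conditional parallelisation: only spawn worker threads when the
     * trip count is large enough to outweigh the overhead. */
    #pragma omp parallel for if (len >= 1000)
    for (int i = 0; i < len; i++)
        out[i] = work(i);

    /* Dynamic scheduling: chunks of 4 iterations handed to threads on
     * demand; suits the uneven cost of work(i) at extra sync overhead. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < len; i++)
        out[i] = work(i);

    /* Guided scheduling: chunk sizes shrink towards the final barrier,
     * balancing the tail while keeping early overhead low. */
    #pragma omp parallel for schedule(guided)
    for (int i = 0; i < len; i++)
        out[i] = work(i);

    /* Runtime scheduling: the kind and chunk size are read from the
     * OMP_SCHEDULE environment variable at run time. */
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < len; i++)
        out[i] = work(i);

    printf("%f\n", out[len - 1]);
    return 0;
}
```

With `schedule(runtime)` the technique can be switched without recompiling, e.g. `OMP_SCHEDULE="guided,8" ./a.out`.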