lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

commit 8221db5b13d2970fab26db27724b9f6920a56f55
parent e3aa4edcb29b950d9f1e44bbdd090b792fac4bcf
Author: Alex Balgavy <alex@balgavy.eu>
Date:   Sun, 14 Feb 2021 18:46:55 +0100

Lecture 4

Diffstat:
M content/programming-multi-core-and-many-core-systems/_index.md   |  1 +
A content/programming-multi-core-and-many-core-systems/lecture-4.md | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/content/programming-multi-core-and-many-core-systems/_index.md b/content/programming-multi-core-and-many-core-systems/_index.md
@@ -5,3 +5,4 @@ title = "Programming Multi-Core and Many-Core Systems"
 1. [Lecture 1](lecture-1)
 2. [Lecture 2](lecture-2)
 3. [Lecture 3](lecture-3)
+4. [Lecture 4](lecture-4)
diff --git a/content/programming-multi-core-and-many-core-systems/lecture-4.md b/content/programming-multi-core-and-many-core-systems/lecture-4.md
@@ -0,0 +1,40 @@
++++
+title = 'Lecture 4'
++++
+
+<!-- TODO: finish -->
+problem: parallel execution incurs overhead (creation of worker threads, scheduling, waiting at the sync barrier), so the overhead must be outweighed by sufficient workload, i.e. a heavy enough loop body and a large enough trip count.
+
+conditional parallelisation uses the if clause: `#pragma omp parallel for if (len >= 1000)`. so the loop is only parallelised above some threshold.
+
+loop scheduling determines which iterations execute on which thread; the aim is to distribute the workload equally
+- can use `#pragma omp parallel for schedule(<type> [, <chunk>])`, which selects from a set of scheduling techniques
+- static/block scheduling: loop subdivided into as many chunks as there are threads with `#pragma omp parallel for schedule(static)`
+- static scheduling with chunk size 1 (cyclic): iterations assigned to threads in round-robin fashion with `#pragma omp parallel for schedule(static, 1)`
+- static scheduling with chunk size n (block-cyclic): chunks of n iterations assigned round-robin with `#pragma omp parallel for schedule(static, n)`
+- dynamic scheduling: loop divided into chunks of n iterations, chunks dynamically assigned to threads on demand with `#pragma omp parallel for schedule(dynamic, n)`
+  - requires additional synchronisation, so more overhead
+  - allows for dynamic load distribution
+
+
+chunk size selection:
+- small chunks mean good load balancing but high sync overhead
+- large chunks reduce overhead but give poor load balancing
+
+guided scheduling:
+- at the start, large chunks so overhead is small (initial chunk size is implementation dependent)
+- when approaching the final barrier, small chunks to balance the workload (chunk size decreases exponentially with every assignment)
+- chunks dynamically assigned to threads on demand
+- `#pragma omp parallel for schedule(guided, <n>)`
+
+runtime scheduling:
+- choose the scheduling at runtime
+- `#pragma omp parallel for schedule(runtime)`
+
+
+how do you choose a scheduling technique?
+- depends on the code
+- is the amount of computational work per iteration roughly the same for each iteration?
+  - if yes, static is preferable
+  - block-cyclic scheduling may be useful for regular but uneven workload distributions
+  - for irregular workloads dynamic is preferable, and guided is usually superior
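
A minimal sketch of the clauses covered in the notes above, not part of the commit itself: the `if (len >= 1000)` threshold is taken from the notes, while the `work()` function, the array length, and the chunk size 4 are made-up placeholders for illustration. Compile with `gcc -fopenmp -lm`.

```c
#include <math.h>
#include <stdio.h>

/* Placeholder per-iteration workload; the cost grows with i, so the
 * workload is uneven and the scheduling choice actually matters. */
static double work(int i) {
    double s = 0.0;
    for (int k = 0; k < i; k++)
        s += sin((double)k);
    return s;
}

int main(void) {
    enum { LEN = 10000 };
    static double out[LEN];
    int len = LEN;

    /* Conditional parallelisation: only spawn worker threads when the
     * trip count is large enough to outweigh the overhead. */
    #pragma omp parallel for if (len >= 1000)
    for (int i = 0; i < len; i++)
        out[i] = work(i);

    /* Dynamic scheduling: chunks of 4 iterations handed to threads on
     * demand; suits the uneven cost of work(i) at extra sync overhead. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < len; i++)
        out[i] = work(i);

    /* Guided scheduling: chunk sizes shrink towards the final barrier,
     * balancing the tail while keeping early overhead low. */
    #pragma omp parallel for schedule(guided)
    for (int i = 0; i < len; i++)
        out[i] = work(i);

    /* Runtime scheduling: the kind and chunk size are read from the
     * OMP_SCHEDULE environment variable at run time. */
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < len; i++)
        out[i] = work(i);

    printf("%f\n", out[len - 1]);
    return 0;
}
```

With `schedule(runtime)` the technique can be switched without recompiling, e.g. `OMP_SCHEDULE="guided,8" ./a.out`.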