Lecture updates - lectures.alex.balgavy.eu - Lecture notes from university.

commit e3aa4edcb29b950d9f1e44bbdd090b792fac4bcf
parent fc8e3a9c542b3950bc94d465c0412c36ffc278b3
Author: Alex Balgavy <alex@balgavy.eu>
Date:   Tue,  9 Feb 2021 10:58:39 +0100

Lecture updates

Diffstat:
M content/programming-multi-core-and-many-core-systems/_index.md  | 2 ++
A content/programming-multi-core-and-many-core-systems/lecture-3.md  | 39 +++++++++++++++++++++++++++++++++++++++

2 files changed, 41 insertions(+), 0 deletions(-)
diff --git a/content/programming-multi-core-and-many-core-systems/_index.md b/content/programming-multi-core-and-many-core-systems/_index.md
@@ -3,3 +3,5 @@ title = "Programming Multi-Core and Many-Core Systems"
 +++
 # Table of contents
 1. [Lecture 1](lecture-1)
+2. [Lecture 2](lecture-2)
+3. [Lecture 3](lecture-3)
diff --git a/content/programming-multi-core-and-many-core-systems/lecture-3.md b/content/programming-multi-core-and-many-core-systems/lecture-3.md
@@ -0,0 +1,39 @@
++++
+title = "Parallelisation & OpenMP"
++++
+# Parallelisation & OpenMP
+architecture model from programmer perspective:
+- share memory
+- multiple processing units
+
+program with multithreading.
+one code, one data heap, multiple program counters, multiple register sets, multiple runtime stacks.
+threads are provided by OS.
+
+OpenMP:
+- compiler directives, library functions, env variables
+- ideal: automatic parallelisation of seq code
+- but data dependencies are hard to assess, and compilers must be conservative
+- so add annotations to sequential program for parallelisation
+  - `#pragma omp name [clause]*`
+  - `#pragma omp parallel { ... }`: parallelize code block
+- for gcc, add `-fopenmp`
+- control num threads with env variable `OMP_NUM_THREADS` or lib function `omp_set_num_threads(int)`
+
+Loop parallelisation:
+- have each thread compute some disjoint part of the vectors
+- no data dependence between any two iterations. has to be true.
+- pragma omp parallel divides the data among threads and synchronizes them
+- directive must directly precede for-loop, for loop must match constrained pattern, trip-count of for-loop must be known in advance (when you reach the loop, not necessarily at compile time)
+- private variables: one private instance for each thread, no comms between threads within parallel section or between parallel/sequential sections
+- shared variables: one shared instance for all threads, comms betwe threads in parallel section and between parallel/sequential sections. concurrent access to this is problematic.
+- can decide private/shared with clause: `#pragma omp parallel for private(i) shared(c, a, b, len)`
+- loop-carried dependence: if you compute based on some updated values.
+
+concurrent access is like a fridge in a shared apartment, your beers can disappear at any time for any reason.
+
+race condition/data race: if behaviour of program depends on execution order of program parts whose temporal behaviour is beyond control
+- a critical section is used to restrict thread interleaving, only one thread executes in critical section
+- `#pragma omp critical { ... } `
+- disadvantage: critical sections are synchronized. named critical sections (`#pragma omp critical {...}`) execute synced with other same name critical sections
+- can use `reduction(+; sum)` clause in pragma instead

	lectures.alex.balgavy.eu Lecture notes from university.
	git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git
	Log \| Files \| Refs \| Submodules

M	content/programming-multi-core-and-many-core-systems/_index.md	\|	2	++
A	content/programming-multi-core-and-many-core-systems/lecture-3.md	\|	39	+++++++++++++++++++++++++++++++++++++++