+++
title = "Parallelisation & OpenMP"
+++

# Parallelisation & OpenMP

Architecture model from the programmer's perspective:
- shared memory
- multiple processing units

Programs use multithreading: one code, one data heap, multiple program counters, multiple register sets, multiple runtime stacks. Threads are provided by the OS.

OpenMP:
- compiler directives, library functions, environment variables
- ideal: automatic parallelisation of sequential code
- but data dependencies are hard to assess, and compilers must be conservative
- so: annotate the sequential program to guide parallelisation
- general form: `#pragma omp name [clause]*`
- `#pragma omp parallel { ... }`: execute the code block in parallel
- for gcc, compile with `-fopenmp`
- control the number of threads with the environment variable `OMP_NUM_THREADS` or the library function `omp_set_num_threads(int)`

Loop parallelisation:
- have each thread compute a disjoint part of the vectors
- there must be no data dependence between any two iterations
- `#pragma omp parallel for` divides the iterations among the threads and synchronises them
- the directive must directly precede the for-loop, the for-loop must match a constrained pattern, and the trip count of the for-loop must be known in advance (when the loop is reached, not necessarily at compile time)
- private variables: one private instance per thread; no communication between threads within the parallel section, or between parallel and sequential sections
- shared variables: one shared instance for all threads; communication between threads in the parallel section and between parallel and sequential sections. Concurrent access to shared variables is problematic.
- decide private/shared with clauses: `#pragma omp parallel for private(i) shared(c, a, b, len)`
- loop-carried dependence: an iteration computes based on values updated by another iteration

Concurrent access is like a fridge in a shared apartment: your beers can disappear at any time, for any reason.

Race condition / data race: the behaviour of the program depends on the execution order of program parts whose temporal behaviour is beyond our control.
- a critical section restricts thread interleaving: at most one thread executes inside the critical section at a time
- `#pragma omp critical { ... }`
- disadvantage: critical sections serialise execution. Named critical sections (`#pragma omp critical(name) { ... }`) synchronise only with other critical sections of the same name.
- for accumulations, a `reduction(+: sum)` clause on the pragma can be used instead