lectures.alex.balgavy.eu

Lecture notes from university.

lecture-3.md


+++
title = "Parallelisation & OpenMP"
+++
# Parallelisation & OpenMP
architecture model from the programmer's perspective:
- shared memory
- multiple processing units

program with multithreading:
one code segment, one data heap, but multiple program counters, multiple register sets, multiple runtime stacks.
threads are provided by the OS.

OpenMP:
- compiler directives, library functions, env variables
- ideal: automatic parallelisation of sequential code
- but data dependencies are hard to assess, and compilers must be conservative
- so add annotations to the sequential program for parallelisation
  - `#pragma omp name [clause]*`
  - `#pragma omp parallel { ... }`: parallelise code block
- for gcc, compile with `-fopenmp`
- control the number of threads with the env variable `OMP_NUM_THREADS` or the library function `omp_set_num_threads(int)`

Loop parallelisation:
- have each thread compute some disjoint part of the vectors
- requires that there is no data dependence between any two iterations
- `#pragma omp parallel for` divides the iterations among the threads and synchronises them at the end of the loop
- the directive must directly precede the for-loop, the for-loop must match a constrained pattern, and the trip count of the for-loop must be known in advance (when the loop is reached, not necessarily at compile time)
- private variables: one private instance for each thread, no communication between threads within a parallel section or between parallel/sequential sections
- shared variables: one shared instance for all threads, communication between threads in a parallel section and between parallel/sequential sections. concurrent access to these is problematic.
- can decide private/shared with clauses: `#pragma omp parallel for private(i) shared(c, a, b, len)`
- loop-carried dependence: an iteration computes based on values updated by another iteration
     32 
concurrent access is like a fridge in a shared apartment: your beers can disappear at any time, for any reason.
     34 
race condition/data race: the behaviour of the program depends on the execution order of program parts whose temporal behaviour is beyond control
- a critical section restricts thread interleaving: only one thread at a time executes the critical section
- `#pragma omp critical { ... }`
- disadvantage: critical sections serialise execution. named critical sections (`#pragma omp critical(name) { ... }`) only synchronise with critical sections of the same name
- for accumulations, can use a `reduction(+: sum)` clause in the pragma instead