lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

lecture-5.md (2515B)


+++
title = 'Lecture 5'
+++
## Finishing OpenMP
Controlling thread affinity

OpenMP thread binding:
- bind a thread to a place (e.g. a core)
- once bound, the thread does not migrate

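OpenMP `parallel` `proc_bind` clause:
- `#pragma omp parallel proc_bind(master)`: the whole team runs on the same place as the master thread
- `#pragma omp parallel proc_bind(close)`: the team runs on places close to the parent thread's place
- `#pragma omp parallel proc_bind(spread)`: the opposite of `close`, the team is spread evenly across the available places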
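
As a minimal sketch of the clause in use (the function name `sum_spread` is made up for illustration, not taken from the lecture), the loop below asks for the team to be spread over the available places. It uses only pragmas, so a compiler without OpenMP support ignores them and runs the loop serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Sums data[0..n) with a team that is spread across the available places.
 * sum_spread is a hypothetical helper used only for this illustration. */
long sum_spread(const int *data, long n)
{
    assert(data != NULL || n == 0);
    long total = 0;

    /* proc_bind(spread) distributes the threads over the places;
     * replacing it with proc_bind(close) would keep them near the
     * parent thread instead. The reduction gives each thread a private
     * partial sum that is combined at the end of the loop. */
    #pragma omp parallel for proc_bind(spread) reduction(+:total)
    for (long i = 0; i < n; i++)
        total += data[i];

    return total;
}
```

With GCC this would typically be compiled with `-fopenmp`. Whether `spread` or `close` is faster depends on the workload and the machine, so it is worth measuring both.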

places:
- basic computing resource, usually a hardware thread
- describe the hierarchical system architecture to OpenMP
- env variable `OMP_PLACES`: `"{0,1,2,3},{4,5,6,7},{8:4},{12:4}"`
    - sets 4 places with 4 execution units each (`{8:4}` is shorthand for 4 units starting at 8, i.e. `{8,9,10,11}`)
- the actual effect is implementation-dependent
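
Since the effect is implementation-dependent, it can help to ask the runtime what it actually did. The sketch below (with a hypothetical helper name, `report_places`) uses the OpenMP 4.5 routines `omp_get_num_places` and `omp_get_place_num`. The `_OPENMP` guard is an addition so that the file still compiles when OpenMP is disabled.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Prints the place each thread runs on and returns the number of places
 * the runtime knows about (0 when compiled without OpenMP support).
 * report_places is a hypothetical name chosen for this sketch. */
int report_places(void)
{
#ifdef _OPENMP
    int places = omp_get_num_places();
    printf("runtime sees %d place(s)\n", places);

    #pragma omp parallel
    {
        /* omp_get_place_num() returns -1 if the thread is not bound. */
        printf("thread %d runs on place %d\n",
               omp_get_thread_num(), omp_get_place_num());
    }
    return places;
#else
    puts("compiled without OpenMP: no places available");
    return 0;
#endif
}
```

Running a program that calls this with `OMP_PLACES` set as above and, say, `OMP_PROC_BIND=close` should show how the threads were mapped. The place list from the notes assumes a machine with at least 16 hardware threads.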

busy-wait vs suspension:
- problem: threads spend a lot of time waiting, at synchronisation barriers and critical sections
- busy wait: threads actively poll some memory address; they can proceed quickly once released, but they waste resources while waiting
- suspension: threads are descheduled to a waiting queue, so they don't occupy resources, but they have to be woken up before they can continue
- set the environment variable `OMP_WAIT_POLICY`: `ACTIVE` for mostly busy wait, `PASSIVE` for mostly suspension (not every implementation honours it)
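
OpenMP hides the waiting mechanism, so the difference is easier to see when written by hand. The following sketch uses POSIX threads and C11 atomics rather than OpenMP, and all names in it (`spin_waiter`, `sleep_waiter`, `run_wait_demo`) are invented for illustration. One thread busy-waits on a flag and another is suspended on a condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Busy-wait variant: the waiter polls this flag in a loop. */
static atomic_bool spin_ready;

/* Suspension variant: the flag is guarded by a mutex and the waiter
 * sleeps on a condition variable until it is signalled. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool sleep_ready;

static void *spin_waiter(void *arg)
{
    (void)arg;
    while (!atomic_load(&spin_ready))
        ; /* occupies a core, but reacts as soon as the flag changes */
    return NULL;
}

static void *sleep_waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!sleep_ready)                 /* loop guards against spurious wake-ups */
        pthread_cond_wait(&cond, &lock); /* thread is descheduled while waiting */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts one waiter of each kind, releases both and joins them.
 * Returns 0 once both waiters have finished. */
int run_wait_demo(void)
{
    pthread_t spinner, sleeper;
    int rc;

    atomic_store(&spin_ready, false);
    sleep_ready = false;

    rc = pthread_create(&spinner, NULL, spin_waiter, NULL);
    assert(rc == 0);
    rc = pthread_create(&sleeper, NULL, sleep_waiter, NULL);
    assert(rc == 0);

    atomic_store(&spin_ready, true);     /* releases the busy-waiting thread */

    pthread_mutex_lock(&lock);
    sleep_ready = true;
    pthread_cond_signal(&cond);          /* wakes the suspended thread */
    pthread_mutex_unlock(&lock);

    rc = pthread_join(spinner, NULL);
    assert(rc == 0);
    rc = pthread_join(sleeper, NULL);
    assert(rc == 0);
    (void)rc;
    return 0;
}
```

An OpenMP runtime makes a similar choice internally at barriers; `OMP_WAIT_POLICY` is only a hint about which of the two behaviours to prefer.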

Atomic operations:
- atomic operations can be mapped to hardware features, but can also be mapped to critical sections
- no guarantee on operational behaviour (which mapping is used is up to the implementation)
- can still be expensive due to data transfer (the cache line holding the variable moves between cores)
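
To make the two mappings concrete, here is a small sketch with hypothetical function names (`count_atomic`, `count_critical`). Both protect the same shared counter, once with `#pragma omp atomic` and once with `#pragma omp critical`. Both give the same result; only the cost differs, and by how much depends on the implementation.

```c
#include <assert.h>

/* Increments a shared counter n times using an atomic update. */
long count_atomic(long n)
{
    assert(n >= 0);
    long counter = 0;

    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        counter++;          /* may map to a single hardware instruction */
    }
    return counter;
}

/* Same computation, but protected by a critical section. */
long count_critical(long n)
{
    assert(n >= 0);
    long counter = 0;

    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp critical
        counter++;          /* only one thread at a time executes this */
    }
    return counter;
}
```

In either version every increment touches the same variable, so the data transfer cost remains. A `reduction(+:counter)` clause would avoid that by giving each thread a private copy.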

So, OpenMP:
- simpler to use than system-facing approaches
- takes care of most organisational aspects
- facilitates experimentation with alternatives
- incorrect directives can lead to wrong or nondeterministic behaviour
    - easy to write, hard to find
- reduced insight into the actual run-time impact of directives


---

## POSIX Threads
POSIX threads are low-level and system-facing.
They are library-based (`pthread.h`), so they don't need a compiler that understands special directives.

Thread creation:
- only one thread at program startup
- all other threads are created dynamically
- threads may terminate before the process
- process terminates when all threads have terminated (or earlier, when `main` returns or `exit` is called)
- any thread can wait for termination of any other thread ("joining")
    - terminated threads are zombies until joined
- functions: `pthread_create`, `pthread_join`
- joinable threads can be joined, detached threads cannot
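
A minimal sketch of dynamic creation and joining might look as follows. The names `NUM_THREADS`, `struct task`, `square_id` and `sum_of_squares` are made up for this example. Each thread writes its result into its own slot, so no locking is needed, and the creating thread collects the results after joining.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4 /* hypothetical team size for the illustration */

/* Per-thread argument and result slot. */
struct task {
    int id;
    long result;
};

/* Thread body: squares its own id and stores it in its own slot. */
static void *square_id(void *arg)
{
    struct task *t = arg;
    t->result = (long)t->id * t->id;
    return NULL;
}

/* Creates NUM_THREADS threads, joins them and adds up their results. */
long sum_of_squares(void)
{
    pthread_t threads[NUM_THREADS];
    struct task tasks[NUM_THREADS];
    long total = 0;
    int rc;

    for (int i = 0; i < NUM_THREADS; i++) {
        tasks[i].id = i;
        tasks[i].result = 0;
        rc = pthread_create(&threads[i], NULL, square_id, &tasks[i]);
        assert(rc == 0);
    }

    for (int i = 0; i < NUM_THREADS; i++) {
        /* Joining waits for termination and releases the zombie thread. */
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        total += tasks[i].result;
    }
    (void)rc;
    return total;
}
```

With GCC this is usually built with `-pthread`. A thread that should never be joined can be marked with `pthread_detach`, in which case its resources are released automatically when it terminates.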

Two classes of threads:
- library threads: the OS is unaware of them, the whole process blocks when one thread waits for an OS service, but the implementation can be more efficient
- OS/kernel threads: scheduled by the OS together with other processes, pre-emptive multithreading, allows overlapping computation and waiting