lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

lecture-2.md (1886B)


+++
title = "Lecture 2"
+++

Multi-core vs many-core

CPU levels of parallelism:
- Multi-core parallelism: task/data parallelism
    - 4-12 powerful cores, hardware hyperthreading
    - local caches
    - symmetric/asymmetric threading model
    - implemented by the programmer
- Instruction-level parallelism (ILP)
- SIMD (single instruction, multiple data)

Cores/threads:
- hardware multi-threading (SMT)
    - core manages thread context
    - interleaved: temporal multi-threading
    - simultaneous: co-located execution

GPU levels of parallelism:
- data parallelism
    - write 1 thread, instantiate many of them
    - SIMT (Single Instruction, Multiple Thread) execution
    - many threads run concurrently: same instruction, different data elements
- task parallelism is 'emulated'
    - hardware mechanisms exist
    - specific programming constructs to run multiple tasks

The GPU usually connects to the host over PCI Express 3.0, theoretical speed 8 GT/s (gigatransfers per second) per lane.
     32 
     33 Why different design in CPU vs GPU?
     34 - CPU must be good at everything, GPUs focus on massive parallelism
     35 - CPU minimize latency experienced by 1 thread
     36 - GPU maximize throughput of all threads
     37 
     38 Locality: programs tend to use data and instructions with address near to those used recently
     39 
     40 CPU caches:
     41 - small fast SRAM-based memories
     42 - hold frequently accessed blocks of main memory
     43 
     44 Hardware performance metrics:
     45 - clock frequency (GHz) - absolute hardware speed
     46 - operational speed (GFLOPs) - operations per second, single and double precision
     47 - memory bandwidth (GB/s) - memory operations per second
     48 - power (Watt) - rate of consumption of energy

Main constraint for optimising compilers: do not cause any change in program behavior. So when in doubt, the compiler must be conservative.
     51 
     52 
     53 In-core parallelism:
     54 - ILP: multiple instructions executed at some time, enabled by hardware (which CPU must provision)
     55 - SIMD: single instruction on multiple data