+++
title = "Lecture 2"
+++

Multi-core vs many-core

CPU levels of parallelism:
- Multi-core parallelism: task/data parallelism
    - 4-12 powerful cores, hardware hyper-threading
    - local caches
    - symmetrical/asymmetrical threading model
    - implemented by the programmer
- Instruction-level parallelism:
    - SIMD (see the vectorised-loop sketch at the end of these notes)

Cores/threads:
- hardware multi-threading (SMT)
    - the core manages the thread contexts
    - interleaved: temporal multi-threading
    - simultaneous: co-located execution

GPU levels of parallelism:
- data parallelism
    - write 1 thread, instantiate a lot of them
    - SIMT (Single Instruction, Multiple Thread) execution (see the kernel sketch at the end of these notes)
    - many threads run concurrently: same instruction, different data elements
- task parallelism is 'emulated'
    - hardware mechanisms exist
    - specific programming constructs to run multiple tasks

The GPU is usually connected to the host over PCI Express 3, with a theoretical speed of 8 GT/s (gigatransfers per second) per lane; a rough bandwidth calculation is worked out at the end of these notes.

Why the different design in CPU vs GPU?
- the CPU must be good at everything, GPUs focus on massive parallelism
- the CPU minimizes the latency experienced by 1 thread
- the GPU maximizes the throughput of all threads

Locality: programs tend to use data and instructions with addresses equal or near to those used recently (temporal and spatial locality).

CPU caches:
- small, fast SRAM-based memories
- hold frequently accessed blocks of main memory (see the traversal sketch at the end of these notes)

Hardware performance metrics:
- clock frequency (GHz) - absolute hardware speed
- operational speed (GFLOP/s) - floating-point operations per second, single and double precision
- memory bandwidth (GB/s) - bytes transferred to/from memory per second
- power (Watt) - rate of energy consumption

Main constraint for optimising compilers: do not cause any change in program behavior. So when in doubt, the compiler must be conservative.

In-core parallelism:
- ILP: multiple instructions executed at the same time, enabled by hardware (which the CPU must provision)
- SIMD: single instruction on multiple data
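
Vectorised-loop sketch for the SIMD bullets (my illustration, not from the lecture): the loop is written as ordinary scalar code, but with optimisation enabled (e.g. `-O3`) most compilers auto-vectorise it so that a single SIMD instruction processes 4, 8 or 16 floats at once. The function name and array sizes are arbitrary choices.

```cpp
#include <cstdio>

// A plain scalar loop. The __restrict__ hints tell the compiler the arrays
// do not alias, which makes auto-vectorisation (SSE/AVX) more likely.
void add(const float *__restrict__ a, const float *__restrict__ b,
         float *__restrict__ c, int n) {
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];  // one source-level operation, several lanes per instruction
}

int main() {
    const int n = 1024;
    float a[n], b[n], c[n];
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
    add(a, b, c, n);
    printf("c[0] = %f\n", c[0]);  // expect 3.0
    return 0;
}
```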
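
Kernel sketch for the SIMT point (my illustration, not from the lecture): the thread body is written once and instantiated for every element. The kernel name, problem size and block size of 256 are arbitrary choices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element: same instruction stream, different data (SIMT).
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes), *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);

    // Host <-> device copies travel over the PCIe link described above.
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // "Write 1 thread, instantiate a lot of them": launch n threads in blocks of 256.
    int block = 256, grid = (n + block - 1) / block;
    vecAdd<<<grid, block>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);  // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```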
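
Rough PCIe 3 bandwidth (my calculation, assuming a x16 link): 8 GT/s per lane with 128b/130b encoding gives about 8 x 128/130 ≈ 7.88 Gb/s ≈ 0.985 GB/s of payload per lane per direction, so a x16 link carries roughly 15.75 GB/s each way. That is small compared to on-board GPU memory bandwidth, which is why host-device transfers are often the bottleneck.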
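
Traversal sketch for the locality and cache points (my illustration, not from the lecture; the matrix size is arbitrary): both functions sum the same data, but only the row-major traversal uses each fetched cache line fully.

```cpp
#include <cstdio>

#define N 1024
static float m[N][N];

// Row-major traversal: consecutive accesses touch adjacent addresses,
// so each cache line fetched from main memory is fully used (spatial locality).
float sum_rows() {
    float s = 0.0f;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            s += m[i][j];
    return s;
}

// Column-major traversal of the same data: consecutive accesses are
// N * sizeof(float) bytes apart, so most of each fetched cache line is wasted.
float sum_cols() {
    float s = 0.0f;
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            s += m[i][j];
    return s;
}

int main() {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            m[i][j] = 1.0f;
    printf("%f %f\n", sum_rows(), sum_cols());  // same result, different memory behaviour
    return 0;
}
```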