index.md (2216B)
1 +++ 2 title = 'Branch delays' 3 +++ 4 # Branch delays 5 ## Unconditional branches 6 7 the two instructions that are fetched during decode and compute of first instruction (a branch) have to be discarded 8 9 this two cycle-delay — “branch penalty” 10 11 ![screenshot.png](screenshot-40.png) 12 13 to reduce the penalty, branch target address must be computed earlier than pipeline — in the decode stage 14 15 this reduces the penalty to one cycle: 16 17 ![screenshot.png](screenshot-41.png) 18 19 this needs hardware modification — PC has to be incremented in every cycle, and a second adder is needed in decode stage to compute branch target address for every instruction 20 21 ## Conditional branches 22 branch condition must be tested as early as possible 23 comparator to test condition can be moved to decode stage 24 it would use values from register file outputs A and B directly 25 26 ## Branch delay slot — compiler reorganises instructions 27 branch delays slot — the location that follows a branch instruction 28 29 compiler tries to find an instruction that it always executed, independent of whether or not the program branches 30 31 data dependencies must be preserved 32 if the compiler can find a useful instruction, there’s no branch penalty 33 otherwise, it NOPs out and there’s a penalty of one cycle 34 35 ![screenshot.png](screenshot-42.png) 36 37 ## Branch prediction 38 Static: 39 40 - assume branch will not be taken, fetch next instruction in sequential order 41 - simple, decent accuracy 42 - processor can determine static prediction by checking sign of branch offset 43 - other option — encoding of branch instruction can include a bit indicating whether prediction should be ‘take’ or ‘not taken’ (set by compiler) 44 45 Dynamic: 46 47 - use actual branch behaviour 48 - processor assumes that next time an instruction is executed, branch decision is likely to be the same as last time 49 - keep more information about execution history, such as four states (strongly taken, likely taken, likely not taken, strongly not taken) 50 - keep a branch target buffer 51 - identifies branch instructions by their addresses 52 - in the form of lookup table 53 - contains: address of branch instruction, one/two state bits for branch prediction algorithm (outcome), branch target address