tracking-control-flow.md (3441B)
1 +++ 2 title = 'Tracking control flow' 3 +++ 4 # Tracking control flow 5 Explore execution paths in the program. 6 7 Control flow graph (CFG) of each function: 8 - represent potential flow of control inside function 9 10 Call graph 11 - represent potential flow of control between functions 12 13 Basic block: maximal sequence of instructions that execute one-by-one in order 14 15 Control flow graph: 16 - nodes are basic blocks with one entry point and one exit point 17 - directed edges indicate possible control flow 18 19 Call graph 20 - nodes are functions 21 - directed edges show potential for one function to invoke another 22 23 Identifying a function 24 - ideally: 25 - set a of basic blocks with single entry point 26 - reached using a call instruction 27 - ends with ret instructions 28 - in reality, might be reached using jump, might share blocks with other functions, might have multiple entries 29 30 ## Problems 31 - Pointer-based control transfer: conditionally set a function pointer, then call function pointer 32 - non-returning calls: some functions won't return, code following call site may not be valid 33 - non-contiguous code sections: could have jump tables, data, unparsed code, junk bytes... 34 - tail calls: `return f(x)` 35 - gaps in binary: might contain undiscovered functions 36 - shared code and multiple entry representation 37 38 You should: 39 - run the program multiple times, observe targets of indirect jumps and calls 40 - look for function prologue sequences (`push %rbp`, `mov %rsp, %rbp`) 41 42 ## Loops 43 Loop: subgraph of CFG that is strongly connected (path from any node to any other node), single entry 44 45 Two loops are either disjoint, or one is completely nested in the other. 46 47 Identifying loops: 48 - dominance: node a dominates b if all paths to b go through a (strictly if a and b are different, immediate if it's right before) 49 50 Natural loops: single head node 51 52 Dominator set D(i) = {i} ∪ (intersection for p ∈ Pred(i) of D(p)), where Pred is set of immediate predecessors 53 54 Dominator tree: each node's children are nodes it immediately dominates 55 56 Computing loops: 57 1. compute dominator info, store it in a dominator tree 58 2. identify back edges 59 3. construct natural loops corresponding to back edges 60 61 ## Data-flow analysis 62 Flow sensitivity 63 - flow-sensitive analysis computes for each program point whatever you want to analyze 64 - flow-insensitive pointer analysis computes what memory locations pointers may refer to, at any time in program execution 65 66 Context-sensitive analysis considers calling context when analyzing target of function call 67 68 ### Reaching definitions 69 Which data definitions can reach a point in the program. 70 71 - For each basic block determine: 72 - which definitions the block generates (gen set) 73 - which it kills (kill set) 74 - from this local solution build global one. determine: 75 - which definitions from anywhere can reach the start of a basic block 76 - in[B] = Union for p ∈ pred[B] of out[p] 77 - which definitions can still be alive after the basic block 78 - out[B] = gen[B] ∪ (in[B] - kill[B]) 79 - need to solve iteratively 80 81 Definition-Use chains: definition of variable and all uses, reachable from that definition without other intervening definitions 82 - compute: for each definition D, compute chain of uses that D may reach 83 84 Use-definitions chains: use of variable and all definitions of that variable, that can reach that use without other intervening definitions 85 - compute: for each use U, compute chain of definitions that reach U